[Buildbot-devel] database-backed status/scheduler-state project
Brian Warner
warner at lothar.com
Wed Sep 2 22:56:05 UTC 2009
Hi everybody, it's me again.
I've taken on a short-term contract with Mozilla to make some
scaling/usability improvements to Buildbot that will be suitable for
merging upstream. The basic pieces are:
* persistent (database-backed) scheduling state
* DB-backed status information
* ability to split buildmaster into multiple load-balanced pieces
I'll be working on this over the next few months, pushing features into
trunk as we get them working (via my github repo). The result should be
a buildbot which:
* lets you bounce the buildmaster without losing queued builds or the
state of e.g. Dependent schedulers
* re-queues any build that is interrupted by bouncing a master or
slave mid-build
* third-party tools can read or manipulate the scheduler state, to
insert builds, cancel requests, or accelerate requests, all by
fussing with the database
* third-party tools can render status information (think PHP scripts
reading stuff out of the DB and generating a specialized waterfall)
* multiple "build-process-master" processes (needs a better name) can
be run on separate CPUs, each handling some set of slaves. Each one
claims a buildrequest from the DB when it has a slave available, runs
the build, then marks the build as done. If one dies, others will
take over.
I'm hoping that the persistent scheduler-state code will be done by the
end of the month, ready to put into a buildbot-0.8.0 release shortly
thereafter.
DATABASES:
I'm planning to make the default config store the scheduler state in a
SQLite file inside the buildmaster's base directory. To enable the
scaling benefits, you'd need a real networked database, so I also plan
to have connectors for MySQL and potentially others.
The plan is to have the schedulers make synchronous DB calls, rather
than completely rewriting the scheduler/builder code to look more like a
state machine with async calls (twisted.enterprise). This should let us
finish the project sooner and with fewer instabilities, but also means
that DB performance is an issue, since a slow DB will block everything
else the buildmaster is doing. The Mozilla folks are ok with this, so
we'll just build it and see how it goes.
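Just to illustrate the flavor, "synchronous DB calls" means the
scheduler code will do things like the following, right in the reactor
thread (the table and column names here are invented for this sketch,
not the final schema):

    import sqlite3

    # one connection, used synchronously from the buildmaster's main
    # (reactor) thread: a slow query here stalls everything else,
    # which is exactly the tradeoff described above
    conn = sqlite3.connect("state.sqlite")

    def get_unclaimed_requests(buildername):
        # hypothetical table/columns, just to show the blocking style
        c = conn.cursor()
        c.execute("SELECT id, reason FROM buildrequests"
                  " WHERE buildername=? AND claimed_at=0",
                  (buildername,))
        return c.fetchall()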
It's very important to me that Buildbot is easy to get installed for all
users, and installing a big database is not easy, so the default will be
the no-effort-required entirely-local SQLite. Users will only have to
set up a real database if they want the "distributed across multiple
computers" scaling features.
The statusdb (as opposed to the schedulerdb) may be implemented as a
buildbot status plugin, leaving the existing pickle files alone but
exporting a copy of everything to an external database as the builds
progress. This would reduce the work to be done (there's already some
code to do much of this) and minimize the impact on the core code: we'd
just be adding an extra file that could be enabled or not as people saw
fit. On the other hand, it might not result in something that's as well
integrated into the buildbot as it could be, and it might be nice to
have a Waterfall/etc. which read from the database, since things like
filter-by-branchname would finally become efficient enough to use.
DEPENDENCIES:
Buildbot-0.8.0 will need sqlite bindings. These come batteries-included
(in the standard library) with python 2.5 and 2.6. Users running
python2.4 will have to install the python-pysqlite2 package to run
buildbot-0.8.0. I think this is a pretty minimal addition.
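Concretely, the bindings can be located with the usual fallback import
(pysqlite2 provides the same dbapi2 module under a different name):

    try:
        import sqlite3                 # stdlib in python2.5 and 2.6
    except ImportError:
        from pysqlite2 import dbapi2 as sqlite3   # python2.4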
I'm examining SQLAlchemy to see if the features it offers would be worth
the extra dependency load. I don't want to use a heavy ORM (because a
big goal is to have a schema that's easy to query/manipulate from other
languages), but it looks like it's got connection-pool-management and
cross-DB support code that might be useful.
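For concreteness, the non-ORM slice of SQLAlchemy I mean is roughly the
engine layer; a sketch (the sqlite URL and the query are placeholders):

    from sqlalchemy import create_engine, text

    # engine-level usage only: connection pooling plus SQL-dialect
    # handling, no ORM mapping. Switching to a networked database
    # should just mean changing the URL, e.g. "mysql://...".
    engine = create_engine("sqlite:///state.sqlite", pool_recycle=3600)

    conn = engine.connect()
    rows = conn.execute(text("SELECT * FROM changes")).fetchall()
    conn.close()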
What do people think about the 0.8.0 buildmaster potentially requiring
sqlalchemy? Would that annoy you? Annoy new users? Make it hard to
upgrade your environment?
HELP!:
I'm looking to hear about other folks' experiences with this sort of
project. We've been talking about this for years, and some prototypes
have been built, so I'd like to hear about them (I've been briefed on
many of the Mozilla efforts already).
I'll attach the proposal below, along with a file of notes that I made
while walking through the code to see how this needs to work.
cheers,
-Brian
===== PROJECT PROPOSAL =====
Buildbot Database project:
The goal is to improve the usability and scalability of Buildbot to meet
Mozilla's current needs, implemented in an appropriate fashion to get
merged upstream. The primary "pain points" to be addressed are:
* most buildmaster state is held in RAM, preventing process restarts
for fear of losing queued builds and builds-in-progress. There is no
"graceful shutdown" command, but even if there were, it could take
hours or days to wait for everything in the queue to finish, losing
valuable developer time.
* buildmaster does many things in one process (build scheduling, build
processing, status distribution), and CPU exhaustion has been
observed
* Waterfall display is very CPU-intensive. Current deployment does not
share waterfall with outside world for fear of overload. Development
of alternate status displays (which could run in separate processes)
is hampered by the local-file pickle-based status storage format.
The changes planned for this project are:
* move build scheduling state out of RAM and into a persistent
database, allowing buildmaster to be bounced without losing queued
builds. Builders will claim builds from the database, perform the
builds, then update the DB to mark the build state as done, allowing
multiple buildmaster processes (on separate machines) to share the
load, communicating mostly through the DB. New tools (written in
arbitrary languages) can be used to manipulate the schedulerdb, to
implement features like "accelerate build request", "cancel request",
etc.
* move build status out of pickle files into a database, to enable
multiple processes (on separate hosts) to access the status. Database
replication can then be used to allow a publicly-visible Waterfall
without threatening to overload the buildmaster. Status displaying
tools (dashboards, etc) can be written in arbitrary languages and
simply read the information they need from the statusdb.
* add configuration options to switch on/off the four main buildmaster
functions (ChangeMaster, Schedulers, Builder/Build processing, Status
distribution), allowing these functions to be spread across multiple
processes, using the state/status databases for coordination. The
goal is to have one ChangeMaster/Schedulers process, multiple
Builder/Build processing tasks (one "build-master" per "pod", with a
set of slaves attached to each one), and multiple status distribution
processes. This should help the scalability problem, by allowing the
load to be spread across multiple computers.
* the default database will be a local SQLite file, but master.cfg
statements will allow flexible configuration of the database
connection method. Postgres (or whatever Mozilla's favorite DB is)
will be tested. Others (at least MySQL) should be possible.
Provisions will be made to tolerate the inevitable SQL dialect
variations.
* (probably) add "graceful shutdown" switch to the buildmaster. Once
the buildmaster is in this mode, new jobs will not be started, and
the buildmaster will shutdown once the last running job completes.
The switch may have an option to make the buildmaster restart itself
automatically upon shutdown. UI is uncertain.
* (maybe) add "graceful shutdown" switch to the buildslave, used in the
same way as the buildmaster's switch. UI is uncertain.
* (probably) add "RESUBMIT" state to the overall Build object (along
with the existing SUCCESS, WARNINGS, FAILURE, EXCEPTION states). The
scheduling code will react to this by requeueing the BuildRequest.
Builds which stop because of a lost slave or restarted buildmaster
will be marked with this state, so they will be re-run when the
necessary resources come back.
* retain cancel-build capabilities (may require Builder to poll a DB to
see if the build has been cancelled)
Design restrictions imposed by Brian as Buildbot upstream developer:
* dependency load must not increase significantly. I'm ok with
requiring SQLite because it's built-in to python2.5/2.6, and easy to
get for python2.4. I'm not willing to require other database
bindings, nor to require all Buildbot users to install/configure
e.g. a MySQL database before they can run a buildmaster.
* existing 0.7.11 deployments must remain compatible with the new code.
The default configuration must use SQLite in a local directory. Any
state-migration steps that must be done will be handled by adding new
code to the existing "buildbot upgrade-master" command.
* all code must have clear User's Manual documentation (with examples)
and adequate unit tests. All changes must be licensed compatibly with
the upstream source (GPL2).
The specific milestones we're planning are:
* phase 1: Create the database connectors (initially only SQLite), move
just the scheduler state into the database. This includes the output
of the ChangeMaster, the internal state of all Schedulers, and the
list of ready-to-go BuildRequests. All existing Scheduler classes and
the Builder class will be changed to scan the database for work
instead of looking at lists in RAM. The RESUBMIT state will be
implemented and Builders updated to requeue such builds.
This will allow the buildmaster to be bounced without loss of state
(although any running builds will be abandoned and requeued). It will
not yet enable the use of multiple processes. It will not touch the
build status information (currently stored in pickle files).
* phase 1.1: Implement the Postgres database connector, and the
master.cfg options necessary to control which db type/location to
use for scheduler state. Test a buildmaster running with a remote
schedulerdb.
* phase 1.2: Implement graceful-shutdown controls.
* phase 2: Change the build-status code to store its state in a
database, instead of in the current pickle files. Implement a "Log
Server" to store/publish/stream logfile contents. Write a "buildbot
upgrade-master" tool to non-destructively migrate old pickle data
into the new database and logserver. Change the existing Status
plugins (Waterfall, MailNotifier, IRCBot, etc) to read status from
database. Add master.cfg options to control which db is used for
status data.
This will enable non-buildbot status-displaying frontends.
* phase 3: Add master.cfg options to control which components are
enabled in any given process. Provide mechanisms and examples to run
e.g. multiple build-process-masters which coordinate through the
database. Implement TCP/HTTP/polling-based "ping notifiers" to allow
low-latency triggering between components in separate processes (i.e.
Scheduler writes ready-to-build requests into DB, but the
build-process-master on a separate host must be told to re-scan the
DB for new work). Provide master.cfg options to control type/location
of DB, ping-notifiers, and Log Server. build-process-master instances
will have some configuration in common, other configuration unique to
each instance.
This will finally enable scaling through multiple buildbot processes,
and multiple Waterfall renderers.
I'm roughly targeting phase 1 to be incorporated into an upstream
buildbot-0.8.0 release, and phase 2 in an 0.9.0 release shortly
afterwards. Phase 3 may get into 0.9.0, or may go into a subsequent
upstream release.
The aggressive target is to get phase 1 done by the end of September,
then evaluate schedule and progress before beginning the next phase.
The overall goal is to complete the project in 2-3 months.
Sub-tasks which can be split out easily include:
* database connector module: python "dbapi2" interface,
reconnection-on-error (and log attempts w/backoff), cross-database
compatibility code, blocking methods for scheduler state db,
fire-and-forget (but retry for a little while) for status writes
* "ping notifier" module: define HTTP POST / line-oriented TCP /
polling protocol, implement client / server modules.
* Log Server: writer-side PB interface, reader-side HTTP interface
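As a sketch of the first sub-task above (names and retry policy are
placeholders; only DBUnavailableError comes from the design notes
below):

    import time, logging

    class DBUnavailableError(Exception):
        pass

    def run_query_with_retry(connect, sql, args=(), max_tries=5):
        # blocking query with reconnect-on-error and logged backoff;
        # e.g. run_query_with_retry(lambda: sqlite3.connect("state.sqlite"),
        #                           "SELECT * FROM changes")
        delay = 1
        for attempt in range(max_tries):
            try:
                conn = connect()
                try:
                    cur = conn.cursor()
                    cur.execute(sql, args)
                    return cur.fetchall()
                finally:
                    conn.close()
            except Exception:
                # dbapi2 error classes vary per driver, so catch broadly
                logging.exception("db error on attempt %d" % (attempt + 1))
                time.sleep(delay)
                delay *= 2       # log attempts w/backoff, per the notes
        raise DBUnavailableError(sql)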
=== DESIGN NOTES ===
-*- org-mode -*-
* databases: three databases, plus logserver
** Changes go in one database
** scheduling stuff (Scheduler state, builds ready-to-go/claimed/finished)
this includes BuildRequests and their properties
** status (steps, logids, results, properties)
the goal is for the buildmaster to never read from the status db, only
the status-rendering code (which will eventually live elsewhere)
* database connector
** all statusdb calls may raise DBUnavailableError
renderer should deliver error to client
** all schedulerdb calls should block, reconnect, retry */1s, log w/backoff
db is critical to this part
** config option to set DB type, connection arguments
** schema restrictions to get cross-db compatibility:
- declare column types (SQLite tolerates omitting them, but most
databases don't)
- revision ids will be strings, SVN will deal
- no binary strings. Unicode is ok(?).
* notification mechanism
- first milestone (non-distributed) will be all in-process
- distributed milestone will require pings
HTTP POST (forwards), TCP line-oriented (either), or just polling
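A sketch of the line-oriented TCP variant (the port number and the
re-scan hook are invented; the key property is that a ping carries no
payload, so a lost ping only costs latency, never correctness):

    from twisted.internet import reactor
    from twisted.internet.protocol import ServerFactory
    from twisted.protocols.basic import LineReceiver

    def rescan_database(table):
        # placeholder: the real hook would re-scan the named table
        print("re-scan of %s requested" % table)

    class PingReceiver(LineReceiver):
        def lineReceived(self, line):
            # each line just names a table that changed; the data
            # itself stays in the DB, so we merely re-scan
            rescan_database(line)

    factory = ServerFactory()
    factory.protocol = PingReceiver
    reactor.listenTCP(9988, factory)
    reactor.run()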
* persistent scheduler project
** Changemaster:
- (changeid, branchname, revisionid, author, timestamp, comment,
category?)
changeids must be comparable and monotonically increasing
- (changeid, filename)
i.e. changes[changeid].filenames = []
- (changeid, propertyname, propertyvaluestring)
i.e. changes[changeid].properties = {name: value}
*** add row to database, ping Schedulers (eventual-send)
*** ping all schedulers at buildmaster startup
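A first-cut SQLite rendering of the three change tables above might
look like this (table names and column types are guesses, following
the declare-types restriction):

    import sqlite3
    conn = sqlite3.connect("state.sqlite")
    conn.executescript("""
      CREATE TABLE changes (
        changeid   INTEGER PRIMARY KEY,  -- comparable, monotonic
        branchname VARCHAR(256),
        revisionid VARCHAR(256),         -- a string, so SVN revnums fit
        author     VARCHAR(256),
        timestamp  INTEGER,
        comment    TEXT,
        category   VARCHAR(256)
      );
      CREATE TABLE change_files (
        changeid INTEGER,
        filename VARCHAR(1024)
      );
      CREATE TABLE change_properties (
        changeid            INTEGER,
        propertyname        VARCHAR(256),
        propertyvaluestring TEXT
      );
    """)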
** Schedulers:
- all state must be put in DB
- each records last-change-number, only examines changes since then
- each records list of changes, with important/unimportant flag
- trickiest part will be relationships between Dependent schedulers
*** when pinged, or timer wakeup:
- loop over all Schedulers
- scan for unchecked changeids
- default Scheduler ignores changes on the wrong branch
- check importance of each
- add to changes table
- arrange for tree-stable-timer wakeup
- if all changes are old enough, and important, then submit build
- AnyBranchScheduler processes changes one branch at a time
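In rough Python, the per-scheduler scan might look like this sketch
(every db.* helper named here is invented; only fileIsImportant and
treeStableTimer are existing Scheduler concepts):

    def scan(self, now):
        # examine only changes that arrived since our last scan
        for ch in self.db.get_changes_since(self.last_changeid_checked):
            self.last_changeid_checked = ch.changeid
            if ch.branch != self.branch:
                continue     # default Scheduler ignores other branches
            self.db.add_scheduler_change(self.schedulerid, ch.changeid,
                                         self.fileIsImportant(ch))
        pending = self.db.get_pending_changes(self.schedulerid)
        if not pending:
            return
        youngest = max([c.when for c in pending])
        if now - youngest < self.treeStableTimer:
            # tree not stable yet: arrange a wakeup and try again
            self.db.add_timer(youngest + self.treeStableTimer)
        else:
            if [c for c in pending if c.important]:
                self.submit_buildset(pending)
            self.db.clear_pending_changes(self.schedulerid)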
*** Dependent (downstream):
- configured with an upstream scheduler, by name
- wants to be told when upstream BuildSet completes successfully,
receive SourceStamp object
- then submits a new BuildSet, using the same SourceStamp, with
different buildernames and properties
**** so, this scheduler ignores the changes table and watches active-builds
- defer figuring it out until I build the active-build table
*** Periodic
- (schedulername, last-build-started-time, last-changeid-built)
- if last-build-started-time + delay < now:
make SS with recent changes, submit buildset, update
last-build-started-time and last-changeid-built
- consider checking active-builds, avoid overlaps
- else: arrange for wakeup in (last-build-started-time + delay - now + epsilon)
**** after a long downtime, this should start a build
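A sketch of that check (delay, epsilon, and the db helpers are
invented names):

    def periodic_check(self, now):
        due = self.last_build_started_time + self.delay
        if due < now:
            # overdue (including after a long downtime): build now
            self.submit_buildset(self.make_recent_sourcestamp())
            self.db.set_last_build_started(self.schedulerid, now)
        else:
            self.db.add_timer(due + self.epsilon)   # wake when due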
*** Nightly/Cron
- like Periodic, but compute next build time differently
**** after a long downtime, this should *not* start a build
- maybe make that configurable, catchup=bool
*** Try: ignores changetable, just submits buildsets
*** schema:
- changes table: (schedulerid, changenum, important_p)
- timer table: (wakeup-time)
if min(wakeup-time) < now: empty table, ping all schedulers
**** default Scheduler
- (schedulerid, schedulername, last-changeid-checked)
**** Periodic
- (schedulername, last-build-started-time, last-changeid-built)
**** Triggerable
- really just maps scheduler name + properties to buildernames
- certain buildsteps can push the trigger, wait for completion
- ignores changetable, ignores buildtable
- does not use schedulerdb
**** SourceStamps
how to gc?
- (sourcestampid, branch, revision/None, patchlevel, patch)
- (sourcestampid, changeid)
*** scheduler has properties, copied into BuildSet
- doesn't need to be in the scheduler table, but might need to be in
BuildSet table
*** scheduler's output is a BuildSet, which has .waitUntilFinished()
- buildernames, sourcestamp, properties
** BuildSet
- have .waitUntilFinished(), used by downstream Dependent schedulers and
Triggerable steps
- (buildsetid, sourcestampid, reason, idstring, current-state)
idstring comes from Try job, to associate with external tools
- current-state in (hopeful, unhopeful, complete)
(no failures seen yet, some failures seen, all builds finished)
(idea is to notify early on first sign of failure)
- (buildsetid, buildername, buildreqid)
i.e. buildset.buildernames = []
- (buildsetid, propertyname, valuestring)
i.e. buildset.properties = {}
*** when all buildrequests complete, aggregate the results
- when each buildrequest completes, ping the buildsets
- this may change the buildset state
- buildset state changes should ping schedulers
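A sketch of that aggregation (all helper names are invented; the
result constants mirror buildbot.status.builder):

    SUCCESS, WARNINGS, FAILURE, SKIPPED, EXCEPTION = range(5)

    def update_buildset_state(db, buildsetid):
        # results has one entry per buildrequest, None if unfinished
        results = db.get_request_results(buildsetid)
        if None not in results:
            new_state = "complete"     # all builds finished
        elif FAILURE in results:
            new_state = "unhopeful"    # notify early on first failure
        else:
            new_state = "hopeful"      # no failures seen yet
        if db.set_buildset_state(buildsetid, new_state):
            ping_schedulers()          # invented hook: wake Dependents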
** BuildRequest
- created with reason, sourcestamp, buildername, properties
- can be merged with other requests, if sourcestamps agree to it
- given to Builder to add to the builder queue
- can be started multiple times: updates status, informs watchers
- can be finished once, informs watchers
- IBuildRequestControl: subscribe/un, cancel, .submit_time
not sure if anybody calls it.. words.py? a few tests?
- "reqtable": (buildrequestid, reason, sourcestampid, buildername,
claimed-at, claimed-by?)
- (buildrequestid, propertyname, propertyvalue)
** Builder
- .buildable, .building
- submitBuildRequest adds to .buildable, pings maybeStartAllBuilds
- what is __getstate__/__setstate__ doing there?
*** so we need the Builder to scan the reqtable
- this is the part that will get distributed
- Builder A can claim any buildrequest that's for it and not yet
claimed, or one that was claimed but got orphaned by a dead
buildmaster; maybe have a timestamp or two
- "claimed-at" holds timestamp, starts at 0, updated when a buildmaster
grabs it, refreshed every once in a while. req can be claimed by
someone else when (now - claimed-at) > timeout.
- when the build is done, the buildrequest is removed from the reqtable
and the buildset is examined
- to cancel a request: remove it from the table
- add submit-time or submit-sequence, to provide first-come-first-built
to accelerate a request, change that value
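The claim itself can be a single guarded UPDATE, so two
build-process-masters cannot grab the same request; a sketch
(column names follow the reqtable above with hyphens rendered as
underscores; the timeout value is arbitrary):

    CLAIM_TIMEOUT = 3600   # seconds; arbitrary for this sketch

    def claim_request(conn, brid, master_name, now):
        # wins only if unclaimed (claimed_at=0) or the previous claim
        # is older than the timeout, i.e. orphaned by a dead master
        c = conn.cursor()
        c.execute("UPDATE buildrequests"
                  " SET claimed_at=?, claimed_by=?"
                  " WHERE buildrequestid=? AND claimed_at<?",
                  (now, master_name, brid, now - CLAIM_TIMEOUT))
        conn.commit()
        return c.rowcount == 1   # true means we own the build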
* LogServer
** writer-side PB interface:
- open(title) -> logid string
- write(logid, channel, data)
- close(logid)
logfile is renamed (from LOGID.open to LOGID.closed) upon close
- get_base_url()
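A server-side sketch of that interface, using twisted.spread.pb (the
file layout, port, and URL are placeholders):

    import os
    from twisted.spread import pb
    from twisted.internet import reactor

    class LogServer(pb.Root):
        def __init__(self):
            self.logs = {}       # logid -> open file
            self.counter = 0

        def remote_open(self, title):
            self.counter += 1
            logid = str(self.counter)
            self.logs[logid] = open("%s.open" % logid, "wb")
            return logid

        def remote_write(self, logid, channel, data):
            # channel (stdout/stderr/header) is ignored in this sketch
            self.logs[logid].write(data)

        def remote_close(self, logid):
            self.logs.pop(logid).close()
            os.rename("%s.open" % logid, "%s.closed" % logid)

        def remote_get_base_url(self):
            return "http://example.org/logs/"   # placeholder

    reactor.listenTCP(9999, pb.PBServerFactory(LogServer()))
    reactor.run()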
** buildmaster sends async writes, queues limited amount of requests
- fire-and-forget-after-30s, discard if queue grows too big
- goal is to tolerate LogServer bounces but not consume lots of memory
** reader-side HTTP interface:
*** logid URL shows title, filesize, option links, open/closed status
- with/without headers
- just stderr
- last N lines (when closed), last N lines plus headers
- reads when open do tail-f
*** all option links are normal statically-computable URLs
* DB-based status writer
** write logserver baseurl into DB each time LogServer PB connection is made
** indirect this, to plan for multiple LogServers (logserverid=1 for now)
- (stepid, logserverid, logid)
- (logserverid, logserver_baseurl)
* DB-based status renderer
* random ideas to keep in mind
** scheduler db is small
- so rather than coming up with clever queries, just grab everything,
sort it in memory
- also useful to avoid doing multiple queries
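i.e. rather than a clever WHERE/ORDER BY, something as blunt as:

    import sqlite3
    conn = sqlite3.connect("state.sqlite")
    c = conn.cursor()
    c.execute("SELECT * FROM changes")   # grab the whole (small) table
    rows = c.fetchall()
    rows.sort()                          # sort by changeid in memory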