[Buildbot-devel] database-backed status/scheduler-state project

Tue Sep 15 20:45:09 UTC 2009

I'm glad John gave a fair amount of info to help figure out the exact
goals of Brian's implementation work. I had difficulty grasping all
the implications, so I'd like to be sure I understood them correctly.
Please correct me if I'm wrong. Then I'll be able to give my opinion
on this.

GOALS
1) Reduce downtime in slow controlled-shutdown situations, i.e. the
N-hour latency between 'scheduler trigger' and 'all builders idle'.
2) Make buildbot more scalable under high status serving load.
3) Make buildbot more scalable under high slave load, which is orthogonal to 2).
4) Synchronize Schedulers across multiple masters when an
infrastructure can't cope with 3) and multiple masters are used.
5) Automatically requeue build jobs on builders that have lost all
their slaves (e.g. data center lost).
6) Generally better slave utilization and load spreading (mostly
orthogonal to the rest). I think this is unrelated to the rest and
just happens to be a general comment from John.

IMPLEMENTATIONS
The actual implementations I see in your proposal are:
#1 keeping transient build state across restarts and across multiple
masters, persisting it into a DB for goals 1), 3), 4), 5) and
potentially 6). The DB serves two functions here: persistence and an
IPC mechanism.
#2 keeping the slaves 'alive' across master restarts, so that the
pending queues don't have to be rescheduled, for goal 1)
#3 moving all transient and some persistent status state (except
stdio) into a DB for goal 2). The DB serves two functions here:
persistence and an IPC mechanism.
#4 reducing buildbot server load by enabling status serving out of
process for goal 2)
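
To check my understanding of #1, here is a minimal sketch of what
"DB as persistence and IPC" could look like. It is not Brian's schema
or any existing buildbot API: the table `buildrequests`, the column
`claimed_by` and the helpers `open_db`, `submit`, `claim_next` and
`release_claims` are all names I made up for illustration, and sqlite3
only stands in for whatever DB would really be used. The idea is that
pending requests survive a restart, several masters share the queue by
claiming rows, and a lost master's claims can be released so the jobs
get requeued.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the shared DB and create the (hypothetical) request table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buildrequests ("
        " id INTEGER PRIMARY KEY,"
        " builder TEXT NOT NULL,"
        " claimed_by TEXT)"  # NULL means nobody has claimed it yet
    )
    conn.commit()
    return conn


def submit(conn, builder):
    """A scheduler persists a pending request; returns its row id."""
    cur = conn.execute(
        "INSERT INTO buildrequests (builder) VALUES (?)", (builder,)
    )
    conn.commit()
    return cur.lastrowid


def claim_next(conn, master):
    """Claim the oldest unclaimed request for `master`, or return None."""
    while True:
        row = conn.execute(
            "SELECT id FROM buildrequests"
            " WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The claimed_by IS NULL guard makes the claim safe if another
        # master grabbed the same row between our SELECT and UPDATE.
        cur = conn.execute(
            "UPDATE buildrequests SET claimed_by = ?"
            " WHERE id = ? AND claimed_by IS NULL",
            (master, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]


def release_claims(conn, master):
    """Requeue everything held by a master that went away (goal 5)."""
    cur = conn.execute(
        "UPDATE buildrequests SET claimed_by = NULL WHERE claimed_by = ?",
        (master,),
    )
    conn.commit()
    return cur.rowcount
```

In this sketch the DB row is the only coordination point between
masters, which is what I mean by the DB doubling as an IPC mechanism.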

For the sake of discussion, could all the goals and implementations be
explicitly filed as tickets? I don't see how to make one ticket
depend on another in Trac; is that possible?

M-A