[Buildbot-devel] database-backed status/scheduler-state project

Tue Sep 8 18:20:50 UTC 2009

exarkun at twistedmatrix.com wrote:

> What's bothersome about using the database as the third-party API is
> that it is very difficult to decide later that you want to change the
> implementation of this API.

Good points. I'd say that the difficulty of changing this API is
directly related to how much work/fear you are willing to impose upon
the people who use it.

> At least, not unless you're willing to break whatever third-party code
> is relying on the API. I would definitely want to avoid this. The idea
> of actively seeking it out is almost painful. :)

We could take this road, and put a warning sticker on the db schema that
says "subject to change, you take responsibility for updating anything
that you write that touches this, we will not bend over backwards to
retain compatibility with your apps". Or we could take the other one,
with a granite plaque that says "we promise to never ever change this,
use it with confidence". Or something in between.

I guess I'm (currently) comfortable with the warning sticker approach,
under the theory that the minority of users who need to build these
sorts of tools (and are willing to get their hands dirty) will also be
willing+able to pay attention to the schema changes from one release to
the next. Most users won't be aware of the db.

Taking this road will cause a certain amount of fear: unwillingness to
upgrade to newer buildbot versions because some local customization
might break. There is a similar issue with the internals of BuildSteps:
several sites are stuck on buildbot-0.7.6 or older (several years out of
date) because they're unwilling/unable to take the time to update their
custom steps to the current code, or at least they're afraid of what
will happen.

I don't think you can ever avoid this completely: it's a spectrum. I
think that a reasonable goal is to make the upgrade effort/fear
proportional to the amount of customization you've done. Somebody who
only uses the stock BuildSteps in their master.cfg should have had no
problems upgrading their buildbot at any time. Folks who have written
custom buildsteps have experienced some small changes (the way that
arguments are stashed comes to mind), but not major ones (yet).

Also, we're probably going to be iterating over the schema for a couple
of releases, as we gain experience and figure out what will work best.
So we won't be in a position to make any external compatibility promises
for a while anyway (it'll be challenging enough to provide internal
compatibility, such that shutting down an 0.8.0 buildbot and starting up
an 0.9.0 one in its place doesn't lose a queued buildrequest). Any
"early adopters" who decide to get dirty with the schema-as-API will
need to be on their toes anyway.
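A rough sketch of what that internal compatibility could look like, assuming the database records its own schema version and each release ships numbered upgrade steps. The table and column names, `UPGRADES`, `get_version()` and `upgrade()` are all hypothetical, and an in-memory sqlite3 database stands in for whatever database is actually used. Each step runs in its own transaction (SQLite permits DDL inside a transaction), so a queued build request written by the old release survives the upgrade:

```python
import sqlite3

# Hypothetical schema history: UPGRADES[n] moves the database from
# version n to version n+1. Names are illustrative, not Buildbot's schema.
UPGRADES = [
    "CREATE TABLE buildrequests (id INTEGER PRIMARY KEY, buildername TEXT)",
    # a later release adds a column; existing rows get the default value
    "ALTER TABLE buildrequests ADD COLUMN priority INTEGER DEFAULT 0",
]


def get_version(conn):
    """Return the stored schema version, creating the bookkeeping table if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS version (version INTEGER)")
    row = conn.execute("SELECT version FROM version").fetchone()
    if row is None:
        conn.execute("INSERT INTO version (version) VALUES (0)")
        return 0
    return row[0]


def upgrade(conn, target=len(UPGRADES)):
    """Apply pending upgrade steps in order, each one atomically.

    Expects a connection opened with isolation_level=None, so that the
    explicit BEGIN/COMMIT below control the transaction.
    """
    version = get_version(conn)
    while version < target:
        conn.execute("BEGIN")
        try:
            conn.execute(UPGRADES[version])
            conn.execute("UPDATE version SET version = ?", (version + 1,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # leave the schema at the old version
            raise
        version += 1
    return version


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    upgrade(conn, target=1)   # the "old" release's schema
    conn.execute("INSERT INTO buildrequests (buildername) VALUES ('full')")
    upgrade(conn)             # the "new" release upgrades in place
    print(conn.execute("SELECT buildername, priority FROM buildrequests").fetchall())
    # [('full', 0)]
```

Bundling the schema change and the version bump in one transaction means a crash mid-upgrade leaves the database at the old version rather than half-migrated.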

> With XML-RPC or PB or whatever other RPC mechanism along those lines,
> you can always substitute the existing implementation for a new one
> without changing the externally visible behavior,

Yeah, I hear you. I think PB is a non-starter, given the lack of
non-python bindings. Maybe if AMP had been around at the time :). XMLRPC
is annoying (it requires WebStatus, which isn't always enabled, has no
security story, takes a bazillion roundtrips to get enough information).

So, for persistence, I think storing scheduler/pending-build state in a
database makes sense (better than pickles, right?). To enable the
spread-across-multiple-machines feature, I think a network-reachable
database makes more sense than a local-only SQLite DB with a PB frontend
and having the swarm of buildmasters all speak PB to the "real" master
(can you imagine all the code that'd have to be written to expose all
that state?). And, once you have that, allowing 3rd-party tools to
manipulate that state (or at least making it easier on them) is just a
short leap beyond.
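A minimal sketch of the multiple-buildmaster case, assuming a shared table of pending build requests. The schema and the functions `make_db()`, `submit()`, `try_claim()` and `pending()` are hypothetical, and an in-memory sqlite3 database stands in for the network-reachable one. Assuming the database applies each single-row `UPDATE` atomically (SQLite does, since it allows one writer at a time), the conditional `UPDATE ... WHERE claimed_by IS NULL` lets exactly one master win a given request, while a third-party tool can inspect the queue with a plain `SELECT`:

```python
import sqlite3


def make_db(conn):
    """Create a hypothetical table of build requests; claimed_by NULL means pending."""
    conn.execute(
        "CREATE TABLE buildrequests ("
        " id INTEGER PRIMARY KEY,"
        " buildername TEXT NOT NULL,"
        " claimed_by TEXT)"
    )


def submit(conn, buildername):
    """Queue a build request (e.g. from a scheduler) and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO buildrequests (buildername) VALUES (?)", (buildername,))
    return cur.lastrowid


def try_claim(conn, request_id, master_name):
    """Claim a pending request; True only for the master whose UPDATE matched."""
    with conn:
        cur = conn.execute(
            "UPDATE buildrequests SET claimed_by = ? "
            "WHERE id = ? AND claimed_by IS NULL",
            (master_name, request_id))
    return cur.rowcount == 1


def pending(conn):
    """What an external tool might run: ids of unclaimed requests."""
    rows = conn.execute(
        "SELECT id FROM buildrequests WHERE claimed_by IS NULL ORDER BY id")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_db(conn)
    req = submit(conn, "full")
    print(try_claim(conn, req, "master-a"))  # True
    print(try_claim(conn, req, "master-b"))  # False: already claimed
    print(pending(conn))                     # []
```

Unlike a pickle on one master's disk, this state is visible to every process that can reach the database, which is both the benefit and the compatibility burden discussed above.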

> The only downside from the Mozilla perspective of an API instead of
> direct database access is that they'll have to think about what data
> they want to access /before/ you finish working on this instead of
> after, or they'll have to hire you again or find someone else to
> implement any new RPC methods for data they didn't anticipate needing.
> This isn't really a bad thing, as long as Buildbot development is
> active, since adding any particular new RPC method isn't much of a
> challenge, and the only potentially painful thing is waiting for it to
> be part of a release. Aside from that, it's really a good thing, since
> it's a lot better to know what you're writing software to do before
> you write it instead of after.

Great points. Part of the goal here is to enable other people to write
tools. Adding new control interfaces into Buildbot proper is a fairly
lengthy process, which gives us (as buildbot developers) more control
and QA time, but also stifles development of those external tools. And
the RPC techniques available to us (i.e. XMLRPC) also slow things down,
because they're not as easy to use from the client side (at least for
PHP authors who are familiar with databases but not with correct/safe
usage of asynchronous calls).

>> Also, please tell me more about the "other options" to SQLAlchemy..
> 
> Hm. Those aren't the features I was considering at all. I'm not
> familiar with any RDBMS that provides the necessary features for a
> reasonable reconnection story.

Huh... does that mean that most of those DB-using applications must
always be started after the DB is up, and must get restarted if they
ever lose the TCP connection? Hm, or maybe most of these DB-using
applications are one-shot scripts that launch, connect, do their
business, and then terminate? Not daemons, in other words.

That's a pity. Seems like reconnection should be handled in the same
place as connection pools.
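A sketch of that idea under stated assumptions: a pool that hands out connections and, when a call fails because its connection has died, replaces that connection and retries once. `ReconnectingPool` and its parameters are hypothetical (not an existing library's API), and treating `ConnectionError` as "the connection is dead" is an assumption; a real database driver raises its own exception types, which would be passed in as `retry_errors`:

```python
import queue


class ReconnectingPool:
    """Hypothetical pool that replaces connections which turn out to be dead."""

    def __init__(self, connect, size=2, retry_errors=(ConnectionError,)):
        self._connect = connect            # zero-argument connection factory
        self._retry_errors = retry_errors  # exceptions meaning "connection is dead"
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def run(self, func):
        """Call func(conn); if the connection is dead, reconnect and retry once."""
        conn = self._idle.get()
        try:
            try:
                return func(conn)
            except self._retry_errors:
                conn = self._connect()  # discard the dead connection, open a fresh one
                return func(conn)
        finally:
            self._idle.put(conn)        # return whichever connection we ended up with
```

Because the retry replays `func`, this only makes sense for work that is safe to repeat, such as reads; a write that may have half-completed before the connection dropped would need more care.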

> I was going to suggest that you take a look at Storm.

Cool, I'll take a look.

thanks for all your help!
 -Brian