[Buildbot-devel] buildbot database-based data storage get-together?

Jean-Paul Calderone exarkun at divmod.com
Wed Aug 6 15:34:25 UTC 2008


Marcus Lindblom wrote:
> Jean-Paul Calderone wrote:
> > I'm not sure what a "disconnected design" is - I suppose it's something
> > like loose coupling - but I don't think it will help avoid the pain of
> > using an ORM in a Twisted application.

> It's simply that the components don't depend on each other.

> I.e. you could use any subset of Django's ORM, HTTP or template system,
> and/or replace any one of those with one of your own design in your app.

There are a lot of Python ORMs.  I don't have any personal experience with
Django's ORM, but if it is good and can be used without using the rest of
Django, then it sounds like it's just as much of a candidate as any of the
others.  It probably won't make a huge difference whether Django's ORM, Storm,
or one of the others is used; they all provide basically the same
functionality.

Axel Hecht wrote:
> Regarding just hand-coding SQL, I know jack, but I saw that sqlite has
> rather rudimentary support for things like dates, and that text is not
> length limited. Both seem rather strong constraints when trying to
> write portable code.

These points are basically true (although there is a limit to text length, it's
just bigger than you'd ever encounter).  I doubt either of these will cause
much trouble when trying to write portable SQL.  You'll find similar
compatibility issues between other RDBMS concerning such things as
autoincrement fields, last insertion id APIs, and differing builtin functions.
So if support for multiple RDBMS is desirable, then testing will just have to
encompass those which are desired.  There will be minor annoyances, but that's
about the scope of it.
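One way to keep those minor annoyances contained is to hide each incompatibility behind a tiny helper.  As a sketch (the function and table names here are invented for illustration, not buildbot APIs), the "last insertion id" difference might be isolated like this:

```python
# Hypothetical sketch: isolating one RDBMS incompatibility (fetching
# the id of the last inserted row) inside a single helper function, so
# the rest of the code stays portable.
import sqlite3

def insert_build(conn, builder_name):
    cur = conn.cursor()
    cur.execute("INSERT INTO builds (builder) VALUES (?)", (builder_name,))
    # sqlite3 exposes the autoincrement id as cursor.lastrowid; a
    # PostgreSQL backend would instead append "RETURNING id" to the
    # INSERT.  Keeping that difference inside this one function is the
    # point -- callers never see it.
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (id INTEGER PRIMARY KEY, builder TEXT)")
first = insert_build(conn, "linux-debug")
second = insert_build(conn, "win32-release")
```

Testing against each desired backend would then mostly mean testing this thin layer, not the whole application.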

> I'm not so worried about buildmaster write performance, i.e., having
> db writes being blocking. Most masters don't write that often, and in
> those setups where they do, things like having the db master locally
> and let the web interface work on remote slave dbs or something sounds
> like the more standard way to attack that problem than to start doing
> bookkeeping asynch.

I'd be a _bit_ worried about this.  Compared to reads, writes might not happen
very often on the master, but when they do happen they still can't be allowed
to block the entire master.  Consider the normal case where twenty builds start
to test a new revision.  They all report their results within a relatively
brief period of time.  It's not a huge stretch to imagine the master falling
behind during this flurry of activity.  This might just be inconvenient (like
the waterfall view taking 10 seconds to render) or it might eventually lead to
real, serious problems (like the master being blocked for long enough that
slaves start to run into the idle timeout and disconnect).  Building in this
potential failure condition from the start when it's not necessary to do so
doesn't seem too useful to me.

> One thing that I found really hard to think about was trying to port
> the feature of incremental log display into a remote web front end. I
> figured one could have a specific back-door for this in the master,
> but I have no idea how to get the callback oriented data delivery
> ported over into something like django, which assumes to get the data
> all at once.

I assume the Django page render would just have to block until all the data had
arrived (writing it out as it became available).  From the Django side, I don't
think this would necessarily look callback-based.  Instead, Django would make
blocking calls to a `get_next_log_chunk' API repeatedly until there is no more
data.  This is a common strategy when blocking is expected.  The exact details
of how this might be implemented depend on how the master is making the logs
available to the Django process, but I don't expect it would be very difficult
(i.e., just a matter of programming).
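The blocking-poll strategy above can be sketched in a few lines.  Here `get_next_log_chunk` and the chunk source are invented for illustration; a real implementation would talk to the master over whatever channel it exposes:

```python
# Hypothetical sketch of the blocking-poll strategy: the web process
# repeatedly asks for the next log chunk, blocking until one is ready,
# and writes each chunk out as it arrives.
def get_next_log_chunk(source):
    """Block until the next chunk of log data is available; return
    None when the log is complete."""
    try:
        return next(source)
    except StopIteration:
        return None

def render_log(source, write):
    # Loop until the master signals end-of-log.  From Django's point
    # of view this is ordinary sequential code, not callbacks.
    while True:
        chunk = get_next_log_chunk(source)
        if chunk is None:
            break
        write(chunk)

out = []
render_log(iter(["compiling...\n", "linking...\n", "done\n"]), out.append)
```

The callback-oriented delivery stays on the master's side; the web process only ever sees a blocking iterator-like API.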

Dustin J. Mitchell wrote:

> I've been assuming that logfiles will be stored on-disk as distinct
> files, even with a DB.  My config.log for every build is 650K, which
> is a mighty big BLOB.

This definitely has advantages.  Relational databases are for relational data.
The one advantage to keeping logfiles in the database is that it simplifies
some transaction management.  I suspect that for buildbot, this isn't much of a
big deal, since even if you crash halfway through a build, you'd rather have
partial log results in the filesystem than to pretend the build was never even
attempted and discard all the results.

> This sync/async question is *important*, but let's be clear about how
> much it would cost.  Currently, all of our "DB" operations (pickling,
> unpickling) are blocking.  Furthermore, our web-page creation is also
> blocking (the grid and waterfall can both take several seconds to
> render, during which time no interaction occurs with buildslaves, nor
> with other web clients).  I would very much like to get *away* from
> that particular mode of operation, but I think that the performance
> gains of a database over pickles would be significant even if the DB
> API was blocking.

The current pickle solution is pretty unpleasant, certainly.  Blocking use of a
real database will probably be faster than the pickle operations in the average
case.
However, local pickle access has the advantage that the upper bound on the cost
of an operation is pretty much fixed.  With a remote RDBMS, a bit more
uncertainty is introduced.  In most cases, there /probably/ won't be network
hiccups to worry about, but other clients may block the database for a while,
or the DBMS may decide to re-arrange some storage, etc.  It will probably be
faster on average, but introduce the possibility for slower worst-case times.
In a traditional LAMP-style app, this isn't a huge concern, since it means
every once in a while one of your clients will have a somewhat worse
experience.  In a single-threaded application using the database blockingly,
it's a bit worse: everybody gets to wait for that query.

Now, even given that, I still expect using PgSQL blockingly will result in
better performance in practically all cases than the current pickle-based
solution, but it doesn't leave a lot of room for expansion.  As a master's
configuration gets more complex, as more slaves and builders are added, as more
views and more complex views of the data are requested, the disadvantages begin
to catch up with the advantages.  Pickle probably looked pretty good, too, when
the only installation of buildbot had only two slaves with one builder each.

So, if I were to implement this for buildbot, I'd use SQLite.  I'd use either
Storm or Axiom (because I'm familiar with them, not because of any intrinsic
advantage I'm aware of which they possess over the other offerings).  I'd
assume DB access is fast enough to do blockingly because I know the database is
on the local filesystem.  I'd keep all the SQL very simple and in a dedicated
storage layer, not spread out all over the buildbot codebase.  If I suddenly
realized SQLite wasn't good enough for some reason, then I'd use
twisted.enterprise.adbapi to talk to PgSQL (at least at first, worrying about
portability to other RDBMS later); this would mean async database access; but
again, any code using adbapi would be in a dedicated storage layer isolated
from all the other buildbot code.
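A dedicated storage layer of the kind described above might look like the following sketch.  The schema and method names are invented for illustration; nothing here is actual buildbot code:

```python
# Sketch of a dedicated, blocking storage layer over SQLite.  All SQL
# lives in this one class; the rest of the application calls methods
# and never sees a query string.
import sqlite3

class BuildStore:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS builds ("
            " id INTEGER PRIMARY KEY,"
            " builder TEXT NOT NULL,"
            " revision TEXT NOT NULL,"
            " result TEXT)")

    def record_build(self, builder, revision, result):
        # The connection used as a context manager commits on success
        # and rolls back on error.
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO builds (builder, revision, result)"
                " VALUES (?, ?, ?)", (builder, revision, result))
        return cur.lastrowid

    def builds_for_revision(self, revision):
        cur = self.conn.execute(
            "SELECT builder, result FROM builds"
            " WHERE revision = ? ORDER BY id", (revision,))
        return cur.fetchall()

store = BuildStore()
store.record_build("linux-debug", "r1234", "success")
store.record_build("win32-release", "r1234", "failure")
```

If SQLite later proved inadequate, only this class would need to change: its methods could return Deferreds backed by twisted.enterprise.adbapi instead of plain values, and the rest of the code would adapt at one seam rather than everywhere.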

Either of these solutions would ensure good performance, and neither is a
significant departure from the successful database-using Twisted applications
I've written in the past, so there would be little innovation, just
re-application of techniques I already know to work.

Jean-Paul



