[Buildbot-devel] Facing some possible challenges with BuildBot, opinions requested

Thu Oct 25 23:35:49 UTC 2007

Folks,

Hopefully this email won't get too long, but I'm in a situation where
I think I need some outside suggestions.  There's a desire to make
buildbot do more in our company than it's currently doing, including
handling release builds (doing the grunt work for us) and short-term
release branches.  These changes require me to look at upgrading
buildbot to the latest version for effective load balancing (to
minimize build times), which leads me to wonder whether I might hit
problems with some of the customization in our current configuration
(running on version 0.7.4).  So I turn to the mailing list for some
possible assistance here...

I'm going to try to describe my current configuration; if that's
insufficient, I will need to 'desensitize' it so I can post it here.
To start off, we have a tool that manages and assists in building our
products, which are relatively complex due to deep levels of
dependencies.  The tool handles updating what we call a 'build root'
and actually performing the builds; it's written on top of SCons,
which I suspect many here are familiar with.  Because of this, our use
of buildbot is rather unusual: our builders simply run a set of shell
commands that call out to the tool to perform the various steps in a
build.

The current breakdown of the steps is as follows:

     - update buildroot
     - perform build of product
     - copy resulting tarball/package to an archive area
     - run regression tests

The first two steps are handled by shell commands that use the tool,
and the final two steps are handled by shell commands that call out to
various scripts (Python, Perl and shell).
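To make the step breakdown concrete, here's a rough Python sketch of
the four steps expressed as (name, command) pairs, the kind of data a
build factory would then wrap in ShellCommand steps.  It's hypothetical:
`buildtool`, the script names and `make_steps` are placeholders, not our
real tool or scripts.

```python
from typing import List, Tuple

# (step name, argv list for a shell command)
Step = Tuple[str, List[str]]


def make_steps(product: str, version: str) -> List[Step]:
    """Return the four build steps, in order, as (name, argv) pairs.

    Hypothetical: "buildtool" stands in for our SCons-based tool, and the
    script names are placeholders.
    """
    return [
        # Steps 1 and 2 run through the build tool itself.
        ("update", ["buildtool", "update", product, version]),
        ("build", ["buildtool", "build", product, version]),
        # Steps 3 and 4 call out to separate scripts.
        ("archive", ["sh", "archive_package.sh", product, version]),
        ("test", ["python", "run_regressions.py", product, version]),
    ]


if __name__ == "__main__":
    for name, argv in make_steps("widget", "2.1"):
        print(name, " ".join(argv))
```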

Since our products vary enough in how they're built and tested, our
buildbot configuration creates a set of slave objects that hold various
pieces of information about each of the buildslaves.  Each object
contains the following:

     - The name of the buildslave
     - A dictionary of the products and which platforms they're built on
     - A dictionary of the versions supported for each product
     - A dictionary of current release builds (if any) for each product

There are also some variables noting whether the buildslave is a
continuous build system (the 'normal' mode for buildbot) or a special
benchmarking system.
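A simplified, hypothetical version of such a slave object might look
like the following; the class name `SlaveInfo`, its field names and the
`builds()` helper are made up for illustration and are not taken from
the real master.cfg.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class SlaveInfo:
    """Hypothetical per-buildslave record mirroring the fields listed above."""

    name: str
    # product -> platforms it is built on
    platforms: Dict[str, List[str]] = field(default_factory=dict)
    # product -> versions supported
    versions: Dict[str, List[str]] = field(default_factory=dict)
    # product -> current release build label (empty when no release is active)
    releases: Dict[str, str] = field(default_factory=dict)
    # mode flags: normal continuous building vs. special benchmarking system
    continuous: bool = True
    benchmark: bool = False

    def builds(self) -> Iterator[Tuple[str, str]]:
        """Yield every (product, version) pair this slave should build."""
        for product in sorted(self.platforms):
            for version in self.versions.get(product, []):
                yield product, version
```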

Once the slave objects are created, a SlaveLock is created for each
buildslave, as our tool effectively "locks" the buildroot on a system,
which prevents simultaneous update or build processes from occurring
(a good thing in our case).  A build factory is then generated for
each product supported on each buildslave; given the information in
the slave object, the configuration determines what kind of build
needs to be done for a given product and then instantiates a class
that generates the appropriate build steps for that product, version
and release type.
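Roughly, the builder generation could be sketched like this.  It's a
hypothetical simplification: builder specs are plain dicts, the lock is
represented by a name rather than a real SlaveLock object, and the
'kind' field stands in for choosing which step-generating class to
instantiate.

```python
from typing import Dict, List


def make_builders(products_by_slave: Dict[str, List[str]],
                  releases: Dict[str, str]) -> List[dict]:
    """Build one hypothetical builder spec per product on each slave.

    products_by_slave: slave name -> products it builds
    releases: product -> release label, for products with a release in progress
    """
    builders = []
    for slave in sorted(products_by_slave):
        # Every builder on this slave shares one lock name, mirroring the
        # one-lock-per-buildslave approach (a stand-in for a SlaveLock).
        lock = "buildroot-lock-" + slave
        for product in products_by_slave[slave]:
            # Decide which (hypothetical) step-generator class applies.
            kind = "release" if product in releases else "continuous"
            builders.append({
                "name": "%s-%s" % (product, slave),
                "slavename": slave,
                "locks": [lock],
                "kind": kind,
            })
    return builders
```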

After all the builders have been created, schedulers are created.
There are three kinds of schedulers in our configuration:

     - nightly builds (run at a given time each evening)
     - continuous builds
     - benchmarking builds (dependent on key nightly builds)

The start times for the nightly builds are staggered by a minute each
to prevent possible repository access contention (which happens much
less now with Subversion than it did with CVS, but can still occur if
too many systems try to update the same product simultaneously).  The
benchmarking builds are only run if certain builds on a given platform
are successful (both the builds themselves and their tests).
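The staggering and the benchmark gating are simple enough to sketch in
plain Python.  Both functions below are hypothetical helpers, not
buildbot API; the (hour, minute) pairs are the sort of values one would
feed to a Nightly scheduler's hour and minute arguments.

```python
from typing import Dict, List, Tuple


def staggered_times(builders: List[str], hour: int,
                    minute: int) -> Dict[str, Tuple[int, int]]:
    """Give each nightly builder a start time one minute after the previous one."""
    times = {}
    for offset, name in enumerate(builders):
        total = hour * 60 + minute + offset
        # Wrap past midnight so the result stays a valid (hour, minute) pair.
        times[name] = ((total // 60) % 24, total % 60)
    return times


def benchmarks_allowed(results: Dict[str, Tuple[bool, bool]],
                       required: List[str]) -> bool:
    """results maps builder -> (build passed, tests passed).

    Benchmarks run only if every required builder passed both its build
    and its tests; a missing result counts as a failure.
    """
    return all(name in results and all(results[name]) for name in required)
```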

The final part of the configuration sets up the status targets, which
currently include both web and mail notification.  For the web side,
there are five different interfaces:

     - waterfall (the default)
     - grid (special overview page of all builders)
     - text (to be used by command-line tools)
     - rss (for those who want RSS feeds)
     - buildtimes (a view of how build times have changed over time)

All but the waterfall display are custom interfaces.  As a final note,
the debugging port is also enabled (but rarely used).
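As an illustration of what the text interface is for, here's a
hypothetical renderer producing one tab-separated line per builder
(name, last result, build time in seconds) that a command-line tool
could parse.  The format and the `render_text` name are assumptions for
the sketch, not what our custom page actually emits.

```python
from typing import Dict, Tuple


def render_text(status: Dict[str, Tuple[str, int]]) -> str:
    """Render builder -> (last result, build seconds) as sorted TSV lines."""
    lines = []
    for name in sorted(status):
        result, seconds = status[name]
        lines.append("%s\t%s\t%d" % (name, result, seconds))
    # Each line ends with a newline; an empty status gives an empty string.
    return "".join(line + "\n" for line in lines)
```

Sorting by builder name keeps the output stable between requests, which
makes it easier for scripts to diff successive snapshots.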

Currently this all works just fine, but with the additional builders
being created before each new release (along with the release itself),
things might get a bit hairy; we already have 167 builders, and the
number will grow significantly with this change.  The configuration
reads a text file that makes it easy to add product builds to and
remove them from buildbot, but keeping track of everything may get
painful once these changes are in effect.
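To give an idea of the kind of text file meant here, a hypothetical
format and parser: one 'slave product version' entry per line, with
'#' comments so a build can be disabled by commenting it out.  The real
file's layout isn't shown here, so the format, `parse_build_list` and
the `count_by_slave` tracking helper are all assumptions.

```python
from collections import Counter
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (slave, product, version)


def parse_build_list(text: str) -> List[Entry]:
    """Parse 'slave product version' lines, skipping blanks and '#' comments."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("line %d: expected 3 fields, got %d"
                             % (lineno, len(fields)))
        entries.append((fields[0], fields[1], fields[2]))
    return entries


def count_by_slave(entries: List[Entry]) -> Counter:
    """How many builds each slave carries, to help keep track as the list grows."""
    return Counter(slave for slave, _, _ in entries)
```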

To cover one issue I'm dealing with: the web grid interface was
created so we could have a complete overview of all our builders that
fits mostly onto a single page, unlike the waterfall page, which can
scroll for a long time with 167 builders.  The grid has a row for each
build slave and a column for each product (and each version of that
product), and each cell for a slave/product pair that actually has an
associated builder is filled in with that builder's information.  With
the upcoming increase in slaves and product versions in the
configuration at any given time, however, scrolling might become an
issue again.  One idea I've come up with is to modify the grid view so
that load-balanced rows can be collapsed into a single row and
expanded only when more detail is needed.  This should work fine,
since a given builder only runs on a single slave at any given time;
expanding the row would show which slave each builder ran on.
Something similar could be done for the product columns, but then each
cell in a collapsed column would represent more than one builder, so
I'm uncertain about the best way to handle it.  It may end up being a
simple hiding mechanism for the non-nightly columns that can be
expanded when they need to be looked at, but there should still be a
way to tell if there's an issue in a hidden column.  Suggestions here
are welcome.
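One possible way to implement the collapsing, sketched under some
assumptions: each grid cell holds a (finish time, result) pair for that
builder's latest build, a collapsed row shows the most recent build
from any slave in its load-balanced group, and a hidden column is
summarized by its worst result so problems stay visible.  The function
names, the timestamp representation and the severity ordering are all
hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed ordering from best to worst; a result not listed here would make
# SEVERITY.index raise ValueError.
SEVERITY = ["success", "warnings", "failure", "exception"]

Cell = Tuple[int, str]  # (build finish timestamp, result)


def collapse_rows(grid: Dict[Tuple[str, str], Cell],
                  groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], Cell]:
    """Merge the rows of load-balanced slaves into one row per group.

    grid maps (slave, column) to that builder's latest build on that slave;
    groups maps a group name to its member slaves.  Each merged cell is the
    most recent build from any slave in the group.
    """
    merged: Dict[Tuple[str, str], Cell] = {}
    for group, members in groups.items():
        for (slave, column), cell in grid.items():
            if slave not in members:
                continue
            key = (group, column)
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return merged


def column_summary(grid: Dict[Tuple[str, str], Cell],
                   column: str) -> Optional[str]:
    """Worst result in a column, so a hidden column can still flag a problem."""
    results = [result for (_, col), (_, result) in grid.items() if col == column]
    if not results:
        return None
    return max(results, key=SEVERITY.index)
```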

The other issue concerning me is that I'm uncertain how using load
balancing will affect the way my current configuration is constructed;
I do realize, though, that more detail on the actual configuration
might be needed for anyone to truly answer that.  Part of me can't help
but wonder if there's a "simpler" way to do what we're doing, but I've
been working from a configuration that was designed by someone before
me.  I've completely refactored their code in master.cfg to make it
easier to maintain, but the additional code in the custom html.py file
(and a few other files) still needs a lot of work; that may end up
being a challenge, but it will be necessary for the upgrade.
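As a rough sketch of how the configuration might need to change: today
there is one builder per (slave, product) pair, whereas with load
balancing a builder could list several candidate slaves.  The
hypothetical `regroup` function below folds per-slave entries into one
list of slaves per (product, platform), grouping only slaves of the
same platform on the assumption that a build can't move across
platforms.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def regroup(entries: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Fold (slave, platform, product) triples into {(product, platform): [slaves]}.

    Each key would become one load-balanced builder that may run on any of
    the listed slaves, instead of one builder per slave.
    """
    slaves_for: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for slave, platform, product in entries:
        key = (product, platform)
        # Keep the first-seen order and skip duplicate entries.
        if slave not in slaves_for[key]:
            slaves_for[key].append(slave)
    return dict(slaves_for)
```

On the locking side, if I'm reading the buildbot documentation
correctly, a SlaveLock is already tracked separately on each
buildslave, so one shared SlaveLock used by every builder may be enough
to keep the buildroot protected once builders span multiple slaves;
that's worth checking against the docs for whatever version we upgrade
to.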

I think that covers what I've considered so far; if any of this is
unclear, folks, please let me know and I'll do my best to clarify or
expand on what I've already said here.  I'm pretty certain that if I
do need to post my configuration I will be able to; it will just take
a bit of work, but I'm willing to do that if it's needed for further
help.  So... does anyone have any suggestions on how I should proceed
here, perchance?

Ken Lareau