[Buildbot-devel] Project dependencies, building branches, etc
warner-buildbot at lothar.com
Wed Jun 22 07:09:13 UTC 2005
> Plus, it didn't seem like anyone had given you a solid use case (at least
> not on list) for multiple projects, so I thought I could maybe be of use.
Definitely. As the code shapes up, keep taking a look at the design and let
me know how it does or does not handle your needs.
> After reading this I'm not sure what you're trying to accomplish with the
Well, to be honest, it's in flux.. I realize that I don't yet know enough to
be sure that I've got a good design.
You can think of ChangeSources, BuildMasters, Schedulers, Builders, and
Slaves all as nodes in a big interconnected graph. Earlier versions of the
buildbot had fairly rigid limitations on how they could be interconnected.
(specifically the Scheduler functionality was inseparable from the Builder,
and all ChangeSources fed into all Builders). Over time, as we recognize
those somewhat arbitrary limitations, we remove them, and the graph becomes
more flexible. ChangeSources can map arbitrarily to Schedulers, Schedulers
can map arbitrarily to Builders. With PBStatusListeners and custom Scheduler
classes, it doesn't matter quite as much whether two nodes happen to be in
the same host process or not.
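For concreteness, here is roughly the shape such a graph might take in a master.cfg once Schedulers are first-class objects. Treat the class names and arguments as illustrative rather than a settled API, since (as above) all of this is still in flux:

```python
# Sketch of a master.cfg fragment (illustrative, not a final API): one
# ChangeSource feeding two Schedulers, which map onto overlapping Builders.
from buildbot.changes.pb import PBChangeSource
from buildbot.scheduler import Scheduler

c = BuildmasterConfig = {}
c['sources'] = [PBChangeSource()]
c['schedulers'] = [
    # a quick scheduler that fires one builder after a short quiet period
    Scheduler(name="quick", branch=None, treeStableTimer=60,
              builderNames=["linux-quick"]),
    # a slower scheduler that fires several builders
    Scheduler(name="full", branch=None, treeStableTimer=5*60,
              builderNames=["linux-full", "osx-full"]),
]
```

The point is just that the ChangeSource->Scheduler and Scheduler->Builder edges are each arbitrary many-to-many mappings.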
The BuildMaster is coded in such a way that it doesn't specifically refuse to
share a process with another BuildMaster instance. A little bit of work
(mostly involving how to share TCP ports between the two) would make this
possible. I'm not saying that it's a good idea, but there's no reason to make
the job of (some day) moving to such a scheme any harder than it needs to be.
I think that, as a next step (for 0.7.0), the
ChangeSource->Scheduler->Builders approach should provide for more
functionality than what we've got in 0.6.x, and from there we can figure out
what the next direction needs to be. Experience with projects that involve
semi-independent sub-projects, like yours and GStreamer, will feed these
decisions.
The big question in my mind right now, as I write the code, is how to
represent the status of everything. It may be the case that buildbot
installations which do tricky clever things involving multiple projects will
also have to do tricky clever things involving StatusTargets that provide
useful displays of build status. The existing Waterfall display is a
chronological view: as we get into BuildSets and BuildRequests and multiple
slaves per builder, perhaps we need a more Change- or BuildSet-oriented view.
> I guess what you're suggesting is that the B build do all the steps for A
> as well, but be triggered by A building successfully. But this introduces
> extra complexity, as you pointed out. Why do you prefer this way?
I'm not sure that I do prefer it. I think that we'll have the flexibility to
implement this sort of thing in multiple ways, depending upon how much
separation you want between your Builders. Remember that each Builder is
nominally independent, so it may be hard to get compiled code from one to the
other.
Let's try for a concrete example. If Thomas will forgive my complete
mischaracterization of his project, let's say that GStreamer has two big
pieces: the (C-based) GStreamer library called libgstreamer, and the python
bindings called pygst. The python bindings depend upon the core library.
Let's pretend that they have separate SVN repositories, and that changes are
showing up in both all the time. We want to make sure that -rHEAD of
everything remains in good working order.
The pygst build process normally compiles the python bindings against the
pre-compiled libgstreamer header files and libraries that are installed on
the system (in /usr/include and /usr/lib). To do something other than that
requires some build-time flags (say, a ./configure --with-gstreamer= argument
pointing at an alternate install location).
A build of pygst can be described by a two-part SourceStamp: the first part
describes the version of libgstreamer that was used, the second part
describes the pygst code that was used. You can imagine SourceStamps
describing a variety of combinations: ("0.8.1", "0.8.1") for two released
versions, ("0.8.1", "r1234") for an SVN version of pygst compiled against a
released version of libgstreamer, etc. Different goals might prompt you to
want to validate different combinations. You could imagine a step.Source
subclass which worked with named+released versions of projects instead of
their SVN repositories, so "0.8.1" could translate into an instruction to
download libgstreamer-0.8.1.tar.gz from some HTTP server and unpack it. This
would let you get into Schedulers that watched mailing lists or freshmeat.net
or whatever, instead of VC repositories. The functionality this could add
would mostly be adding corners to the "four corners" testing matrix: latest
SVN of A against latest release of B, etc.
But, for our purposes, we'll generally be building ("r1234", "rHEAD"), where
r1234 is the latest known-working revision of libgstreamer.
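To make the SourceStamp combinations concrete, here is a tiny sketch of that "four corners" matrix, using the version strings from the text:

```python
from itertools import product

# Hypothetical sketch: a two-part SourceStamp is just a
# (libgstreamer version, pygst version) pair.
libgstreamer_versions = ["0.8.1", "rHEAD"]   # latest release, latest SVN
pygst_versions = ["0.8.1", "rHEAD"]

# The four corners: every combination of release vs. latest SVN.
corners = list(product(libgstreamer_versions, pygst_versions))
# -> [('0.8.1', '0.8.1'), ('0.8.1', 'rHEAD'),
#     ('rHEAD', '0.8.1'), ('rHEAD', 'rHEAD')]
```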
Let's describe two scenarios:
Scenario 1: separate builders
libgstreamer is maintained by a separate group from the pygst folks, and
they've each
got their own buildbot setup. The libgstreamer folks only pay attention to
the C code, and ignore the python bindings. The libgstreamer buildbot
watches SVN for libgstreamer changes, compiles them, and runs tests. It
publishes build status on a TCP port via PB, for anyone who chooses to
subscribe.
The pygst folks have their own buildbot. It watches the libgstreamer
buildbot to find out when a new (working) -rHEAD of libgstreamer is
available. There are two situations that prompt it to rebuild pygst:
libgstreamer has been updated, or pygst has changed. If the only
information it gets from the libgstreamer buildbot is the revision number
of the tree that built successfully (that is, we're not downloading
binaries or anything), then it will somehow need to fetch and compile its
own copy. They can do this by just copying a Builder config from the
"upstream" libgstreamer buildbot, so that Builder A does an SVN checkout of
libgstreamer and compiles it normally.
I'm envisioning three Schedulers in this setup. The first subscribes to the
upstream BuildMaster and just triggers a local libgstreamer build each time
the upstream build succeeds. The second watches the pygst SVN repository
for Changes and triggers a local pygst build each time something has
changed. The third watches the local libgstreamer build and triggers a
local pygst build each time it succeeds.
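A minimal sketch of that three-Scheduler wiring, using plain Python callbacks rather than the real buildbot classes (all names here are illustrative):

```python
# Hypothetical sketch (not real buildbot API): the three Schedulers from
# scenario 1, modeled as simple triggers.

class Scheduler:
    def __init__(self, name, builder):
        self.name = name
        self.builder = builder          # the local Builder this scheduler triggers
    def fire(self, revision):
        self.builder.build(revision)

class Builder:
    def __init__(self, name):
        self.name = name
        self.built = []                 # record of triggered builds, for illustration
        self.on_success = []            # downstream schedulers watching this builder
    def build(self, revision):
        self.built.append(revision)
        for sched in self.on_success:   # pretend every build succeeds
            sched.fire(revision)

libgst = Builder("libgstreamer")
pygst = Builder("pygst")

upstream_watcher = Scheduler("watch-upstream-master", libgst)  # 1: remote success -> local libgst build
pygst_watcher = Scheduler("watch-pygst-svn", pygst)            # 2: pygst SVN change -> pygst build
chain = Scheduler("libgst-ok-then-pygst", pygst)               # 3: local libgst success -> pygst build
libgst.on_success.append(chain)

upstream_watcher.fire("r1234")  # upstream reports a good libgstreamer build;
                                # this builds libgst locally, then pygst
```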
Now, when compiling pygst itself, you have to be able to point it at some
libgstreamer.a and libgstreamer.h files. This is the part where you have to
violate Builder isolation. Builder A needs to install the compiled
libgstreamer somewhere well-known. Builder B needs to use a
--with-gstreamer argument that reads from this well-known location. The
slaves attached to these builders need to share a filesystem.. they should
probably both run in the same slave. Performing cross-platform testing will
involve pairs of Builders.
The advantage of this approach is that the upstream library is compiled
exactly once per upstream code change. The disadvantage is that you have to
arrange for a well-known directory to be shared between the two Builders. I
want to find a clean way of expressing this shared directory, because it
imposes more restrictions on the buildslaves than we currently have. (at
present, the slave admins only have to provide one base directory, and the
buildslave takes care of everything inside that.. furthermore, each Builder
is independent). Another disadvantage is that you need a Lock of some sort
to keep the pygst builder from running while the libgstreamer builder is
running, otherwise you'll be linking against half-compiled (or
half-installed) libraries.
Note that this scenario works the same way if there's only one buildbot.
The point is that the binaries being compiled by one Builder are used
directly by a separate Builder.
Scenario 2: both components in one Builder
The pygst buildbot has only the one Builder, which is responsible for
compiling both the "upstream" libgstreamer library and the "downstream"
pygst bindings. There are two Schedulers: one to watch the upstream
buildbot, and a second to watch for pygst SVN changes. The build process
looks something like the following:
checkout libgstreamer code of the given revision, into ./upstream
cd upstream && ./configure --prefix=$PWD/../installed && make && make install
checkout pygst code of the given revision into ./downstream
cd downstream && ./configure --with-gstreamer=../installed && make
cd downstream && make check
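As a sketch, a build factory could hold those steps as plain data, an ordered list of (workdir, command) pairs parameterized by the two-part SourceStamp. The SVN URLs and paths here are made up for illustration:

```python
# Hypothetical sketch: the scenario-2 build expressed as data, roughly the
# way a build factory would hold it. URLs and paths are illustrative.

def pygst_steps(libgst_rev, pygst_rev):
    return [
        (".", "svn co -r %s svn://example.org/libgstreamer upstream" % libgst_rev),
        ("upstream", "./configure --prefix=$PWD/../installed && make && make install"),
        (".", "svn co -r %s svn://example.org/pygst downstream" % pygst_rev),
        ("downstream", "./configure --with-gstreamer=../installed && make"),
        ("downstream", "make check"),
    ]

# the usual case: latest known-good libgstreamer against pygst -rHEAD
steps = pygst_steps("r1234", "rHEAD")
```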
This has the advantage that each Builder is independent and can run on any
qualified slave; no filesystems need to be shared. It also has no need for
Locks of any sort between the separate Builders. The disadvantage is that
the libgstreamer code is being compiled multiple times (at least multiple
times per libgstreamer revision).
So, to support the second scenario, we'd need some improvements in the way
that SourceStamps are expressed (specifically, the ability to express more
than one sub-project's revision at the same time). To support the first
scenario cleanly, I'd want a way to allocate and share a directory between
Builders, as well as a way to express the restriction that they always run in
the same buildslave, plus some sort of Lock to avoid compiling against a
half-installed library.
There's work to be done before we can do either. You could probably implement
the first scenario today, if you put in some absolute pathnames for
'workdir'. You could also probably implement the second scenario today, if
you only ever wanted to build -rHEAD.
Hmm, ok, that got a bit verbose. Hopefully it explains what I was thinking.
> It also raises the question: if a listener is offline when an event
> happens, does it ever get notified? I think it would be simpler in general
> for it to be part of the core process, but it's sounding like it would be
> easy to extend Scheduler to do whatever I wanted after a build finished.
The Status interface lets you subscribe to hear about new Builds finishing,
and also lets you ask about earlier Builds. So the correct sequence would be:
 - subscribe to new Builds
 - examine the SourceStamp of that build, compare against the last known build
 - if different, trigger a new build with the build[-1] sources
 - when new builds arrive, if they were SUCCESS, trigger a new build
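As a sketch, that subscriber logic could look like the following; the method and result names are illustrative, not the real Status API:

```python
# Hypothetical sketch of the subscribe-and-compare loop: trigger a
# downstream build whenever a new upstream SourceStamp builds successfully.

class DependentScheduler:
    def __init__(self, trigger):
        self.trigger = trigger          # callback that starts a downstream build
        self.last_stamp = None          # SourceStamp of the last upstream build we acted on
    def build_finished(self, stamp, result):
        if result == "SUCCESS" and stamp != self.last_stamp:
            self.last_stamp = stamp
            self.trigger(stamp)

started = []
s = DependentScheduler(started.append)
s.build_finished("r1234", "SUCCESS")   # new upstream success: trigger
s.build_finished("r1234", "SUCCESS")   # same stamp again: no retrigger
s.build_finished("r1235", "FAILURE")   # failed build: ignore
# started is now ["r1234"]
```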
> [using Locks for resource allocation]
> Obviously, if the different projects aren't talking to each other, that
> doesn't work. I can see uses for this for making sure performance tests run
> cleanly, making sure that tests that use a database can share the same one
> without stomping on each other, etc, etc.
Of course. We could build inter-buildmaster Locks, but I think that would be
a bad idea.
I'm still waffling on what the Lock semantics ought to be (as opposed to
the Dependency stuff, which is easier because it only affects the Scheduler).
Some of the possibilities:
 - Each Step could declare a set of named Locks that it wants to acquire
   before starting. These names are either scoped to the buildslave or to the
   buildmaster as a whole (perhaps with names like "slave.using_database" or
   "master.running_benchmarks"). Steps could request multiple names, but will
   yield all Locks at the end of each step (to avoid deadlock).
 - Each Build could declare a set of Locks that it must acquire before
   starting.
 - There could be special GetLock and ReleaseLock steps that you'd insert
   between your regular steps. This has the possibility of deadlock, but would
   also let you achieve the flexibility of both Step-wise and Build-wise Locks.
I can't currently imagine a scenario where you'd want to be able to lock a
whole build, but I wouldn't be surprised if there were one out there. The
"lock pygst while we're installing libgstreamer into ../installed" use case
from Scenario 2 above could be safely accomplished with just a single lock
that was acquired by both the libgstreamer's install step and the pygst's
configure/compile step. If the configure and compile were in separate steps,
then you'd need either multi-step Locks or whole-Build Locks to be safe.
I'd love to have a syntax for this that didn't make it possible to deadlock.
Trying to obtain a Lock while you already have one held is the primitive that
must be prohibited to avoid this.
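A sketch of that prohibition, as a slave-side lock holder that simply refuses a second acquire while one Lock is held (names are made up):

```python
# Hypothetical sketch: enforce the "no second Lock while one is held" rule,
# the primitive that must be prohibited to keep the scheme deadlock-free.

class LockHolder:
    def __init__(self):
        self.held = None
    def acquire(self, name):
        if self.held is not None:
            raise RuntimeError("nested acquire of %r while holding %r"
                               % (name, self.held))
        self.held = name                # in real life: block until the master grants it
    def release(self):
        self.held = None                # yielded at the end of every step

h = LockHolder()
h.acquire("slave.using_database")
h.release()
h.acquire("master.running_benchmarks")  # fine: the previous Lock was yielded
```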
> >Let me know if this all sounds like it will fit your needs.
> >The whole Scheduler thing is still a work in progress, and I
> >want to make sure it solves these sorts of problems.
> It sounds like it will definitely cover our needs, and then some.
> $64K question: Any idea of a timeline on this?
My target for the Scheduler stuff is to get it done in the next month. The
basic logic is done, but I haven't written the unit tests, or updated the
status targets to handle the brave new world (multiple slaves per builder,
dealing with BuildSets and BuildRequests, somehow showing the status of a
Scheduler sanely [sheesh :-]). I think that target timeline will include
Locks and Dependencies, as well as Scheduler variants that can watch remote
buildmasters. If that slips, it will be because I really want to get the
'try' feature into this revision as well, but with all the other pieces in
place I don't think that will be too hard to implement.
hope that helps,