[Buildbot-devel] Project dependencies, building branches, etc

Tue Jun 28 15:50:03 UTC 2005

>I think that, as a next step (for 0.7.0), the
>ChangeSource->Scheduler->Builders approach should provide for more
>functionality than what we've got in 0.6.x, and from there we 
>can figure out what the next direction needs to be. Experience 
>with projects that involve semi-independent sub-projects, like 
>yours and GStreamer, will feed these design decisions.

I agree. Speaking selfishly, our needs are straightforward, and I think the
Scheduler stuff will actually cover most of them. So the faster that gets
released, the better for us.

>The big question in my mind right now, as I write the code, is 
>how to represent the status of everything. It may be the case 
>that buildbot installations which do tricky clever things 
>involving multiple projects will also have to do tricky clever 
>things involving StatusTargets that provide useful displays of 
>build status. The existing Waterfall display is a 
>chronological view: as we get into BuildSets and BuildRequests 
>and multiple slaves per builder, perhaps we need a more 
>Change- or BuildSet- oriented view.

I had thought of a Scheduler being a refactoring of most of the
functionality of the Master. From that perspective, the most logical way to
represent this is to have a separate Waterfall for each Scheduler, and then
perhaps an overall view of some kind. That covers our simple case, but for
the Try and demand-driven branch build features, yeah, that would
be...harder. I don't understand the use cases well enough to make a good
suggestion.

>I'm not sure that I do prefer it. I think that we'll have the 
>flexibility to implement this sort of thing in multiple ways, 
>depending upon how much separation you want between your 
>Builders. Remember that each Builder is nominally independent, 
>so it may be hard to get compiled code from one to another.
>

[snip: gstreamer project description]

>A build of pygst can be described by a two-part SourceStamp: 
>the first part describes the version of libgstreamer that was 
>used, the second part describes the pygst code that was used. 
>You can imagine SourceStamps describing a variety of 
>combinations: ("0.8.1", "0.8.1") for two released versions, 
>("0.8.1", "r1234") for an SVN version of pygst compiled 
>against a released version of libgstreamer, etc. Different 
>goals might prompt you to want to validate different 
>combinations. You could imagine a step.Source subclass which 
>worked with named+released versions of projects instead of 
>their SVN repositories, so "0.8.1" could translate into an 
>instruction to download libgstreamer-0.8.1.tar.gz from some 
>HTTP server and unpack it. This would let you get into 
>Schedulers that watched mailing lists or freshmeat.net or 
>whatever, instead of VC repositories. The functionality this 
>could add would mostly be adding corners to the "four corners" 
>testing matrix: latest SVN of A against latest release of B, etc.

I don't know enough about the whole Change/Stamp setup to make much
contribution here, but as far as I'm concerned, this isn't Buildbot's
problem. I guess the goal here is to support some kind of "redo", where you
plug in a SourceStamp for it to use and it goes out and pulls the old
versions of everything. That sounds nice, I guess, but it's not a huge deal.
My main concern here is to know what went into the build, and be able to
reproduce it manually if necessary. And I can look back at the logs of the
gstreamer build to find out what was current at that time, so all I need are
the build logs from each project.
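
For what it's worth, here is a minimal sketch of the kind of record I would want in each build log. It only illustrates the two-part SourceStamp idea from the quoted text; ComponentStamp and CompoundStamp are hypothetical names of my own, not anything in Buildbot.

```python
from typing import NamedTuple, Tuple


class ComponentStamp(NamedTuple):
    """One sub-project pinned to a release name or a VC revision (hypothetical type)."""
    project: str
    version: str  # e.g. "0.8.1" for a release, "r1234" for an SVN revision


class CompoundStamp(NamedTuple):
    """Ordered record of every component that went into one build (hypothetical type)."""
    components: Tuple[ComponentStamp, ...]

    def describe(self) -> str:
        # One line suitable for writing at the top of a build log.
        return ", ".join(f"{c.project}={c.version}" for c in self.components)


# A released libgstreamer combined with an SVN revision of pygst.
stamp = CompoundStamp((
    ComponentStamp("libgstreamer", "0.8.1"),
    ComponentStamp("pygst", "r1234"),
))
print(stamp.describe())
```

Writing that one line into the log would be enough for me to reproduce a build by hand.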

>
>But, for our purposes, we'll generally be building ("r1234", "rHEAD"),
>where r1234 is the latest known-working revision of libgstreamer.

The only scenarios I can think of are this one and building against the
latest release. Both are "latest", so I don't see what specifying a Stamp
for the dependency buys you.

>  The pygst folks have their own buildbot. It watches the libgstreamer
>  buildbot to find out when a new (working) -rHEAD of libgstreamer is
>  available. There are two situations that prompt it to rebuild pygst:
>  libgstreamer has been updated, or pygst has changed. If the only
>  information it gets from the libgstreamer buildbot is the revision
>  number of the tree that built successfully (that is, we're not
>  downloading binaries or anything), then it will somehow need to fetch
>  and compile its own copy. They can do this by just copying a Builder
>  config from the "upstream" libgstreamer buildbot, so that Builder A
>  does an SVN checkout of libgstreamer and compiles it normally.

I don't think it matters how the downstream build gets a copy of a build of
HEAD. Whether that's by checking it out and building it or downloading it
and unpacking it, the process is going to be the same.

>  I'm envisioning three Schedulers in this setup. The first subscribes
>  to the upstream BuildMaster and just triggers a local libgstreamer
>  build each time the upstream build succeeds. The second watches the
>  pygst SVN repository for Changes and triggers a local pygst build
>  each time something has changed. The third watches the local
>  libgstreamer build and triggers a local pygst build each time it
>  succeeds.

Why three? I thought there would be one Scheduler triggered by upstream
commits, and one triggered by a gstreamer build or by a pygst SVN change.
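
To show what I mean, here is a rough sketch of the two-Scheduler layout. The Scheduler class, the event names, and dispatch() are all made up for illustration and are not the 0.7.0 design; the only point is that one Scheduler can listen for more than one kind of trigger.

```python
class Scheduler:
    """Maps a set of trigger events to one Builder (hypothetical, not Buildbot's class)."""

    def __init__(self, name, triggers, builder):
        self.name = name
        self.triggers = frozenset(triggers)
        self.builder = builder


# Assumed event names: "upstream_commit", "libgstreamer_build_ok", "pygst_svn_change".
schedulers = [
    Scheduler("upstream", {"upstream_commit"}, "libgstreamer"),
    # One Scheduler covers both reasons to rebuild pygst.
    Scheduler("downstream", {"libgstreamer_build_ok", "pygst_svn_change"}, "pygst"),
]


def dispatch(event):
    """Return the names of the Builders that should start for this event."""
    return [s.builder for s in schedulers if event in s.triggers]
```

With this layout, dispatch("pygst_svn_change") and dispatch("libgstreamer_build_ok") both start the pygst Builder, which is the behaviour the second and third Schedulers provide between them in the quoted setup.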

>  The advantage of this approach is that the upstream library is
>  compiled exactly once per upstream code change. The disadvantage is
>  that you have to arrange for a well-known directory to be shared
>  between the two Builders. I want to find a clean way of expressing
>  this shared directory, because it imposes more restrictions on the
>  buildslaves than we currently have. (at present, the slave admins
>  only have to provide one base directory, and the buildslave takes
>  care of everything inside that.. furthermore, each Builder is
>  independent). Another disadvantage is that you need a Lock of some
>  sort to keep the pygst builder from running while the libgstreamer
>  builder is running, otherwise you'll be linking against half-compiled
>  (or half-installed) code.

I intend to address this by scp'ing to a known host, because I'm going to be
building all the libraries I depend on and that's the easiest way I can
think of to do the distribution. This also neatly solves the locking
problem, as the upload can be made effectively atomic (for example, upload
under a temporary name and then move the new copy into place). A project
that depends on another project that publishes continuous builds won't need
this at all. I don't think this is a problem worth your time.
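
A minimal sketch of the "move the new one into place" step, assuming it runs on the receiving host after the upload and that the staging file and the final file live on the same file system. publish_atomically and the ".incoming-" prefix are names I made up; the scp transfer itself is not shown.

```python
import os
import shutil
import tempfile


def publish_atomically(src_path: str, dest_path: str) -> None:
    """Copy src_path to dest_path so readers never see a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Stage the copy under a temporary name in the destination directory,
    # so the final rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".incoming-")
    os.close(fd)
    try:
        shutil.copyfile(src_path, tmp_path)
        # The rename either fully replaces dest_path or leaves it untouched.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the staging file if the copy or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A downstream build that opens the destination path gets either the old build or the new one, never a half-written file, so no Lock is needed between the two Builders.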

>
>  Note that this scenario works the same way if there's only one
>  buildbot. The point is that the binaries being compiled by one
>  Builder are used directly by a separate Builder.
>
> Scenario 2: both components in one Builder

[snip]

>  This has the advantage that each Builder is independent, and can run
>  on any qualified slave, no file systems need to be shared. It also
>  has no need for Locks of any sort between the separate Builders. The
>  disadvantage is that the libgstreamer code is being compiled multiple
>  times (at least multiple times per libgstreamer revision).

This hits me where I live. I don't want to do two builds of the same product
for no reason: it slows down the feedback loop. I would much rather set up a
common file store, which isn't really that hard anyway.

>
>So, to support the second scenario, we'd need some 
>improvements in the way that SourceStamps are expressed 
>(specifically, the ability to express more than one 
>sub-project's revision at the same time). To support the first 
>scenario cleanly, I'd want a way to allocate and share a 
>directory between Builders, as well as a way to express the 
>restriction that they always run in the same buildslave, plus 
>some sort of Lock to avoid compiling against half-installed libraries.
>
>There's work to be done before we can do either. You could 
>probably implement the first scenario today, if you put in 
>some absolute pathnames for 'workdir'. You could also probably 
>implement the second scenario today, if you only ever wanted 
>to build -rHEAD.
>
>
>Hmm, ok, that got a bit verbose. Hopefully it explains what I 
>was thinking better.

Yeah. Probably it's clear by now that my vote is for the low-hanging fruit.
I would much rather have Scheduler support in a month than, really, any of
the above. I don't think you really need to protect people from shooting
their feet off.

> subscribe to new Builds
> retrieve build[-1]
>  examine SourceStamp of that build, compare against the last known build
>  if different, trigger a new build with the build[-1] sources
> when new builds arrive, if they were SUCCESS, trigger a new build

Cool. I might try and write that.
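
Roughly what I have in mind, as a sketch only. It follows the quoted steps literally; UpstreamWatcher, the (stamp, result) tuples, and the SUCCESS/FAILURE constants are placeholders of mine, since I haven't seen the real subscription interface yet.

```python
SUCCESS = "success"  # placeholder result values, not Buildbot's constants
FAILURE = "failure"


class UpstreamWatcher:
    """Triggers a local build when the upstream buildmaster has something new."""

    def __init__(self, trigger, last_known_stamp=None):
        self.trigger = trigger  # callable taking the stamp to build
        self.last_known_stamp = last_known_stamp

    def start(self, upstream_builds):
        """Catch up on startup: look at build[-1] and compare stamps."""
        if not upstream_builds:
            return
        stamp, _result = upstream_builds[-1]
        if stamp != self.last_known_stamp:
            self.last_known_stamp = stamp
            self.trigger(stamp)

    def build_finished(self, build):
        """Called for each new upstream build after subscribing."""
        stamp, result = build
        if result == SUCCESS:
            self.last_known_stamp = stamp
            self.trigger(stamp)
```

The catch-up in start() does not check the result of build[-1], because the quoted steps don't; checking for SUCCESS there as well would be an easy addition.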

> each Step could declare a set of named Locks that it wants to
> acquire before starting. These names are either scoped to the
> buildslave or to the buildmaster as a whole (perhaps with names like
> "slave.using_database" or "master.running_benchmarks"). Slaves could
> request multiple names, but will yield all Locks at the end of each
> step (to avoid deadlock).

I would prefer them scoped to the Master. If you scope a Lock to the Slave,
you're implicitly assuming that the only resources you care about are local
to the machine, which isn't the case here; databases are the most obvious
example, but there are others.
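
To make the scoping difference concrete, here is a small sketch using the "slave." and "master." name prefixes from the quoted text. LockRegistry is a hypothetical class, and taking the Locks in sorted order is my own assumption about how a step asking for several names could avoid deadlocking against another step.

```python
import threading
from contextlib import contextmanager


class LockRegistry:
    """Named Locks: "master.*" is global, "slave.*" is per buildslave."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _key(self, name, slave):
        scope = name.split(".", 1)[0]
        if scope == "master":
            return name  # one Lock shared by every slave
        if scope == "slave":
            return (name, slave)  # a separate Lock on each slave
        raise ValueError("unknown lock scope: %r" % (name,))

    def _get(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def is_held(self, name, slave):
        return self._get(self._key(name, slave)).locked()

    @contextmanager
    def held(self, names, slave):
        """Hold every named Lock for the duration of one step."""
        keys = sorted({self._key(n, slave) for n in names}, key=repr)
        acquired = []
        try:
            for key in keys:  # fixed order, so two steps cannot cross-wait
                lock = self._get(key)
                lock.acquire()
                acquired.append(lock)
            yield
        finally:
            for lock in reversed(acquired):  # yield all Locks at end of step
                lock.release()
```

A step on slave "bot1" holding "slave.using_database" would not block the same step on "bot2", which is exactly the problem when the database is shared; the "master." name blocks both.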

> there could be special GetLock and ReleaseLock steps that you'd
> insert between your regular steps. This has the possibility of
> deadlock, but would also let you achieve the flexibility of both
> Step-wise and Build-wise Locks.

But I don't want to see a Get/Release in my waterfall :p

>I can't currently imagine a scenario where you'd want to be 
>able to lock a whole build, but I wouldn't be surprised if 
>there were one out there.

I can't think of a good reason you would need this, either.

>The "lock pygst while we're 
>installing libgstreamer into ../installed" use case from 
>Scenario 2 above could be safely accomplished with just a 
>single lock that was acquired by both the libgstreamer's 
>install step and the pygst's configure/compile step. If the 
>configure and compile were in separate steps, then you'd need 
>either multi-step Locks or whole-Build Locks to be safe.

I would do this as: build, compile, lock, install, unlock. I'm sure there
are cases where it would be nice to hold a lock across steps, but this
isn't one. I would keep it to a single step for now and expand if necessary.

>My target for the Scheduler stuff is to get it done in the 
>next month. The basic logic is done, but I haven't written the 
>unit tests, or updated the status targets to handle the brave 
>new world (multiple slaves per builder, dealing with BuildSets 
>and BuildRequests, somehow showing the status of a Scheduler 
>sanely [sheesh :-]). I think that target timeline will include 
>Locks and Dependencies, as well as Scheduler variants that can 
>watch remote buildmasters. If that slips, it will be because I 
>really want to get the 'try' feature into this revision as 
>well, but with all the other pieces in place I don't think 
>that will be too hard to implement.

Cool. I wanted to take a look at your Arch repository, but I have to build
bazaar first (which I can't seem to do) so that's on hold for a while.

Thanks (and sorry it took a little while),
Michael