[Buildbot-devel] Project dependencies, building branches, etc

Tue Jun 14 19:52:55 UTC 2005

> A master is specific to one project. If you want to build more than one
> project, you create more than one master. This is because there's no way to
> filter changes, so if you put more than one project on one master, changes
> to one will trigger a rebuild of both.

At the moment I recommend one master per VC repository. You can use the
isFileImportant method to filter out changes for subprojects. The lack of
branch support in 0.6.6 may mean you need one master per branch.
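
As a rough illustration of that kind of filtering, here is a minimal sketch.
It assumes a repository with one top-level directory per subproject; the
function names and the "libfoo/" prefix are hypothetical, and the sketch does
not show how the predicate is wired into a real master.cfg.

```python
# Hypothetical layout: one repository holding several subprojects, each in
# its own top-level directory. Neither name below is part of Buildbot.
SUBPROJECT_PREFIX = "libfoo/"


def is_file_important(filename, prefix=SUBPROJECT_PREFIX):
    """Return True if a changed file belongs to the subproject we build."""
    return filename.startswith(prefix)


def change_is_relevant(changed_files, prefix=SUBPROJECT_PREFIX):
    """A change matters if at least one of its files is important."""
    return any(is_file_important(f, prefix) for f in changed_files)
```

A change touching only "docs/README" would be ignored, while one that also
touches "libfoo/Makefile" would count as relevant.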

> Once the b-o-b feature is finished, there will be a way to associate
> builders with a particular branch, and Changes will trigger only the
> builders for their branch.

To be precise, there will be a way to associate Schedulers with a particular
branch. Each Scheduler can then trigger builds on multiple Builders (all
using the same source code). The syntax will look like this:

  linux_builder = {'name': 'linux', 'slavename': 'bot-linux',
                   'builddir': 'workdir', 'factory': f}
  ...
  c['builders'] = [linux_builder, solaris_builder]
  s1 = Scheduler(name="Trunk Builder", branch="trunk", treeStableTimer=5*60,
                 builders=["linux", "solaris"])
  c['schedulers'].append(s1)

  # this will fire builds of the trunk on two platforms after 5 minutes
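
To make the intended behavior concrete, here is a toy model in plain Python.
It is not the real Scheduler class: ToyScheduler, add_change, and poll are
made-up names, time is passed in explicitly, and the sketch assumes that each
new change restarts the stable-tree timer.

```python
class ToyScheduler:
    """Toy model only; not the real Buildbot Scheduler class."""

    def __init__(self, name, branch, tree_stable_timer, builders):
        self.name = name
        self.branch = branch
        self.tree_stable_timer = tree_stable_timer  # seconds
        self.builders = builders
        self.pending = []        # revisions waiting for the tree to settle
        self.last_change = None  # timestamp of the newest pending change

    def add_change(self, branch, revision, now):
        """Record a change, ignoring anything on another branch."""
        if branch != self.branch:
            return False
        self.pending.append(revision)
        self.last_change = now   # assumption: each change restarts the timer
        return True

    def poll(self, now):
        """Return (builder, revisions) requests once the tree is stable."""
        if not self.pending:
            return []
        if now - self.last_change < self.tree_stable_timer:
            return []
        revisions, self.pending = self.pending, []
        return [(builder, revisions) for builder in self.builders]
```

With a 5*60 second timer, two trunk changes at t=0 and t=60 produce nothing
when polled at t=300, and one request per builder when polled at t=360.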

> Furthermore, slaves can be connected to only one master. This means that
> you need #projects * #platforms slaves.

True. The Scheduler change should reduce the need for multiple masters (at
least for multiple branches). I am also thinking that we can share slaves
between multiple buildmasters, and I will probably implement this eventually,
but I'd prefer solutions that don't gratuitously require multiple masters.
One master per project feels like the right target to aim for. The vague
long-term goal I have is for one slave per platform, one Builder per process,
one BuildMaster per project. (however the current Scheduler work is leading
us to one Builder per process*platform, and one BuildFactory per process).

Another possible direction is to put multiple buildmasters in a single (unix)
process (with multiple config files), and then allow the status targets to
subscribe to whatever subset of those masters they wish. Hmm.

> There is no simple way to handle project dependencies.

> Dependencies. When we commit to our utility library A, we want our project B
> to be updated to use it. Some projects are tightly integrated and should be
> tested in concert this way.

Correct. The goal is to use Schedulers for this purpose. The default
Schedulers pay attention to Changes, but you could write one that would
instead subscribe to another buildmaster and watch for builds of the
dependencies to be completed. To make this useful, we need to add a few
things:

  at the moment, Builds are created with a single "SourceStamp" that defines
  what code they ought to checkout/compile/test. We should expand this to a
  dictionary of components, so the Scheduler can tell the Build that it
  should use -rHEAD of the main project's code, but use -r1234 or whatever of
  libfoo, because the Scheduler just got notified by the libfoo buildmaster
  that r1234 has passed its own unit tests.

  you'd then need to add Steps to the build process that would compile the
  dependencies first, then use them to build the main target. Really, you'd
  like those component builds to only be performed when their sources had
  changed, so we'll need to figure out some rational directory scheme that
  will allow these pieces to be updated and built separately. I'm thinking
  that we'll have a checkoutdir and a workdir for each component. This gets
  complicated.. it is a lot like a separate Build (with its own Steps), and
  I'm not yet sure how to do this best. It may involve some cooperation
  between multiple Builders (one for each component, plus one for the main
  project that uses those components), but I'd like the Build to be the place
  where information gets exchanged, rather than the Builder. Definitely needs
  more thought.

  it would probably be useful to add a "download tarball and unpack" Step, so
  that these components' versions could be described with a URL pointing at
  that tarball. This would be most useful in an environment where the
  dependency project's buildbot is creating source tarballs whenever the
  tests pass, but you could also imagine a Scheduler which subscribed to,
  e.g., the gnome-announce mailing list and watched for new versions of
  libgtk, pulled out the URL, then triggered a build against the new library.
  To actually make this useful, you'd probably want to keep the compiled
  libraries around from one Build to the next, so again you'd need to get
  clever with the directories you're using.
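
The "dictionary of components" idea from the first item above could be
sketched like this. It is only an illustration under stated assumptions: both
function names are hypothetical, and the component names and the string
rendering are invented for the example.

```python
def make_component_stamps(main_project, tested_dependencies):
    """Build a {component: revision} mapping for one build.

    main_project: name of the project being built; it follows HEAD.
    tested_dependencies: {name: revision} reported as passing upstream.
    """
    stamps = {main_project: "HEAD"}
    stamps.update(tested_dependencies)
    return stamps


def checkout_args(stamps):
    """Render the mapping as '-r' style arguments, one per component."""
    return ["%s -r%s" % (name, rev) for name, rev in sorted(stamps.items())]
```

For example, make_component_stamps("myapp", {"libfoo": "1234"}) pairs HEAD of
the main project with the libfoo revision that just passed its own tests.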

Another point to be aware of is that the Schedulers, when they ask the
Builders to run a build, can also ask about the resulting status of those
builds. They can use this to remember which changes worked and which didn't,
or to re-run failing Builds if they think the problem might be due to a
transient timing-related test failure, or to trigger other builds only if the
first build passed. There is a lot of flexibility here.
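
Two of those uses (re-running a possibly flaky build, and gating follow-up
builds on the first result) can be sketched in a few lines. This is a toy
under stated assumptions: builds are modeled as plain callables, and
run_with_retry and schedule are hypothetical helper names.

```python
def run_with_retry(run_build, max_attempts=2):
    """Call run_build() until it returns True or attempts run out.

    run_build is any zero-argument callable returning True on success.
    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if run_build():
            return True, attempt
    return False, max_attempts


def schedule(first_build, followup_builds, max_attempts=2):
    """Run followup_builds only when first_build eventually passes."""
    passed, _ = run_with_retry(first_build, max_attempts)
    if not passed:
        return []
    return [build() for build in followup_builds]
```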

> Another approach would be to modify BuildBot itself to have a notification
> when any build finishes; this could then be used to report to another
> instance, which would let its Builders decide which, if any, projects should
> be rebuilt.

Yup, the buildbot.status.client.PBListener target is the output of this, and
an as-yet-unwritten Scheduler is the input.

> There is no existing way to aggregate masters' results together. This could
> be written relatively easily using the status client APIs to create a web
> app that connected to masters to query their status.

That's the approach I'd prefer, at least for aggregating the status of
disparate projects. I'm trying to make sure the PBListener interface gives
you remote access to everything that the normal IStatus interface provides
locally.

> Single master, multiple projects.

With Schedulers, I think this should be possible.

> Better branch support. I've read the diary posted to this list a while back;
> I'm not sure I understand it properly, but my impression is that Builders
> can flip between building HEAD and branches arbitrarily. We need builders
> configured for each active branch full-time, so that they can build the
> appropriate branches when a change comes in.

In the code I'm writing now, each Builder is willing to compile/test code
from an arbitrary branch: each Build is created with a SourceStamp that can
say "use the latest code on the FOO branch", or "use revision 1234", or "use
the earliest revision that contains Changes [c1,c2,c3]". Everything is
controlled by that SourceStamp.

The Schedulers determine which SourceStamps are submitted to which Builders.
The "use one Builder and interleave Builds of all branches" behavior is
obtained by having multiple Schedulers (one per branch) all feeding into the
same Builder. The "use one Builder per branch" behavior you desire would be
obtained by having a 1-to-1 relationship between Schedulers and Builders. In
that approach, each Builder would only ever compile code from a single
branch. (note that those multiple Builders could all share the same
buildslave: the slaves keep each Builder in a separate directory).
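
The two arrangements differ only in how branches map onto Builders. A minimal
sketch, modeling each Scheduler as an entry in a plain dict (the route helper
and all branch and builder names are invented for illustration):

```python
def route(schedulers, change_branch):
    """Return the builder names that should build a change on change_branch.

    schedulers maps a branch name to the builders its scheduler feeds.
    """
    return list(schedulers.get(change_branch, []))


# One shared Builder that interleaves builds of every branch.
interleaved = {"trunk": ["linux"], "release-1.0": ["linux"]}

# One Builder per branch: each Builder only ever sees a single branch.
per_branch = {"trunk": ["linux-trunk"], "release-1.0": ["linux-release-1.0"]}
```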

> My proposals:
> 
> 1) Allow more than one project in a single Master. Support this by adding a
> key to each ChangeSource, and letting each Builder define a list of relevant
> keys. Once this is done, it would be straightforward to modify the waterfall
> display to take a list of builders to display. This would allow for
> customized views.

Apart from what I've described already, here are some other useful points:

  Changes (in the upcoming release) have a .branch attribute, which is
  examined by the Scheduler to see whether it should pay attention or not.
  The ChangeSource is responsible for providing a value for this attribute,
  which is easier to accomplish in some VC systems than others (in
  particular, with Subversion you have to provide a function that will split
  the full URL into a branch and a filename, since different projects
  organize their branches differently). When a SourceStamp is constructed
  from a list of Changes, they must all be on the same branch. The
  SourceStamp is used by the step.Source checkout command to figure out what
  revision to checkout. These checkout Steps also have a default branch name
  to use when the Changes didn't provide one.

  The Waterfall class currently has an (undocumented) feature to restrict the
  display to a subset of Builders. I think you append
  "?builders=one,two,three" to the URL to see only those three Builders..
  check the source code to be sure.

  The HTML status displays need some significant updating, as the Scheduler
  work breaks many of the assumptions around which the Waterfall class was
  built. You can now have multiple slaves per Builder (allowing parallel
  builds), the status of a BuildSet can be tracked separately from the Builds
  that make it up, there needs to be a place to display the Scheduler's
  activity, etc.
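
For the Subversion case mentioned in the first point, the splitting function
might look something like the sketch below. It assumes the common
trunk/branches/<name> layout and works on repository-relative paths rather
than full URLs; the name split_file and the choice to return None for
anything else are assumptions made for the example.

```python
def split_file(path):
    """Split a repository-relative path into (branch, filename).

    Assumes the common trunk/branches/<name> layout; returns None for
    paths outside it (e.g. tags/) so the caller can ignore them.
    """
    pieces = path.split("/")
    if pieces[0] == "trunk" and len(pieces) > 1:
        return "trunk", "/".join(pieces[1:])
    if pieces[0] == "branches" and len(pieces) > 2:
        return "branches/" + pieces[1], "/".join(pieces[2:])
    return None
```

A project that keeps its branches somewhere else would supply a different
function with the same shape.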

> 2) Better branch support

This should be supported by the Schedulers. The default Scheduler class only
pays attention to a single branch, but you can easily write one which looks
at multiple branches, or pays attention to everything *except* a single one,
etc.
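
The branch test itself is just a predicate, so the variations are small. A
sketch with hypothetical helper names (how such a predicate would plug into a
custom Scheduler class is not shown here):

```python
def watches_only(branches):
    """Predicate accepting changes on any of the given branches."""
    wanted = frozenset(branches)
    return lambda branch: branch in wanted


def watches_all_except(branch_to_skip):
    """Predicate accepting changes on every branch but one."""
    return lambda branch: branch != branch_to_skip
```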

> 3) Add a post-build hook to the master so that dependents could be notified
> of a build. Alternatively, I could just subclass BuildMaster.

The "right way" to do this is to add a PBListener status target to the
upstream buildmaster, and then write a Scheduler which subscribes to that
target for use on the downstream buildmaster. There will be a base Scheduler
class to do this sort of thing, or at least some kind of example.
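
The shape of that arrangement is a plain publish/subscribe link. The sketch
below runs in a single process and uses made-up class names; it does not use
PBListener or any network code, and only illustrates which side announces
results and which side reacts to them.

```python
class UpstreamStatus:
    """Stand-in for a status target that announces finished builds."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def build_finished(self, project, revision, passed):
        for callback in self.subscribers:
            callback(project, revision, passed)


class DownstreamScheduler:
    """Queues a downstream build whenever a watched upstream build passes."""

    def __init__(self, watched_project):
        self.watched_project = watched_project
        self.requests = []  # upstream revisions to build against

    def upstream_finished(self, project, revision, passed):
        if project == self.watched_project and passed:
            self.requests.append(revision)
```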

Also note:

 the old Interlock object is going away in the next release. In its place are
 two new objects:

  "Lock" (or maybe "Semaphore" or something), which handles temporal
  exclusivity: a Step or a whole Build can exclude other Steps/Builds from
  using certain resources at the same time. This would generally be used to
  keep control of the CPU load on a given machine, or to avoid running two
  copies of the same test suite at the same time (although any test suite
  which needs this sort of semaphore should probably be fixed).

  Dependency, which hooks together multiple Schedulers, and makes sure that a
  given set of Changes work in one place before being used in a second place.
  I'm still figuring out the details, but I'm thinking the syntax will look
  like this:

    s1 = Scheduler(builders=["quick"])
    s2 = Scheduler(builders=["full-linux", "full-solaris"], dependencies=[s1])
    s3 = Scheduler(builders=["make-tarball"], dependencies=[s2])
    c['schedulers'] = [s1, s2, s3]
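
The intended effect of chaining Schedulers this way can be modeled in a few
lines. This is a toy, not the eventual Dependency implementation:
DependentScheduler and run_chain are hypothetical names, builds are reduced
to a callable returning pass/fail, and the schedulers are assumed to be
listed with dependencies before dependents.

```python
class DependentScheduler:
    """Toy model of a scheduler with upstream dependencies."""

    def __init__(self, builders, dependencies=()):
        self.builders = list(builders)
        self.dependencies = list(dependencies)


def run_chain(schedulers, run_builder):
    """Run schedulers in order, skipping any whose dependency did not pass.

    schedulers must list dependencies before dependents.
    run_builder(name) returns True when that builder's build passes.
    Returns {scheduler: passed}; a skipped scheduler counts as not passed.
    """
    results = {}
    for sched in schedulers:
        if not all(results.get(dep, False) for dep in sched.dependencies):
            results[sched] = False
            continue
        # Build the list first so every builder runs even if one fails.
        results[sched] = all([run_builder(b) for b in sched.builders])
    return results
```

If "full-solaris" fails, the "make-tarball" stage never runs, which is the
point of making it depend on the full builds.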

Let me know if this all sounds like it will fit your needs. The whole
Scheduler thing is still a work in progress, and I want to make sure it
solves these sorts of problems.

cheers,
 -Brian