[Buildbot-devel] Difficulties with concurrency and dependencies

Greg Ward gerg.ward+buildbot at gmail.com
Tue Nov 27 20:33:59 UTC 2007


I'm having problems shoehorning our build requirements into Buildbot, and I
want to bounce some ideas off the mailing list and see what everyone
thinks.  First, let me explain the requirements: our build produces a bunch
of Linux server-side applications (a mix of mostly Java and C++) and a
handful of client-side Windows apps (again, a mix of Java and C++).  So we
need at least two build slaves to handle the two different platforms.
But the builds on the two systems are completely different; only a handful
of core libraries are actually compiled on both build slaves.

Ultimately, all of our apps are deployed by being wrapped in RPMs and
installed on a Linux server.  (The Windows apps are downloaded by end users
from one of those Linux servers.)

The two builds break down fairly naturally into the following build steps:

(L1) checkout code needed for Linux build
(L2) preprocess/generate
(L3) upload results of step (L2) to master
(L4) build Java apps
(L5) build C++ apps
(L6) run all tests
(L7) bundle everything into RPMs
(L8) copy RPMs to internal build repository (production builds only)

(W1) checkout code needed for Windows build
(W2) checkout 3rd-party code (this is a separate build step mainly because
     it's a separate shell command)
(W3) download results of step (L2) from master
(W4) build Windows C++ apps
(W5) build installer for Windows Java apps (requires jar file built in
     step (L4))

Obviously, many of these build steps can be run concurrently, but there
are a couple of key synchronization points:
  * (W3) cannot run until (L3) has run
  * (W5) cannot run until (L4) has run (needs a jar file created by the
    Java build on Linux)
  * (L7) cannot run until (W4) and (W5) have run

Subtle point: there are actually a lot more dependencies than that, but
most of them are implicit in the linear sequences of (L1) .. (L8) and (W1)
.. (W5).  For example, (L2) trivially depends on (L1): I cannot generate
code until the checkout is complete.  However, (L5) does not depend on
(L4); I could run them in parallel, or in a different order, and everything
would be fine.  But (L6) does depend on (L5), because there are Java unit
tests that run executables created by the C++ build.  (Don't ask.)  I have
drawn what I think is the complete dependency graph; I think the easiest
way to express it in email is with 'make' syntax:

  L2: L1          # don't generate code until checkout complete
  L3: L2          # don't upload generated code until generation complete
  L4: L2          # don't build Java until generation complete
  L5: L2          # don't build C++ until generation complete
  L6: L4 L5       # don't run tests until everything compiles
  L7: L4 L5 W4 W5 # don't build RPMs until all apps (Linux and
                  # Windows) are built
  W3: W1 L3       # don't download generated code until it's uploaded and
                  # checkout creates build tree
  W4: W2 W3       # don't build Windows C++ apps until checkout done and
                  # generated code downloaded
  W5: W2 W3 L4    # don't build Windows Java app until checkout done,
                  # generated code downloaded, and Linux Java build complete
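
One way to sanity-check the graph above is to feed it to a topological
sorter and see which steps become runnable together.  The sketch below is
plain Python (standard-library graphlib, Python 3.9+), not Buildbot code.
The names DEPS and concurrency_waves are made up for illustration, and (L8)
is left out because the make rules above do not mention it.

```python
from graphlib import TopologicalSorter

# Each key depends on the steps in its value, copied from the make rules.
DEPS = {
    "L2": ["L1"],
    "L3": ["L2"],
    "L4": ["L2"],
    "L5": ["L2"],
    "L6": ["L4", "L5"],
    "L7": ["L4", "L5", "W4", "W5"],
    "W3": ["W1", "L3"],
    "W4": ["W2", "W3"],
    "W5": ["W2", "W3", "L4"],
}


def concurrency_waves(deps):
    """Group steps into waves: every step in a wave has all of its
    prerequisites in earlier waves, so a wave could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished at once
    return waves


waves = concurrency_waves(DEPS)
for number, wave in enumerate(waves):
    print(number, " ".join(wave))
```

With the rules as written this gives six waves: L1 W1 W2, then L2, then
L3 L4 L5, then L6 W3, then W4 W5, and finally L7.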

Still with me?  Great!  Now the hard part ... how do I explain all this to
Buildbot?

My first real attempt was with Dustin Mitchell's "triggers" patch (tickets
#56 and #57), which adds a new scheduler 'Triggerable', and a new build step
'Trigger'.  (It's a simple idea: a Trigger build step causes a Triggerable
scheduler to start running.)  Ultimately this fell apart because I have
various build steps that cannot run until *multiple* other build steps
complete.  Sadly, triggers go the wrong way: one Trigger step can fire
multiple Triggerable schedulers, but there's no way for a Triggerable to
block on multiple Triggers.  So that doesn't work.

So now I'm looking at the Dependent scheduler already in Buildbot.  Alas,
its docstring demonstrates that Dependent has exactly the same problem as
triggers: "This scheduler runs some set of 'downstream' builds when the
'upstream' scheduler has completed successfully."  But I need my scheduler
to wait until multiple upstream schedulers have all completed successfully.
Also, Dependent is kind of annoying in that it forces me to split my builds
up into many Builders, each with its own Scheduler.  For example, I would
have to break my two easy-to-understand sequences up into something like:

builder1: L1 L2 L3 (scheduled by scheduler1)
builder2: W1 W2 W3 (scheduled by scheduler2,
                    a Dependent that depends on scheduler1)
builder3: L4 L5 L6 (scheduled by scheduler3,
                    a Dependent that depends on scheduler1)
builder4: W4 W5    (scheduled by scheduler4, a Dependent that
                    depends on scheduler2 AND scheduler3)
builder5: L7 L8    (scheduled by scheduler5, a Dependent that
                    depends on scheduler3 AND scheduler4)

Breaking the Linux build up into 3 builders/schedulers means that I no
longer have the implicit L1->L2->L3->...->L8 dependencies.  And that in
turn means that builder4 must not run until scheduler2/builder2 AND
scheduler3/builder3 complete.  IOW, the fact that Dependent works at the
level of schedulers and builders, rather than at the level of build steps,
is precisely what makes Dependent not work for me!

So at this point I see two options:

  * modify the triggers patch so one Triggerable scheduler can await
    multiple triggers; this feels weird and unnatural

  * modify Dependent so it can depend on multiple upstream schedulers. This
    feels much more natural, but it's still annoying that all my
    synchronization points have to be at the coarse-grained level of
    schedulers and builders.  E.g., a minor tweak to the dependency graph
    could mean a major change to how build steps are organized into
    builders.
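
The core of either option is the same bookkeeping: remember which upstreams
have succeeded and fire only when none are left.  Here is a minimal sketch
of that logic in plain Python.  It is not Buildbot's API: the class
MultiUpstreamGate and its methods are hypothetical, and it ignores the
question of matching upstream results to a particular source revision,
which a real scheduler would have to handle.

```python
class MultiUpstreamGate:
    """Hypothetical helper: call on_ready once every named upstream
    has reported success."""

    def __init__(self, upstreams, on_ready):
        self.pending = set(upstreams)
        self.on_ready = on_ready
        self.fired = False

    def upstream_finished(self, name, success):
        # A failed upstream never satisfies the dependency.
        if self.fired or not success:
            return
        self.pending.discard(name)
        if not self.pending:
            self.fired = True
            self.on_ready()


# builder4 must wait for both scheduler2 and scheduler3.
started = []
gate = MultiUpstreamGate(["scheduler2", "scheduler3"],
                         on_ready=lambda: started.append("builder4"))

gate.upstream_finished("scheduler2", success=True)   # still waiting
gate.upstream_finished("scheduler3", success=False)  # failure: no progress
gate.upstream_finished("scheduler3", success=True)   # now builder4 starts
```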

But wait!  A third way has occurred to me: a new type of build step that
(notionally) blocks until Something Happens.  For me, Something would just
be "each of these other build steps completes".  I can see something like
this:

  linux_checkout = steps.ShellCommand(name="linux-checkout", ...)
  linux_generate = steps.ShellCommand(name="linux-generate", ...)
  linux_upload = transfer.FileUpload(name="upload-generated", ...)
  linux_factory.addStep(linux_checkout)
  linux_factory.addStep(linux_generate)
  linux_factory.addStep(linux_upload)
  # Blocker does not exist yet; it is the new build step proposed here
  windows_factory.addStep(steps.Blocker(
      name="await-linux-setup",
      steps=[linux_checkout, linux_generate, linux_upload]))
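
To make the intended semantics concrete, here is a rough simulation of
"block until each of these other build steps completes" using plain Python
threads.  Nothing in it is Buildbot code: StepDone, blocker, and the two
build functions are invented for illustration, and the two threads merely
stand in for the Linux and Windows build slaves.

```python
import threading


class StepDone:
    """Hypothetical stand-in for 'this build step has finished'."""

    def __init__(self, name):
        self.name = name
        self._event = threading.Event()

    def mark_done(self):
        self._event.set()

    def wait(self, timeout=None):
        return self._event.wait(timeout)


def blocker(upstream, timeout=None):
    """Return True once every upstream step is done, False if any wait
    times out.  The timeout applies to each step separately."""
    return all(step.wait(timeout) for step in upstream)


log = []
log_lock = threading.Lock()


def record(entry):
    with log_lock:
        log.append(entry)


linux_steps = [StepDone(name) for name in
               ("linux-checkout", "linux-generate", "upload-generated")]


def linux_build():
    for step in linux_steps:
        record(step.name)   # "run" the step
        step.mark_done()    # then announce that it has completed


def windows_build():
    record("windows-checkout")
    if blocker(linux_steps, timeout=5):
        record("download-generated")


threads = [threading.Thread(target=windows_build),
           threading.Thread(target=linux_build)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

However the two threads interleave, "download-generated" can only be
logged after "upload-generated", which is exactly the (W3)-after-(L3)
constraint.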

If you have actually followed me this far, I'd be curious if you have had
similar problems, and how you've solved them.

Thanks --

        Greg



