[Buildbot-devel] Sending link to specific failed buildstep

Brian Warner warner-buildbot at lothar.com
Sat Oct 14 20:22:13 UTC 2006


> the new system had all our 19 build systems hammer our Subversion server at
> once with a bunch of checkouts, and this resulted in some ssh timeouts.

Yeah, at work I put a MasterLock around the SVN checkout step, with
maxCount=2 because our SVN server isn't *that* slow (note that maxCount>1 is
not in 0.7.4, but will be in the upcoming 0.7.5). This tends to be a little
bit too conservative, because the SVN step in mode="copy" actually has two
phases: the SVN update itself, and the copy-from-source/-to-build/ phase. The
former is what you need to limit the concurrency on, the latter is entirely
local to the buildslave machine and doesn't need to be limited. However, the
smallest piece you can apply the Lock to is the whole Step. Also, for our
buildslaves, the copy phase takes way longer than the SVN update, so we end
up slowing down the builds by waiting for things we don't need to. The total
delay is no more than a few minutes, though.
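
For reference, a rough sketch of what that looks like in master.cfg. This
assumes the 0.7.5-style API (buildbot.steps.source, addStep taking the step
class plus keyword arguments); the lock name and the svnurl are placeholders,
not our real configuration, so check the names against your installed version:

```python
# master.cfg fragment (a sketch, assuming the 0.7.5-style API)
from buildbot import locks
from buildbot.process import factory
from buildbot.steps import source

# Hypothetical lock name. maxCount=2 lets two checkouts run at once
# across all buildslaves; everyone else waits for a slot.
svn_lock = locks.MasterLock("svn-checkout", maxCount=2)

f = factory.BuildFactory()
f.addStep(source.SVN,
          svnurl="http://svn.example.org/repo/trunk",  # placeholder URL
          mode="copy",
          locks=[svn_lock])  # held for the whole Step, copy phase included
```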

> So yesterday, we redid this by creating 'Nightly' schedulers that were
> specific to a build system, and staggering them by a minute (so the first
> scheduler started at 0023, the last one started at about 0041).

I've been wondering if it might be useful to have the Nightly and Periodic
schedulers include a configurable amount of random dither, so that you could
use the same one for a lot of different Builders but they'd all go off at
slightly different times. We have three Schedulers that each fire every 20
minutes to perform some network-viability testing, and there's a bit of a
thundering herd problem because they all get started at the same time, so
they all fire at the same time.

You could either have a collection of Schedulers that each wait independently
for, say, 20 minutes +/- 2 minutes, or you could arrange for a single
Scheduler that fires multiple Builders to fire them at randomly-chosen times
within a configurable window.. so something like:

 s = Periodic(name, ["A", "B", "C"], periodicBuildTimer=20*60,
              perBuilderPlusMinus=60)

would mean that there would be a two-minute-wide window centered around +20
minutes (+19..+21), and each of the three builders would get fired at a
randomly-chosen time within that window. (or maybe use "dither" instead of
"plusminus" to indicate the total size of the window). Instead of
"perBuilder", a separate argument might mean that you want all builders to be
fired at the same time, just that this "same time" should vary from exactly
every 20 minutes. The Nightly scheduler could have the same arguments, so
that when you say to build at 4:23am, you really mean pick a random time
uniformly chosen from between 4:21am and 4:25am.
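
The per-builder window could be computed along these lines. This is only a
standalone sketch of the arithmetic, not Buildbot code: dithered_delays and
its arguments are made-up names, and nothing like this exists in the
schedulers today.

```python
import random


def dithered_delays(builder_names, period, plus_minus, rng=random):
    """Pick one delay (in seconds) per builder.

    Each delay is drawn uniformly from the window
    [period - plus_minus, period + plus_minus].
    """
    if plus_minus < 0 or plus_minus > period:
        raise ValueError("plus_minus must be between 0 and period")
    return {name: rng.uniform(period - plus_minus, period + plus_minus)
            for name in builder_names}


# Three builders, 20-minute period, +/- 60 seconds of dither.
delays = dithered_delays(["A", "B", "C"], period=20 * 60, plus_minus=60)
for name, delay in sorted(delays.items()):
    print("builder %s fires after %.0f seconds" % (name, delay))
```

Passing the same delay to every builder (one draw instead of one per builder)
would give the "all at the same time, but not exactly every 20 minutes"
variant.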

> We're now into the 'enhancement' phase, including the CLI tool we'd like to
> create to be able to view status, kill builds, and forcefully restart them
> (so we can do "restart builds on all platforms for this product," e.g.).

Take a look at the 'buildbot statuslog' command as raw material for
interacting with the PBListener-backed remote status port. I started working
on a 'buildbot force' command a few months ago but got distracted in the
details of trying to make it use the same sort of command mechanism as
'buildbot try'. 'buildbot kill' should probably use the same code.

> One of the enhancements that has been requested is to have failure emails
> include a link specifically for the log for the step that failed, rather
> than just a link for the failed build.  Anyone have any suggestion what I
> should look at to make this happen?

Yeah, the main Status object has a method called getURLFor(thing): you give
it one of the secondary status objects (like a BuildStepStatus object) and it
gives you back the URL of a page for that particular object. In your
MailNotifier subclass, probably in buildMessage, do something like:

 # 'builder' here is the buildbot.status.builder module
 for step in build.getSteps():
     results, strings = step.getResults()
     if results == builder.FAILURE:
         stepname = step.getName()
         url = self.status.getURLFor(step)
         text += "step %s failed (%s)\n" % (stepname, " ".join(strings))
         text += "see <%s> for details\n" % url


The per-step pages could probably use some improvement.. I'm not sure there's
a huge amount of detail on them right now. html.StatusResourceBuildStep is
the class that generates that page.

cheers,
 -Brian



