[Buildbot-devel] Periodic build problems after upgrade

Brian Warner warner-buildbot at lothar.com
Sun Nov 6 00:14:56 UTC 2005

> In 0.6.6 I had a periodicBuildTime set on all of my builds for about an hour
> and a half. Since each of the builds generally took about two and a half
> hours and I wanted essentially continuous builds, this seemed to be a
> reasonable compromise between having the periodic build timer almost always
> finding another build in progress and having it wait too long at the end of
> one build before starting the next.

Yeah, the new Schedulers are submitting BuildRequests, which get queued when
the target Builder is busy. These BuildRequests are specifically requests to
build the latest source code (i.e. in their SourceStamp, .revision=None),
which means they can be merged. This means that if the build takes an entire
day, and you have 20 new BuildRequests stacked up when it finally completes,
all of those BuildRequests can be merged together and handled with a single
build. This is how we avoid generating more work than we can ever hope to
actually finish.
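
To illustrate the merging idea, here is a minimal sketch in plain Python. It is not the actual Buildbot code: the SourceStamp and BuildRequest classes below are simplified stand-ins, and can_merge and merge_queue are hypothetical helper names. The only rule taken from the description above is that requests for the latest source (revision=None) can be combined; requiring the same branch is an extra assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceStamp:
    # Simplified stand-in, not Buildbot's real class.
    branch: Optional[str] = None
    revision: Optional[str] = None  # None means "build the latest source"


@dataclass
class BuildRequest:
    # Simplified stand-in, not Buildbot's real class.
    reason: str
    source: SourceStamp


def can_merge(a: BuildRequest, b: BuildRequest) -> bool:
    """Two requests can share a build if both ask for the latest source.

    Requiring the same branch is an assumption made for this sketch.
    """
    return (
        a.source.revision is None
        and b.source.revision is None
        and a.source.branch == b.source.branch
    )


def merge_queue(queue: List[BuildRequest]) -> List[List[BuildRequest]]:
    """Group queued requests into batches; one build satisfies each batch."""
    batches: List[List[BuildRequest]] = []
    for req in queue:
        for batch in batches:
            if can_merge(batch[0], req):
                batch.append(req)
                break
        else:
            # No compatible batch found, so this request needs its own build.
            batches.append([req])
    return batches
```

With 20 periodic requests queued behind a day-long build, merge_queue returns a single batch, so one build clears the whole backlog. A request pinned to a specific revision stays in a batch of its own.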

> Turns out there was an uncaught exception (somewhere, I can find it if
> anyone's interested but I don't have the code on hand right now) that would
> cause the periodic build timer to not reset itself if it expired and
> attempted to start another build while one was already in progress. Once I
> fixed that everything was humming along nicely.

If this is still a problem in 0.7.0, let me know. The Periodic class isn't
supposed to have any idea whether the BuildRequest it submitted has completed
or not.

> The failing case generates this in the log:

Eww. Yes, you nailed the bug exactly: the Periodic scheduler is firing *in
the middle of a config-file reread*, before the Builder that it is pointing
at has been added. Your master.cfg is fine; the buildmaster shouldn't be
doing that sort of thing.

I think I need to change the way config-file reading works so that it doesn't
call startService on anything until we've finished handling the whole file.
I've run into a similar bug with things like Waterfall, where two things try
to listen on the same TCP port: the new, temporary instance created from the
config file and the existing instance left over from the last time we read
the config file.
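
A rough sketch of that "start nothing until the whole file is handled" idea, in plain Python. This is not Buildbot's or Twisted's API: Registry, PeriodicSketch, load_config and the dict-shaped config are all hypothetical names invented for illustration. The point is only the two phases: build and cross-check everything first, then install and start.

```python
class Registry:
    """Hypothetical holder for the currently active builders."""

    def __init__(self):
        self.builders = {}


class PeriodicSketch:
    """Hypothetical scheduler; start() stands in for startService."""

    def __init__(self, name, builder_name):
        self.name = name
        self.builder_name = builder_name
        self.running = False

    def start(self, registry):
        # Firing before the target builder exists is the bug being avoided.
        if self.builder_name not in registry.builders:
            raise KeyError(self.builder_name)
        self.running = True


def load_config(config, registry):
    # Phase 1: construct every object from the config, starting nothing.
    builders = {name: object() for name in config["builders"]}
    schedulers = [PeriodicSketch(n, b) for n, b in config["schedulers"]]

    # Cross-check references while nothing is running yet.
    for sched in schedulers:
        if sched.builder_name not in builders:
            raise ValueError(
                "scheduler %r points at unknown builder %r"
                % (sched.name, sched.builder_name)
            )

    # Phase 2: the whole file is handled; now install and start.
    registry.builders = builders
    for sched in schedulers:
        sched.start(registry)
    return schedulers
```

Because nothing starts in phase 1, the order in which builders and schedulers are processed no longer matters, and a bad config is rejected before the running state is touched.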

A quick-hack fix would be to change BuildMaster.loadConfig to set up the
builders before it calls loadConfig_Schedulers, but really that's just
shifting the problem around and asking for it to cause a bug somewhere else.

Hrm. I have to think about this for a bit.

