[Buildbot-devel] Periodic build problems after upgrade

Mon Nov 7 14:34:04 UTC 2005

Hi Brian,

Thanks for the response and the explanation. I'd been trying to make sure I
knew what I was talking about before I sent anything out to the list but I
was sure the problem was on my end since no one else had described what I
was seeing since the release.

On 11/5/05, Brian Warner <warner-buildbot at lothar.com> wrote:
>
> > In 0.6.6 I had a periodicBuildTime set on all of my builds for about an
> > hour and a half. Since each of the builds generally took about two and a
> > half hours and I wanted essentially continuous builds, this seemed to be a
> > reasonable compromise between having the periodic build timer almost
> > always finding another build in progress and having it wait too long at
> > the end of one build before starting the next.
>
> Yeah, the new Schedulers are submitting BuildRequests, which get queued
> when the target Builder is busy. These BuildRequests are specifically
> requests to build the latest source code (i.e. in their SourceStamp,
> .revision=None), which means they can be merged. This means that if the
> build takes an entire day, and you have 20 new BuildRequests stacked up
> when it finally completes, all of those BuildRequests can be merged
> together and handled with a single build. This is how we avoid generating
> more work than we can ever hope to actually finish.

I really like this behaviour. When I first noticed the uncaught exception
I'd started putting in code to do something like this but eventually decided
I'd rather just toss the extra request since I knew it was a periodic build
request and another one would be along before too long. That's mostly why I
didn't bring it up on the list before: my solution was a quick hack that
worked well enough for my environment but wasn't something I'd consider good
enough to share.
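
As a rough illustration of the merging described above, here is a minimal
sketch. It is not Buildbot's actual code: the SourceStamp and BuildRequest
classes below are simplified stand-ins, and can_merge and merge_queue are
hypothetical helper names. The only rule taken from the quoted text is that
requests for the latest source (revision=None) can be merged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceStamp:
    # Simplified stand-in: revision=None means "build the latest source".
    revision: Optional[str] = None


@dataclass
class BuildRequest:
    # Simplified stand-in for a queued request.
    reason: str
    source: SourceStamp


def can_merge(a: BuildRequest, b: BuildRequest) -> bool:
    # Assumption: only "latest source" requests are treated as mergeable.
    return a.source.revision is None and b.source.revision is None


def merge_queue(queue: List[BuildRequest]) -> List[List[BuildRequest]]:
    """Group mergeable requests; each group can be served by one build."""
    groups: List[List[BuildRequest]] = []
    for req in queue:
        for group in groups:
            if can_merge(group[0], req):
                group.append(req)
                break
        else:
            # No existing group accepts this request, so it gets its own build.
            groups.append([req])
    return groups
```

With twenty periodic requests stacked up behind a long build, merge_queue
returns a single group, so only one build is needed once the builder is free.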

> > Turns out there was an uncaught exception (somewhere, I can find it if
> > anyone's interested but I don't have the code on hand right now) that
> > would cause the periodic build timer to not reset itself if it expired and
> > attempted to start another build while one was already in progress. Once
> > I fixed that everything was humming along nicely.
>
> If this is still a problem in 0.7.0, let me know. The Periodic class isn't
> supposed to have any idea whether the BuildRequest it submitted has
> completed or not.

Will do.
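
For reference, the general shape of that failure (a timer that never re-arms
because an exception escapes before the reschedule) can be sketched as below.
This is not the 0.6.6 code, which I don't have on hand; PeriodicTimer,
start_build, and call_later are all hypothetical names, with call_later
standing in for whatever "run this function after N seconds" facility the
event loop provides.

```python
class PeriodicTimer:
    """Hypothetical periodic trigger that survives a failing build start."""

    def __init__(self, interval, start_build, call_later):
        self.interval = interval        # seconds between attempts
        self.start_build = start_build  # callable; may raise if builder is busy
        self.call_later = call_later    # callable(delay, fn), assumed scheduler hook
        self.errors = []                # exceptions seen, kept for inspection

    def start(self):
        self.call_later(self.interval, self.fire)

    def fire(self):
        try:
            self.start_build()
        except Exception as exc:
            # Record the failure instead of letting it escape.
            self.errors.append(exc)
        finally:
            # Re-arm unconditionally so one bad attempt can't stop the cycle.
            self.call_later(self.interval, self.fire)
```

The point of the finally clause is that the reschedule happens whether or not
the build attempt raised.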

> > The failing case generates this in the log:
> [...]
> I think I need to change the way config-file reading works to not do a
> startService to anything until we've finished handling the whole file.
> I've run into a similar bug with things like Waterfall (involving two
> things trying to listen on the same TCP port, where really it's a new,
> temporary instance coming from the config file versus the existing
> instance from the last time we read the config file).

Hmm, okay, that actually sounds like it could be the cause of another
problem I'd been seeing that I was at a loss to explain. The build
environment I'm running in has a large farm of identical build machines
where I'm cross-compiling for fifty to seventy different targets at any
given time. (They're identical for now; in the next three or four months
that's going to change, with some very different build slaves appearing.)
When the builds all line up, or even come close to each other, it tends to
kill our CVS server, so I've set up an environment where one build is
responsible for doing snapshots of CVS and letting all the other builds look
at the shared source. My initial idea had been to force all the actual
build-builds to be dependent on the CVS-build so none of them would start
while a checkout/snapshot was in progress, but that dependency didn't seem
to be respected. I'd chalked it up to the way my build stanzas were being
processed from my configuration file, but it might also have been related to
when the dependencies were being examined.

I've now abandoned that option since it didn't seem like it was going to
work in my environment anyway (imagine the horror that could result from a
CVS update happening in the middle of a build). I'm kind of back at the
architecture stage, thinking I'd like something like the new locks but maybe
with more semaphore-like semantics (post/wait/trywait), though I still
expect to run into starvation problems. Definitely more thought is required
on my part.
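
To make the post/wait/trywait idea concrete, here is a minimal sketch built on
Python's standard threading.Semaphore. It has nothing to do with Buildbot's
lock classes; SnapshotGate is a made-up name, and the mapping (wait is a
blocking acquire, trywait is a non-blocking acquire, post is a release) is
just the usual semaphore correspondence.

```python
import threading


class SnapshotGate:
    """Hypothetical gate around the shared CVS snapshot."""

    def __init__(self, slots=1):
        # slots=1 behaves like a mutex; larger values admit several holders.
        self._sem = threading.Semaphore(slots)

    def wait(self):
        # Block until a slot is free, then take it.
        self._sem.acquire()

    def trywait(self):
        # Take a slot only if one is free right now; never blocks.
        return self._sem.acquire(blocking=False)

    def post(self):
        # Give a slot back.
        self._sem.release()
```

A build could call trywait and skip or defer itself when it returns False,
while the snapshot job calls wait before updating. Nothing here addresses
starvation: a steady stream of builds could still keep the snapshot job
waiting indefinitely.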

> A quick-hack fix would be to change BuildMaster.loadConfig to set up the
> builders before it calls loadConfig_Schedulers, but really that's just
> shifting the problem around and asking for it to cause a bug somewhere
> else.
>
> Hrm. I have to think about this for a bit.

I can try that out and let you know if that breaks anything else. Let me
know if there's anything else I can try that might help out.

--
-Joe.