[users at bb.net] And yet even more anecdotes...

Neil Gilmore ngilmore at grammatech.com
Wed Aug 15 15:32:13 UTC 2018


Hi everyone,

Mostly, our system has been running well. This summer we had an intern 
whose job was both to update our masters and to make our UI changes 
work on the latest version. As a result, our masters are currently 
running 1.2.0. The workers are further behind, because updating those 
is painful for us.

We did run into an odd problem over the weekend. As you might remember, 
we divide our builders essentially two ways. The smaller set produces 
our installers. The larger set uses those installers to run tests. Our 
installer builders use locks to make sure that only one build is running 
at a time. We can't use the usual mechanism because we also have one 
builder for monitoring that needs to run concurrently with the 'active' 
build.

We switched from building one branch to another, after the old branch's 
builds had already started. Because we construct builder names from the 
branch names, a switch like this either creates new builders or 
resurrects older ones (which is what happened in this case). Naturally, 
we no longer see the old builders by default.

Every one of those builders got stuck acquiring locks. We do see this 
problem from time to time, but it usually affects only a single host's 
builds.

Cancelling the build won't solve the problem. The next build will also 
get stuck. Restarting the worker doesn't help, nor does a 
reconfiguration. Restarting the master will solve the problem, but 
that's pretty drastic, and we tend to lose a lot of work when we do that.

If this happens to you and you need to fix it, here's what we do. We 
open the manhole in twistd and run the following:

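# The lock object from the first lock entry of the first running build.
foo = master.botmaster.namedServices['<name of stuck builder>'].building[0].locks[0][0]
# Release it on behalf of its first recorded owner: (owner, access).
foo.release(foo.owners[0][0], foo.owners[0][1])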

The index of 'building' might not always be 0, but it almost always is.
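
For anyone who would rather not index by hand, here is a minimal sketch 
of the same idea as a helper. It assumes only the attributes used in the 
lines above (building, locks, owners, release). The name 
release_stuck_locks and the dry_run flag are made up for illustration, 
and these are Buildbot internals that may well differ between versions, 
so treat it as a starting point rather than a tested fix.

```python
def release_stuck_locks(builder, dry_run=True):
    """Walk every running build on `builder` and every lock it lists.

    Hypothetical helper: relies on the same internal attributes as the
    manual fix (builder.building, build.locks, lock.owners, lock.release).
    Returns a list of (lock, owner, access) tuples that were found, and
    releases them only when dry_run is False.
    """
    found = []
    for build in list(builder.building):
        # Each entry in build.locks is indexed as entry[0] -> lock object,
        # matching locks[0][0] in the manual version.
        for entry in build.locks:
            lock = entry[0]
            # Copy the owner list first, since release() is expected to
            # modify lock.owners while we iterate.
            for owner, access in list(lock.owners):
                found.append((lock, owner, access))
                if not dry_run:
                    lock.release(owner, access)
    return found
```

In the manhole this would be called as 
release_stuck_locks(master.botmaster.namedServices['<name of stuck 
builder>']) to list what is held, then again with dry_run=False to 
release. Note that it releases every owner of each lock, not just the 
first, so if another build legitimately holds the same lock it will be 
released too. Check the dry-run output before going further.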

I remember some discussion of whether newer versions of twistd still had 
the manhole. Because we need to do this regularly (though thankfully not 
frequently), I was dreading updating twistd. But it looks like it survives.

Thanks for your time reading this!

Neil Gilmore
grammatech.com

