[users at bb.net] And yet even more anecdotes...
ngilmore at grammatech.com
Wed Aug 15 15:32:13 UTC 2018
Mostly, our system has been running well. This summer, we had an intern
whom we gave the job of both updating our masters and making our UI
changes work on the latest version. So we're currently running 1.2.0
on our masters. We're further behind on the workers, because updating
those is painful for us.
We did run into an odd problem over the weekend. As you might remember,
we divide our builders into essentially two sets. The smaller set
produces our installers. The larger set uses those installers to run tests. Our
installer builders use locks to make sure that only one build is running
at a time. We can't use the usual mechanism because we also have one
builder for monitoring that needs to run concurrently with the 'active'
We switched from one branch to another after the old branch's builds had
already started. Because we construct builder names from the branch
names, a switch like this can create new builders or resurrect older
builders (which is what happened in this case). Naturally, we no
longer see the old builders by default.
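A minimal sketch of that effect, assuming a hypothetical naming scheme of "<kind>-<branch>" with made-up kinds and branch names (this is an illustration only, not the actual configuration described here):

```python
# Hypothetical naming scheme: one builder of each kind per branch.
KINDS = ("installer", "tests")


def builder_names(branch):
    """Return the set of builder names derived from a branch name."""
    return {f"{kind}-{branch}" for kind in KINDS}


def diff_builders(old_branch, new_branch):
    """Return (builders that drop out, builders that appear) on a branch switch."""
    old = builder_names(old_branch)
    new = builder_names(new_branch)
    return old - new, new - old


# Switching from a hypothetical "branch-a" to "branch-b".
gone, created = diff_builders("branch-a", "branch-b")
print(sorted(gone))
print(sorted(created))
```

Switching back to a branch that was used before yields the same names as before, which is how older builders get resurrected rather than created fresh.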
Every one of those builders got stuck acquiring locks. We do see this
problem from time to time, but it's usually a single host's builds.
Cancelling the build doesn't solve the problem, because the next build
also gets stuck. Restarting the worker doesn't help, nor does a
reconfiguration. Restarting the master does solve the problem, but
that's pretty drastic, and we tend to lose a lot of work when we do that.
If this happens to you, and you need to fix it, here's what we do. We
connect to the manhole in twistd and run the following:
foo = master.botmaster.namedServices['<name of stuck
The index of 'building' might not always be 0, but almost always is.
I remember some discussion of whether newer versions of twistd still had
the manhole. Because we need to do this regularly (though thankfully not
frequently), I was dreading updating twistd. But it looks like the
manhole has survived.
Thanks for your time reading this!