[users at bb.net] The latest anecdotes from a multi-master setup.

Neil Gilmore ngilmore at grammatech.com
Thu Jun 29 15:20:51 UTC 2017


Hi everyone,

We're still running 4 masters on 0.9.3, with most workers on 0.9.1 (and 
we're not upgrading any time soon). We carry a few minor modifications, 
such as Pierre's code to allow schedulers to reconfig, and ignoring 
revisions when collapsing builds.

In the last week or so...

One of our masters apparently lost its WAMP connection twice. Nothing 
shows up in twistd.log, and our crossbar log doesn't show much of 
anything either. The symptom is that the master's builders don't show 
up in the UI. A reconfig fails to correct the problem; only stopping and 
restarting the master restores functionality.

We had a case of two builders both acquiring the same lock. You may 
recall that we use worker locks to control builds. We don't use the 
worker property for this, as we have a builder on nearly every worker 
that needs to run. Instead, we use the lock to allow only one of the 
other builders to run at a time. Sometimes I've been able to correct 
this through the manhole, though getting to the point of calling 
release() on the lock is unpleasant. This time either it didn't work, or 
I messed it up. Again, restarting the master took care of the problem.

As I indicated in a reply on the dev list, we had a master that had to 
be restarted. Its log indicated that a reconfig had been running for 
~17K seconds (nearly 5 hours).

This morning, my attention was called to a builder that had apparently 
gone backwards in time. Its current build used a current SVN revision, 
as did the previous one. The one before that, however, used a very old 
revision. Builds before that one appeared to use correct revisions.

The debug information shows that the build was submitted on May 16, but 
was not claimed until June 29. I suspect that restarting its master 
yesterday is what finally got it claimed. In general, there are times 
I've noticed queued builds taking hours to start, though that's 
different from not being claimed in the first place.

And that's it for today.

Neil Gilmore
grammatech.com

