[users at bb.net] And even more multi-master anecdotes.

Neil Gilmore ngilmore at grammatech.com
Wed Sep 6 15:52:36 UTC 2017

Hi everyone,

The last time we were here, my boss had just added something to keep the 
old builds in the database down to 100K. That's worked pretty well for 
our UI master, in that I haven't had to restart it since. Naturally, one 
of our other masters is now acting up.

This master is the one that probably does the most work and runs most of 
the builds. In the past, it's been prone to losing its WAMP connection. 
When this happens, we get nothing in the logs. The symptom is that none 
of the workers for this master appear in the UI, and its builders don't 
appear in the builders page. So our users conclude that the master isn't 
running (which isn't true). You can still see the builds running on the 
front page, and get to the builder's page from there, but if you needed 
to force a build on a builder that isn't currently building, you're out 
of luck. Unless, of course, you go through the REST API and find the 
builder's number.
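For anyone stuck in the same spot, here is a rough sketch of that REST lookup. It assumes the Buildbot nine data API endpoint `/api/v2/builders`, whose JSON response carries a `builders` list with `name` and `builderid` fields. The base URL and the sample builder name are hypothetical; substitute your own master's web address.

```python
import json
import urllib.request


def builder_ids(payload):
    """Map builder name -> builderid from a /api/v2/builders JSON response."""
    data = json.loads(payload)
    return {b["name"]: b["builderid"] for b in data.get("builders", [])}


def fetch_builder_ids(base_url):
    # base_url is an assumption, e.g. "http://buildbot.example.com" (hypothetical host).
    url = f"{base_url.rstrip('/')}/api/v2/builders"
    with urllib.request.urlopen(url) as resp:
        return builder_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline example using a trimmed, made-up response body.
    sample = (
        '{"builders": [{"builderid": 7, "name": "linux-release", "masterids": [2]}],'
        ' "meta": {"total": 1}}'
    )
    print(builder_ids(sample))
```

With the ID in hand, the builder's page (and its force button) should be reachable directly in the web UI, typically at a path like `#/builders/<builderid>`, even when the builders list is missing it.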

Because of the previous problem, I've had a session running top for a 
couple of weeks, so I may have more data. Eventually, that master shows 
100% (or more) CPU usage for some minutes (or maybe an hour or more). My 
theory is that the wamp connection isn't serviced during that time, and 
disconnects. When things settle down, the connection is already gone, 
and isn't reconnected. That's my current situation. The CPU was spiked 
when I looked, the log wasn't getting new messages, and the UI wasn't 
showing the workers. When the CPU came back down, the log resumed, but 
the master's workers still aren't in the UI.
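The theory above, that a long CPU-bound stretch starves the connection's keepalive, can be illustrated with a small sketch. Buildbot itself runs on Twisted, not asyncio; asyncio is swapped in here only because the effect is the same for any single-threaded event loop: no callback (including a heartbeat) can run while one piece of code hogs the CPU. `heartbeat` is a hypothetical stand-in for a WAMP keepalive, and `busy_work` for a long regex pass.

```python
import asyncio
import time


async def heartbeat(interval, gaps, stop):
    # Hypothetical keepalive: records the time elapsed between consecutive ticks.
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


def busy_work(seconds):
    # CPU-bound loop standing in for an expensive regex pass; it never yields.
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass


async def main(block_seconds=0.5, interval=0.05):
    gaps = []
    stop = asyncio.Event()
    task = asyncio.create_task(heartbeat(interval, gaps, stop))
    await asyncio.sleep(interval * 3)  # let a few normal ticks happen
    busy_work(block_seconds)           # blocks the loop: no tick can run meanwhile
    await asyncio.sleep(interval * 3)  # loop is free again; heartbeat catches up
    stop.set()
    await task
    return max(gaps)


if __name__ == "__main__":
    worst = asyncio.run(main())
    print(f"longest gap between heartbeats: {worst:.2f}s")
```

The longest gap is always at least as long as the blocking call, so if the peer on the other end drops connections after some idle timeout shorter than the spike, the connection would be gone by the time the loop recovers.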

As one might expect, builds do not proceed well when the master is 
spiking the CPU.

During one of these times when the CPU is spiked, the log only gets new 
messages every several minutes (instead of a pretty close to continuous 
flow). I attached gdb to the master, figuring I couldn't make things 
worse. Unfortunately, I don't think that system is quite set up for 
debugging. For example, py-bt, while it did run, gave back nothing 
useful. A straight bt gave the usual string of Python frame evaluations, 
etc. I didn't have time to go further then. But I did notice that the 
backtrace looked similar to what we saw when processing millions of 
builds. That is, every time I broke in to see where it was, it was down 
in some regex code.

Neil Gilmore
