[users at bb.net] And even more multi-master anecdotes.
ngilmore at grammatech.com
Wed Sep 6 15:52:36 UTC 2017
The last time we were here, my boss had just added something to keep the
old builds in the database down to 100K. That's worked pretty well for
our UI master, as in I haven't had to restart it since. Naturally, one
of our other masters is acting up.
This master is the one that probably does the most work and runs most of
the builds. In the past, it's been prone to losing its wamp connection.
When this happens, we get nothing in the logs. The symptom is that none
of the workers for this master appear in the UI, and its builders don't
appear in the builders page. So our users conclude that the master isn't
running (which isn't true). You can still see the builds running on the
front page, and get to the builder's page from there, but if you needed
to force a build on a builder that isn't currently building, you're out
of luck. Unless, of course, you go through the REST API and find the
Because of the previous problem, I've had a session running top for a
couple weeks, and I may have more data. Eventually, that master shows a
100% (or more) CPU usage for some minutes (or maybe an hour or more). My
theory is that the wamp connection isn't serviced during that time, and
disconnects. When things settle down, the connection is already gone,
and isn't reconnected. That's my current situation. The CPU was spiked
when I looked, the log wasn't getting new messages, and the UI wasn't
showing the workers. When the CPU came back down, the log resumed, but
the master's workers aren't in the UI.
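
One way to check the theory above (that the process stops servicing its event loop while the CPU is spiked) would be a small heartbeat that records how late a periodic callback fires. This is only a sketch under assumptions: `HeartbeatMonitor`, its parameters, and the thresholds are hypothetical names made up for illustration, not part of Buildbot, and it uses `time.monotonic`, so it assumes Python 3. It would need to be hooked into the master's reactor, for example with Twisted's `task.LoopingCall(monitor.beat).start(monitor.interval)`.

```python
import time


class HeartbeatMonitor:
    """Record gaps between heartbeats that exceed a tolerance (hypothetical helper)."""

    def __init__(self, interval=1.0, tolerance=5.0, clock=time.monotonic):
        self.interval = interval    # how often beat() is scheduled, in seconds
        self.tolerance = tolerance  # extra delay allowed before it counts as a stall
        self.clock = clock          # injectable so the class can be tested
        self.last = None            # timestamp of the previous beat
        self.stalls = []            # list of (timestamp, seconds_late)

    def beat(self):
        """Call this periodically from the event loop."""
        now = self.clock()
        if self.last is not None:
            late = (now - self.last) - self.interval
            if late > self.tolerance:
                # The loop was not serviced for roughly `late` seconds.
                self.stalls.append((now, late))
        self.last = now
```

If the recorded stalls line up with the times the workers vanish from the UI, that would support the idea that the connection times out while the loop is blocked. It would not by itself explain why the connection is not re-established afterwards.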
As one might expect, builds do not proceed well when the master is
spiking the CPU.
During one of these times when the CPU is spiked, the log only gets new
messages every several minutes (instead of the usual nearly continuous
flow). I attached gdb to the master, figuring I couldn't make things
worse. Unfortunately, I don't think that system is quite set up for
debugging. For example, py-bt, while it did run, gave back nothing
useful. A straight bt gave the usual string of Python frame evaluations,
etc. I didn't have time to go further then. But I did note that the
backtrace showed something similar to what we saw when processing
millions of builds. That is, every time I broke in to see where I was,
I was down in some regex stuff.
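
When gdb's py-bt is not usable, one alternative is to have the process dump its own Python-level stacks on a signal. The following is a minimal sketch, not something the master does today: it assumes Python 3 on a Unix-like system (the standard-library `faulthandler.register` is not available on Windows), and `install_stack_dump` and the log path are hypothetical names chosen for illustration. It would have to be called once at master startup.

```python
import faulthandler
import signal


def install_stack_dump(path, signum=signal.SIGUSR1):
    """Dump all Python thread stacks to `path` whenever `signum` arrives."""
    # The file must stay open while the handler is registered, because
    # faulthandler writes straight to its file descriptor.
    dump_file = open(path, "a")
    faulthandler.register(signum, file=dump_file, all_threads=True)
    return dump_file
```

With that in place, sending the signal from a shell (`kill -USR1 <pid>`) during a CPU spike appends the current Python tracebacks to the file. Because the handler runs at the C signal level, it should still fire when the interpreter is busy inside a long regex match, which is the case of interest here.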