[users at bb.net] Another anecdote from the multi-master trenches.

Neil Gilmore ngilmore at grammatech.com
Thu Dec 8 16:45:15 UTC 2016


Hi everyone.

First, a bit of good news. My current top priority is to make the 
schedulers reconfigurable. It's not conceptually difficult, but I wasn't 
well-versed in Python argument passing (which figures prominently in 
this), so I've had a couple of aborted tries on that score. I think I've 
got all that sorted out for now. Not being able to reconfigure 
schedulers is just biting us way too badly.

Now, the anecdote. As you may remember, we're running 4 masters. One just 
has the UI and force schedulers. One has our overall logging system. The 
other two are split between producing builds and consuming them for tests.

Sometime between when I left yesterday and when the test lead looked 
this morning, the UI stopped displaying the builders for the producer 
and consumer masters. Looking at all the masters, they were running, and 
I didn't immediately see anything suspicious in the logs. Looking at the 
data API, I could see all the builders and workers. The workers all 
showed connected_to being valid, but only the logging workers showed 
anything in configured_on. I restarted our UI master and that didn't 
help. Restarting the producer and consumer seems to have solved the 
problem. I can see the builders in the UI, and looking at the workers in 
the data API, I see that most appear to have configured_on set. I have 
no idea what actually happened. My wild conjecture is that the 
inter-master communication got screwed up somehow. Either that or they 
lost connection to the database, which I think is less likely, since 
Postgres is pretty stable that way.
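
For anyone who wants to check for the same symptom, here is a rough 
sketch of the check I did by eye in the data API: list the workers that 
show something in connected_to but nothing in configured_on. It assumes 
the workers endpoint lives at /api/v2/workers and returns a "workers" 
list whose entries have "name", "connected_to" and "configured_on" 
fields. The function names and the example URL are made up for 
illustration, so adjust them to your setup.

```python
import json
from urllib.request import urlopen


def fetch_workers(base_url):
    """Fetch the worker list from a master's data API.

    Assumes the endpoint is <base_url>/api/v2/workers and returns JSON.
    """
    with urlopen(base_url.rstrip("/") + "/api/v2/workers") as resp:
        return json.load(resp)


def workers_missing_config(payload):
    """Return sorted names of workers that are connected to some master
    but have an empty configured_on list (the symptom described above)."""
    suspects = []
    for worker in payload.get("workers", []):
        connected = worker.get("connected_to") or []
        configured = worker.get("configured_on") or []
        if connected and not configured:
            suspects.append(worker.get("name", "<unnamed>"))
    return sorted(suspects)


# Example with a hand-written payload (not real data from our masters):
sample = {
    "workers": [
        {"name": "log-1", "connected_to": [{"masterid": 2}],
         "configured_on": [{"builderid": 7, "masterid": 2}]},
        {"name": "build-1", "connected_to": [{"masterid": 3}],
         "configured_on": []},
    ]
}
print(workers_missing_config(sample))
```

Against a live master you would call something like 
workers_missing_config(fetch_workers("http://ui-master.example:8010")), 
where the host and port are placeholders.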

Neil Gilmore
grammatech.com

