[users at bb.net] My afternoon with buildbot (mostly rc1).

Neil Gilmore ngilmore at grammatech.com
Wed Sep 21 18:23:46 UTC 2016


Hi everyone,

Another anecdote...

We had a little problem here today, and as a result a few machines were 
rebooted, including one that has a particular worker.

Here we don't start workers using cron, we mostly start them using 
buildbot builds (except for the worker whose builds start the other 
workers). We have a build that logs in to other machines, determines 
whether the worker is running, and starts it if it isn't. It runs every 
hour. The build logs are also useful to monitor which workers are up, as 
I find it a bit quicker to scan that than the builders page.
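
A minimal sketch of what such a check-and-start step could look like, in Python. It is not the build described above. It assumes each worker has a base directory holding the twistd.pid file that twistd writes, that passwordless ssh to the host works, and that buildbot-worker is on the remote PATH. The function names, host, and paths are hypothetical.

```python
import shlex
import subprocess


def build_check_command(host, basedir):
    """ssh command that exits 0 only if the pid in twistd.pid is alive."""
    pidfile = shlex.quote(basedir + "/twistd.pid")
    remote = "test -f {0} && kill -0 $(cat {0})".format(pidfile)
    return ["ssh", host, remote]


def build_start_command(host, basedir):
    """ssh command that starts the worker in its base directory."""
    return ["ssh", host, "buildbot-worker start " + shlex.quote(basedir)]


def ensure_worker(host, basedir, run=subprocess.call):
    """Start the worker on `host` if it is not already running.

    `run` takes an argv list and returns an exit code; it is a
    parameter so the logic can be exercised without a real ssh.
    Returns "running", "started", or "failed".
    """
    if run(build_check_command(host, basedir)) == 0:
        return "running"
    if run(build_start_command(host, basedir)) == 0:
        return "started"
    return "failed"
```

Run from an hourly build step, one call per host, the returned status gives a quick up/down line per worker in the build log.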

Unfortunately, the buildbot UI was unresponsive (15 minutes and it 
hadn't given me the builders page). Its last knowledge appeared to be 
that the builds on the rebooted worker were still in progress (even 
though that certainly wasn't true).

I had to kill the master and restart it (that particular worker's builds 
are ones everyone notices). By the time it had fully restarted, our 
builds to start workers had run, and the rebooted worker's builds were 
running, the 'BuildMaster is running' message had rotated down into 
twistd.log.11.

I'd forced a build to get the rebooted host's worker started. It took 
about 15 minutes for it to start.

And I did notice that upon our startup we do get a lot of unauthorized 
login entries as the workers start attempting to connect as soon as the 
master is up. They go on for several minutes until the master catches up 
with things. I see a lot of buildstep activity going on in between.

At least this time I didn't have to clear the database.

Neil Gilmore
grammatech.com

