[users at bb.net] A summary of first month issues using Buildbot 2.0
Yngve N. Pettersen
yngve at vivaldi.com
Fri Mar 15 01:29:30 UTC 2019
Hi,
About a month ago we moved our build system from the old
Chromium-developed buildbot system to one based on Buildbot 2.0. In that
period we have had a couple of major issues that I thought I'd summarize:
* We have had two crashes of the buildbot master process. I do not know
what causes the crashes, and the twisted.log does not contain any
information about what happened, so my guess is that either the Python 3.6
shipped with Ubuntu 18 crashed, or the Twisted/buildbot scripts failed
without logging anything.
* We have had at least two cases where the master lost its connection to
the database server and did not recover; restarting the master was the
only option. What these cases seem to have in common is that they happened
when using the reconfigure/SIGHUP option to update the buildbot
configuration. In at least one case the log seemed to include an exception
regarding the database connection (the database is a remote PostgreSQL
server).
* We have had a couple of cases where the network connection between the
master and some of the workers has been interrupted. In the major case,
this led to having to restart the worker instances on all the affected
workers. This was the topic of an email to this list a few weeks ago. In
this case the logs show that the workers connected correctly, but that the
master then failed (due to an exception) to register the worker. It also
failed to cut the connection to the worker (so that the worker could try
to reconnect), either when the registration process failed or later when
checking open connections (if it does that), and it apparently still
responded to pings from the worker. Nor did it detect that a worker was
not really connected when it pinged the worker before assigning it a job.
This reconnect issue is such a major problem and hassle that, when we did
a restart of that network connection, we shut down the *master* instance
while taking down the network connection, and restarted it afterwards.
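
For the silent crashes in the first item, one possible way to get at least
some evidence is Python's standard faulthandler module, which dumps the
Python traceback of all threads when the interpreter dies on a fatal signal
such as SIGSEGV. The sketch below is only an illustration: the helper name
enable_crash_log and the log file name are made up for this example, and
calling it near the top of master.cfg is an assumption, not something the
Buildbot documentation prescribes.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(path):
    """Hypothetical helper: send faulthandler tracebacks to the file at `path`."""
    # The file object has to stay open for the lifetime of the process,
    # because faulthandler writes to its file descriptor at crash time.
    crash_file = open(path, "a")
    # all_threads=True dumps every thread, not only the one that crashed.
    faulthandler.enable(file=crash_file, all_threads=True)
    return crash_file


if __name__ == "__main__":
    # Assumed location; a real setup would use a path next to twisted.log.
    log_path = os.path.join(tempfile.gettempdir(), "buildbot-master-crash.log")
    handle = enable_crash_log(log_path)
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same handler can also be switched on without code changes by starting
the interpreter with "python3 -X faulthandler" or by setting the
PYTHONFAULTHANDLER environment variable, in which case the traceback goes
to stderr. This only helps if the crash is a fatal signal inside the
interpreter; it would not explain a process killed from outside.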
--
Sincerely,
Yngve N. Pettersen
Vivaldi Technologies AS