[users at bb.net] How to Check Worker Status?

Chris Spencer chrisspen at gmail.com
Thu Feb 1 18:19:43 UTC 2018


I'm having a problem with workers randomly stopping. From the worker's
logs, I'm seeing:

2018-01-26 01:22:33-0500 [-] sending app-level keepalive
2018-01-26 01:32:33-0500 [-] sending app-level keepalive
2018-01-26 01:42:33-0500 [-] sending app-level keepalive
2018-01-26 01:52:33-0500 [-] sending app-level keepalive
2018-01-26 02:00:00-0500 [-] Received SIGTERM, shutting down.
2018-01-26 02:00:00-0500 [HangCheckProtocol,client] Lost connection to 10.159.135.58:9989
2018-01-26 02:00:00-0500 [-] Stopping factory <buildbot_worker.pb.BotFactory instance at 0x7f50af441950>
2018-01-26 02:00:00-0500 [-] Main loop terminated.
2018-01-26 02:00:00-0500 [-] Server Shut Down.

However, my master is still running, as are the other workers, so I don't
know why a single worker, and nothing else, would receive a SIGTERM.

To work around this issue, I want to create a cronjob that periodically
checks whether the worker has stopped and restarts it. Looking at the docs
for buildbot-worker at http://docs.buildbot.net/latest/manual/cmdline.html,
I see options to start, stop and restart, but there's no option to check
status.

How do I check to see if a specific worker is running, so I know to restart
it?

I tried just re-running `buildbot-worker start workerN` but that hangs if
that worker is already running, showing the error message:

    Following twistd.log until startup finished..
    Another twistd server is running, PID 13758

    This could either be a previously started instance of your application or a
    different application entirely. To start a new one, either run it in some other
    directory, or use the --pidfile and --logfile parameters to avoid clashes.

Why does it not simply exit after showing the error message? I had to
press Ctrl-C to make it return.

And obviously I don't want to run `buildbot-worker restart workerN` because
that will kill the current worker if it's already running, interrupting the
current build.

I can check for the existence of <buildbot_dir>/workerN/twistd.pid, but
that feels a little hacky and likely to break if Buildbot changes how it
tracks worker pids.
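
For reference, here is a minimal sketch of that pid-file check in Python. It
assumes a POSIX system and that the worker's base directory contains a
twistd.pid file holding the process ID, as described above. The helper names
read_pid and worker_is_running are hypothetical and not part of Buildbot.

```python
import os


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or unparseable."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def worker_is_running(basedir):
    """Best-effort check: does <basedir>/twistd.pid name a live process?

    Assumption: the worker writes its PID to twistd.pid in its base directory.
    POSIX only; do not use os.kill(pid, 0) this way on Windows.
    """
    pid = read_pid(os.path.join(basedir, "twistd.pid"))
    if pid is None or pid <= 0:
        return False
    try:
        # Signal 0 performs the existence and permission checks only;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # stale pid file: no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True
```

A cron script could call worker_is_running("<buildbot_dir>/workerN") and run
`buildbot-worker start workerN` only when it returns False. This does not
confirm that the live process is actually the worker, since a PID can be
reused by an unrelated process after the worker dies.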