<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
We use a builder that ssh's into the machine each worker runs on,
cd's into the worker's directory, and looks for twistd.pid. It
restarts the worker depending on whether the file is present, whether
the PID in it appears in the process list, and so on.<br>
<br>
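For what it's worth, the core of that check can be sketched in a few
lines of Python. This is an illustration rather than our actual
builder code: the helper names read_pid and worker_is_running are made
up, and the os.kill(pid, 0) probe assumes a POSIX system.<br>
<pre>
```python
import os


def read_pid(basedir):
    """Return the PID stored in basedir/twistd.pid, or None if unusable."""
    pidfile = os.path.join(basedir, "twistd.pid")
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or garbled pidfile: treat as "no PID".
        return None


def worker_is_running(basedir):
    """True if twistd.pid exists and names a live process (POSIX only)."""
    pid = read_pid(basedir)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False  # stale pidfile, the process is gone
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```
</pre>
Note that a live PID only shows that some process has that number, so
a stricter check would also compare the process name.<br>
<br>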
One huge benefit of this over cron jobs is that we can construct the
list of workers inside master.cfg, which is very useful when our
master.cfg changes multiple times a day. The only cron job we run
is one for the masters and the single worker that runs the builder
that checks and starts all the others.<br>
<br>
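To illustrate the idea of driving the checks from data in master.cfg,
here is a minimal sketch. Everything in it is hypothetical: the
WORKERS table, the host names and paths, and the remote shell
one-liner are stand-ins, not our real configuration. Each resulting
argv could then be passed as the command of a ShellCommand step in the
checking builder.<br>
<pre>
```python
import shlex

# Hypothetical worker table; host names and paths are placeholders.
WORKERS = {
    "worker1": {"host": "build1.example.com", "basedir": "/home/bb/worker1"},
    "worker2": {"host": "build2.example.com", "basedir": "/home/bb/worker2"},
}


def check_command(host, basedir):
    """Return an ssh argv that starts the worker unless its PID is alive."""
    d = shlex.quote(basedir)
    p = shlex.quote(basedir + "/twistd.pid")
    remote = (
        "if [ -f {p} ] ; then "
        'if kill -0 "$(cat {p})" ; then exit 0 ; fi ; '
        "fi ; "
        "buildbot-worker start {d}"
    ).format(p=p, d=d)
    return ["ssh", host, remote]


def all_check_commands(workers):
    """Map each worker name to its check-and-start command."""
    return {
        name: check_command(info["host"], info["basedir"])
        for name, info in sorted(workers.items())
    }
```
</pre>
Adding or removing a worker then only means editing the table, and the
checking builder picks up the change on the next reconfig.<br>
<br>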
And at least on our older system, buildbot-worker start does
terminate, but it takes 10 seconds or more before printing the
following:<br>
The worker took more than 10 seconds to start and/or connect to the
buildmaster,<br>
so we were unable to confirm that it started and connected
correctly. Please<br>
'tail twistd.log' and look for a line that says 'message from
master: attached'<br>
to verify correct startup. If you see a bunch of messages like 'will
retry in 6<br>
seconds', your worker might not have the correct hostname or
portnumber for the<br>
buildmaster, or the buildmaster might not be running. If you see
messages like 'Failure: twisted.cred.error.UnauthorizedLogin'<br>
then your worker might be using the wrong botname or password.
Please correct<br>
these problems and then restart the worker.<br>
<br>
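The check that message asks for is easy to script. A minimal sketch,
assuming the log sits at twistd.log in the worker's base directory;
the function name attached_recently and the 50-line window are
arbitrary choices for illustration:<br>
<pre>
```python
import os
from collections import deque


def attached_recently(basedir, tail_lines=50):
    """True if one of the last tail_lines lines of twistd.log reports an attach."""
    logfile = os.path.join(basedir, "twistd.log")
    try:
        with open(logfile, errors="replace") as f:
            # A bounded deque keeps only the final tail_lines lines.
            tail = deque(f, maxlen=tail_lines)
    except OSError:
        return False  # no log yet, or it cannot be read
    return any("message from master: attached" in line for line in tail)
```
</pre>
<br>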
Neil Gilmore<br>
grammatech.com<br>
<br>
<div class="moz-cite-prefix">On 2/1/2018 12:19 PM, Chris Spencer
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CANe40gKMPgcBn4WhjkRLGg7PM4qq2kWn1ay5N0iOpUJXKc0fDg@mail.gmail.com">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>I'm having a problem with workers randomly
stopping. From the worker's logs, I'm seeing:<br>
<br>
2018-01-26 01:22:33-0500 [-] sending app-level
keepalive<br>
2018-01-26 01:32:33-0500 [-] sending app-level
keepalive<br>
2018-01-26 01:42:33-0500 [-] sending app-level
keepalive<br>
2018-01-26 01:52:33-0500 [-] sending app-level
keepalive<br>
2018-01-26 02:00:00-0500 [-] Received SIGTERM,
shutting down.<br>
2018-01-26 02:00:00-0500 [HangCheckProtocol,client]
Lost connection to <a
href="http://10.159.135.58:9989"
moz-do-not-send="true">10.159.135.58:9989</a><br>
2018-01-26 02:00:00-0500 [-] Stopping factory
&lt;buildbot_worker.pb.BotFactory instance at
0x7f50af441950&gt;<br>
2018-01-26 02:00:00-0500 [-] Main loop terminated.<br>
2018-01-26 02:00:00-0500 [-] Server Shut Down.<br>
<br>
</div>
However, my master's still running, as well as other
workers, so I don't know why a single worker would
receive a SIGTERM, and nothing else.<br>
<br>
</div>
To work around this issue, I want to create a cronjob
that periodically checks to see if the worker has
stopped and restarts it. Looking at the docs for
buildbot-worker at <a
href="http://docs.buildbot.net/latest/manual/cmdline.html"
moz-do-not-send="true">http://docs.buildbot.net/latest/manual/cmdline.html</a>,
I see options to start, stop and restart, but there's no
option to check status.<br>
<br>
</div>
How do I check to see if a specific worker is running, so
I know to restart it?<br>
<br>
</div>
I tried just re-running `buildbot-worker start workerN` but
that hangs if that worker is already running, showing the
error message:<br>
<br>
Following twistd.log until startup finished..<br>
Another twistd server is running, PID 13758<br>
<br>
This could either be a previously started instance of
your application or a<br>
different application entirely. To start a new one,
either run it in some other<br>
directory, or use the --pidfile and --logfile parameters
to avoid clashes.<br>
<br>
</div>
<div>Why does that not simply exit after showing the error
message? I had to send ctrl-c to make it return.<br>
</div>
<div><br>
</div>
And obviously I don't want to run `buildbot-worker restart
workerN` because that will kill the current worker if it's
already running, interrupting the current build.<br>
<br>
</div>
I can check for the existence of
&lt;buildbot_dir&gt;/workerN/twistd.pid, but that feels a little
hacky and likely to break if Buildbot changes how it tracks
worker pids.<br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:users@buildbot.net">users@buildbot.net</a>
<a class="moz-txt-link-freetext" href="https://lists.buildbot.net/mailman/listinfo/users">https://lists.buildbot.net/mailman/listinfo/users</a></pre>
</blockquote>
<br>
</body>
</html>