[Buildbot-devel] removing running builders

Elmir Jagudin elmir.jagudin at axis.com
Thu Mar 13 06:37:15 UTC 2014


Hi

I wrote a bug report about this issue some time ago: 
http://trac.buildbot.net/ticket/2701

Here is my attempt at fixing it: 
https://github.com/buildbot/buildbot/pull/1093

My understanding of the code that deals with slave-master communication 
is not too great, so I'm not sure if I'm doing the right thing. I would 
appreciate any review comments.

Regards,
Elmir

On 02/13/2014 03:43 PM, Elmir Jagudin wrote:
> Hi
>
> I wonder what is supposed to happen when you remove/rename a builder
> that is currently running on a slave?
>
> Currently, it seems that the slave tries to abort the build, but fails. It
> looks like the build is listed as running on that slave indefinitely. If
> the slave is configured with max_builds=1, no new builds will be
> dispatched to the slave.
>
> Below are the contents of the slave's twisted.log when a builder 'hellox'
> is renamed to 'helloy' in master.cfg and 'buildbot reconfig' is run:
>
> 2014-02-13 15:34:10+0100 [Broker,client] stopCommand: halting current
> command <buildslave.commands.shell.SlaveShellCommand instance at 0xa4111cc>
> 2014-02-13 15:34:10+0100 [Broker,client] command interrupted, attempting
> to kill
> 2014-02-13 15:34:10+0100 [Broker,client] trying to kill process group 20112
> 2014-02-13 15:34:10+0100 [Broker,client]  signal 9 sent successfully
> 2014-02-13 15:34:10+0100 [Broker,client] I have a leftover directory
> 'hellox' that is not being used by the buildmaster: you can delete it now
> 2014-02-13 15:34:10+0100 [-] command finished with signal 9, exit code
> None, elapsedTime: 9.045442
> 2014-02-13 15:34:10+0100 [-] would sendStatus but not .running
> 2014-02-13 15:34:10+0100 [-] SlaveBuilder.commandComplete None
> 2014-02-13 15:34:10+0100 [-]  but we weren't running, quitting silently
> 2014-02-13 15:34:10+0100 [Broker,client]
> SlaveBuilder.remote_print(helloy): message from master: attached
>
> It is possible to restore the slave with 'buildslave restart'; however,
> the following is printed in the master's twisted.log when the slave
> reconnects:
>
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1]
> BuildSlave.detached(example-slave)
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1] releaseLocks(<BuildSlave
> 'example-slave'>): []
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1] Buildslave example-slave
> detached from helloy
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1] <Build hellox>.lostRemote
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1]  stopping currentStep
> <buildbot.steps.shell.ShellCommand object at 0xb2e2aec>
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1] addCompleteLog(interrupt)
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1] RemoteCommand.interrupt
> <RemoteShellCommand '['sleep', '180']'> [Failure instance: Traceback
> (failure with no frames): <class
> 'twisted.internet.error.ConnectionLost'>: Connection to the other side
> was lost in a non-clean fashion.
>       ]
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1] RemoteCommand.disconnect:
> lost slave
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1]
> releaseLocks(<buildbot.steps.shell.ShellCommand object at 0xb2e2aec>): []
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1]  step 'shell' complete: retry
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1]  <Build hellox>: build
> finished
> 2014-02-13 15:37:55+0100 [Broker,0,127.0.0.1] from a running build; this
> is a serious error - please file a bug at http://buildbot.net
>       Traceback (most recent call last):
>         File "/home/elmir/remblds/src/master/buildbot/process/build.py",
> line 519, in allStepsDone
>           return self.buildFinished(text, self.result)
>         File "/home/elmir/remblds/src/master/buildbot/process/build.py",
> line 558, in buildFinished
>           self.deferred.callback(self)
>         File
> "/home/elmir/remblds/sandbox/local/lib/python2.7/site-packages/twisted/internet/defer.py",
> line 382, in callback
>           self._startRunCallbacks(result)
>         File
> "/home/elmir/remblds/sandbox/local/lib/python2.7/site-packages/twisted/internet/defer.py",
> line 490, in _startRunCallbacks
>           self._runCallbacks()
>       --- <exception caught here> ---
>         File
> "/home/elmir/remblds/sandbox/local/lib/python2.7/site-packages/twisted/internet/defer.py",
> line 577, in _runCallbacks
>           current.result = callback(current.result, *args, **kw)
>         File
> "/home/elmir/remblds/src/master/buildbot/process/builder.py", line 455,
> in buildFinished
>           d = self.master.db.builds.finishBuilds(bids)
>       exceptions.AttributeError: 'NoneType' object has no attribute 'db'
>
> 2014-02-13 15:37:55+0100 [Broker,1,127.0.0.1] slave 'example-slave'
> attaching from IPv4Address(TCP, '127.0.0.1', 54965)
> 2014-02-13 15:37:55+0100 [Broker,1,127.0.0.1] Got slaveinfo from
> 'example-slave'
> 2014-02-13 15:37:55+0100 [Broker,1,127.0.0.1] Starting buildslave
> keepalive timer for 'example-slave'
> 2014-02-13 15:37:55+0100 [Broker,1,127.0.0.1] bot attached
> 2014-02-13 15:37:57+0100 [Broker,1,127.0.0.1] Buildslave example-slave
> attached to helloy
>
>
> Regards,
> Elmir
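
The AttributeError at the end of the master traceback above means that
self.master was None when buildFinished ran. A minimal sketch of that
failure mode follows, assuming the builder's reference to the master is
cleared once the builder is dropped from the configuration. FakeMaster,
FakeBuilder, detach and reproduce are hypothetical names used only for
illustration; this is not Buildbot's actual code.

```python
class FakeMaster:
    """Hypothetical stand-in for the buildmaster; only carries a 'db' attribute."""

    def __init__(self):
        self.db = object()


class FakeBuilder:
    """Hypothetical stand-in for a builder that holds a reference to its master."""

    def __init__(self, name, master):
        self.name = name
        self.master = master

    def detach(self):
        # Assumption: removing or renaming the builder in master.cfg and
        # reconfiguring leaves the old builder object without a master.
        self.master = None

    def build_finished(self):
        # Mirrors the shape of the failing line in the traceback:
        # the attribute lookup on self.master fails when it is None.
        return self.master.db


def reproduce():
    """Return the error message raised when a detached builder finishes a build."""
    builder = FakeBuilder("hellox", FakeMaster())
    builder.detach()  # the reconfig happens while the build is still running
    try:
        builder.build_finished()  # the build ends later, e.g. on lostRemote
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

The sketch only shows the ordering problem: the old builder object outlives
its removal from the configuration and is still called back when its
running build finishes.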




