[Buildbot-commits] [Buildbot] #2427: Master requires restart sometimes if slave connection lost during graceful shutdown

Buildbot nobody at buildbot.net
Sun Jan 20 14:50:55 UTC 2013


#2427: Master requires restart sometimes if slave connection lost during graceful
shutdown
-------------------+--------------------
Reporter:  dank    |       Owner:
    Type:  defect  |      Status:  new
Priority:  minor   |   Milestone:  0.8.8
 Version:  0.8.7   |  Resolution:
Keywords:          |
-------------------+--------------------

Comment (by dustin):

 And right before that:
 {{{
 2013-01-19 21:56:30-0800 [Broker,10,10.0.3.194] _sBF 20
 2013-01-19 21:56:30-0800 [Broker,10,10.0.3.194] _sBF 21
 2013-01-19 21:56:31-0800 [Broker,10,10.0.3.194] BuildSlave.detached(pyflakes-i7-ubu1004)
 2013-01-19 21:56:31-0800 [Broker,10,10.0.3.194] releaseLocks(<BuildSlave 'pyflakes-i7-ubu1004'>): []
 2013-01-19 21:56:31-0800 [Broker,10,10.0.3.194] Buildslave pyflakes-i7-ubu1004 detached from pyflakes-ubu1004-master
 2013-01-19 21:56:31-0800 [Broker,11,10.0.3.194] slave 'pyflakes-i7-ubu1004' attaching from IPv4Address(TCP, '10.0.3.194', 47321)
 2013-01-19 21:56:31-0800 [Broker,11,10.0.3.194] Starting buildslave keepalive timer for 'pyflakes-i7-ubu1004'
 2013-01-19 21:56:31-0800 [Broker,11,10.0.3.194] Got slaveinfo from 'pyflakes-i7-ubu1004'
 2013-01-19 21:56:31-0800 [Broker,11,10.0.3.194] bot attached
 2013-01-19 21:56:31-0800 [Broker,11,10.0.3.194] Buildslave pyflakes-i7-ubu1004 attached to pyflakes-ubu1004-master
 2013-01-19 21:56:32-0800 [-] _sBF 22
 2013-01-19 21:56:32-0800 [-] _sBF 23
 2013-01-19 21:56:32-0800 [-] _sBF 24
 }}}

 So the slave disconnects and reconnects while the master is recording the
 builds in the db.

 The code where the exception occurs is protected by an `addErrback`:
 {{{
 #! python
         d = build.startBuild(bs, self.expectations, slavebuilder)
         d.addCallback(self.buildFinished, slavebuilder, bids)
         # this shouldn't happen. if it does, the slave will be wedged
         d.addErrback(log.err)
 }}}

 but that doesn't help, for two reasons:

  1. `startBuild` raises an exception synchronously instead of returning a
 Deferred that fires with a failure, so the `addCallback` and `addErrback`
 lines are never reached.
  2. Logging the error wouldn't help anyway: as the comment says, the
 slave would still be wedged.

 Now, to think about how to solve this...

-- 
Ticket URL: <http://trac.buildbot.net/ticket/2427#comment:7>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

