[Buildbot-commits] [SPAM] [Buildbot] #887: Unclean slave shutdown can lead to zombie slave references in master

Buildbot buildbot-devel at lists.sourceforge.net
Wed Jun 9 15:18:38 UTC 2010


#887: Unclean slave shutdown can lead to zombie slave references in master
-------------------+--------------------------------------------------------
Reporter:  catlee  |       Owner:     
    Type:  defect  |      Status:  new
Priority:  minor   |   Milestone:     
 Version:  0.8.0   |    Keywords:     
-------------------+--------------------------------------------------------
 One of our slaves (10.2.90.75) was shut down abruptly (the VM was turned
 off).  The exceptions below were generated as a result.

 {{{
 Exception in /builds/buildbot/builder_master/twistd.log.20:
 2010-06-08 15:22:07-0700 [Broker,1405,10.2.90.75] Unhandled Error
         Traceback (most recent call last):
         Failure: twisted.spread.pb.PBConnectionLost: [Failure instance:
 Traceback (failure with no frames): <class
 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
         ]

 --------------------------------------------------------------------------------
 Exception in /builds/buildbot/builder_master/twistd.log.20:
 2010-06-08 15:22:07-0700 [Broker,1405,10.2.90.75] Unhandled Error
         Traceback (most recent call last):
         Failure: twisted.spread.pb.PBConnectionLost: [Failure instance:
 Traceback (failure with no frames): <class
 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
         ]

 --------------------------------------------------------------------------------
 Exception in /builds/buildbot/builder_master/twistd.log.20:
 2010-06-08 15:22:07-0700 [Broker,1405,10.2.90.75] Unhandled Error
         Traceback (most recent call last):
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 307, in _startRunCallbacks
             self._runCallbacks()
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 323, in _runCallbacks
             self.result = callback(self.result, *args, **kw)
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 284, in _continue
             self.unpause()
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 280, in unpause
             self._runCallbacks()
         --- <exception caught here> ---
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 323, in _runCallbacks
             self.result = callback(self.result, *args, **kw)
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 247, in
 _accept_slave
             return self.updateSlave()
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 141, in
 updateSlave
             return self.sendBuilderList()
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 428, in
 sendBuilderList
             d = AbstractBuildSlave.sendBuilderList(self)
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 329, in
 sendBuilderList
             d = self.slave.callRemote("setBuilderList", blist)
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line
 328, in callRemote
             _name, args, kw)
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line
 807, in _sendMessage
             raise DeadReferenceError("Calling Stale Broker")
         twisted.spread.pb.DeadReferenceError: Calling Stale Broker

 --------------------------------------------------------------------------------
 Exception in /builds/buildbot/builder_master/twistd.log.20:
 2010-06-08 15:22:09-0700 [Broker,1406,10.2.90.75] Unhandled Error
         Traceback (most recent call last):
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 190, in addCallback
             callbackKeywords=kw)
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 181, in addCallbacks
             self._runCallbacks()
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 323, in _runCallbacks
             self.result = callback(self.result, *args, **kw)
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/master.py", line 375, in
 requestAvatar
             d = defer.maybeDeferred(p.attached, mind)
         --- <exception caught here> ---
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py",
 line 102, in maybeDeferred
             result = f(*args, **kw)
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 171, in
 attached
             d = self.disconnect()
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 288, in
 disconnect
             return self._disconnect(self.slave)
           File "/tools/buildbot/lib/python2.6/site-
 packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 303, in
 _disconnect
             slave.notifyOnDisconnect(_disconnected)
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line
 285, in notifyOnDisconnect
             self.broker.notifyOnDisconnect(self._disconnected)
           File "/tools/buildbot/lib/python2.6/site-
 packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line
 609, in notifyOnDisconnect
             self.disconnects.append(notifier)
         exceptions.AttributeError: 'NoneType' object has no attribute
 'append'
 }}}

 When the slave came back up, it was unable to connect. These messages
 appeared in the master's log:
 {{{
 2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] duplicate slave
 moz2-linux64-slave09 replacing old one
 2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] old slave was connected
 from IPv4Address(TCP, '10.2.90.75', 44417)
 2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] new slave is from
 IPv4Address(TCP, '10.2.90.75', 55878)
 2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] disconnecting old slave
 moz2-linux64-slave09 now
 2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] waiting for slave to
 finish disconnecting
 }}}

 netstat showed several ESTABLISHED connections to this IP.

 Via a manhole I was able to determine that:
 - BuildSlave.slave_status.connected == True
 - BuildSlave.slave == twisted.spread.pb.RemoteReference instance
 - BuildSlave.slave.perspective is None
 - the slave does not appear in any Builder.slaves list.
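
 The four observations above can be folded into one check that is easy to
 run from a manhole session. The following is only a sketch: the Stub*
 classes stand in for the real Buildbot and Twisted objects so that the
 example runs on its own, and looks_like_zombie and its attached_slaves
 argument are hypothetical names, not part of Buildbot. The caller is
 assumed to collect attached_slaves from the Builder.slaves lists.

```python
class StubStatus(object):
    """Stand-in for the object held in BuildSlave.slave_status."""
    def __init__(self, connected):
        self.connected = connected


class StubRemoteReference(object):
    """Stand-in for twisted.spread.pb.RemoteReference."""
    def __init__(self, perspective=None):
        self.perspective = perspective


class StubBuildSlave(object):
    """Stand-in for buildbot.buildslave.BuildSlave."""
    def __init__(self, connected, slave):
        self.slave_status = StubStatus(connected)
        self.slave = slave


def looks_like_zombie(buildslave, attached_slaves=()):
    """Return True if buildslave matches the state described in the ticket.

    attached_slaves is a hypothetical argument: the slave objects the
    caller found attached to builders.
    """
    if not buildslave.slave_status.connected:
        return False  # master already considers it disconnected
    if buildslave.slave is None:
        return False  # no remote reference left behind
    if getattr(buildslave.slave, "perspective", None) is not None:
        return False  # the reference still has a perspective
    # Marked connected, yet attached to no builder.
    return all(s is not buildslave for s in attached_slaves)
```

 A slave for which this returns True is marked connected but holds a
 reference that can no longer be used, which matches the "waiting for
 slave to finish disconnecting" hang shown in the log above.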

 I fixed it by setting BuildSlave.slave_status.connected to False, and
 BuildSlave.slave to None, and then reconnecting the slave.
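
 The manual fix amounts to two attribute assignments. The sketch below
 shows them on stand-in objects; Namespace and clear_zombie_reference are
 hypothetical names used only for illustration. Against a live master the
 same two assignments would be made on the real BuildSlave instance from
 the manhole. This bypasses the master's normal detach path, and it does
 nothing about the ESTABLISHED connections that netstat reported.

```python
class Namespace(object):
    """Minimal attribute bag used as a stand-in for Buildbot objects."""
    def __init__(self, **kw):
        self.__dict__.update(kw)


def clear_zombie_reference(buildslave):
    """Apply the manual workaround from the ticket and return the stale ref."""
    stale = buildslave.slave
    buildslave.slave_status.connected = False  # master stops treating it as attached
    buildslave.slave = None                    # drop the dead RemoteReference
    return stale


# Stand-in for a BuildSlave stuck in the state described above.
stuck = Namespace(
    slave_status=Namespace(connected=True),
    slave=Namespace(perspective=None),
)
stale_ref = clear_zombie_reference(stuck)
```

 After this the slave still has to reconnect on its own, as it did in the
 report.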

-- 
Ticket URL: <http://buildbot.net/trac/ticket/887>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

