[Buildbot-commits] [Buildbot] #887: Unclean slave shutdown can lead to zombie slave references in master
Buildbot
buildbot-devel at lists.sourceforge.net
Wed Jun 9 15:18:38 UTC 2010
#887: Unclean slave shutdown can lead to zombie slave references in master
-------------------+--------------------------------------------------------
Reporter:  catlee  |       Owner:
    Type:  defect  |      Status:  new
Priority:  minor   |   Milestone:
 Version:  0.8.0   |    Keywords:
-------------------+--------------------------------------------------------
One of our slaves (10.2.90.75) was shut down abruptly (the VM was turned
off). The exceptions below were generated as a result.
{{{
Exception in /builds/buildbot/builder_master/twistd.log.20:
2010-06-08 15:22:07-0700 [Broker,1405,10.2.90.75] Unhandled Error
Traceback (most recent call last):
Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
]
--------------------------------------------------------------------------------
Exception in /builds/buildbot/builder_master/twistd.log.20:
2010-06-08 15:22:07-0700 [Broker,1405,10.2.90.75] Unhandled Error
Traceback (most recent call last):
Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
]
--------------------------------------------------------------------------------
Exception in /builds/buildbot/builder_master/twistd.log.20:
2010-06-08 15:22:07-0700 [Broker,1405,10.2.90.75] Unhandled Error
Traceback (most recent call last):
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 307, in _startRunCallbacks
    self._runCallbacks()
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 323, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 284, in _continue
    self.unpause()
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 280, in unpause
    self._runCallbacks()
--- <exception caught here> ---
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 323, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 247, in _accept_slave
    return self.updateSlave()
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 141, in updateSlave
    return self.sendBuilderList()
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 428, in sendBuilderList
    d = AbstractBuildSlave.sendBuilderList(self)
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 329, in sendBuilderList
    d = self.slave.callRemote("setBuilderList", blist)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line 328, in callRemote
    _name, args, kw)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line 807, in _sendMessage
    raise DeadReferenceError("Calling Stale Broker")
twisted.spread.pb.DeadReferenceError: Calling Stale Broker
--------------------------------------------------------------------------------
Exception in /builds/buildbot/builder_master/twistd.log.20:
2010-06-08 15:22:09-0700 [Broker,1406,10.2.90.75] Unhandled Error
Traceback (most recent call last):
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 190, in addCallback
    callbackKeywords=kw)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 181, in addCallbacks
    self._runCallbacks()
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 323, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/master.py", line 375, in requestAvatar
    d = defer.maybeDeferred(p.attached, mind)
--- <exception caught here> ---
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 102, in maybeDeferred
    result = f(*args, **kw)
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 171, in attached
    d = self.disconnect()
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 288, in disconnect
    return self._disconnect(self.slave)
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/buildslave.py", line 303, in _disconnect
    slave.notifyOnDisconnect(_disconnected)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line 285, in notifyOnDisconnect
    self.broker.notifyOnDisconnect(self._disconnected)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/spread/pb.py", line 609, in notifyOnDisconnect
    self.disconnects.append(notifier)
exceptions.AttributeError: 'NoneType' object has no attribute 'append'
}}}
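The last traceback is consistent with the old connection's broker having already discarded its state: notifyOnDisconnect appends to a disconnects attribute that is None. A minimal sketch of that failure mode follows. FakeBroker is a hypothetical stand-in written for illustration, not Twisted's pb.Broker, and the assumption that the list is dropped once the connection is lost is inferred from the traceback rather than confirmed here.

```python
class FakeBroker(object):
    """Hypothetical stand-in for a PB broker; not Twisted's implementation."""

    def __init__(self):
        # Callbacks to run when the connection drops.
        self.disconnects = []

    def notifyOnDisconnect(self, notifier):
        # Same statement as the failing line in the traceback.
        self.disconnects.append(notifier)

    def connectionLost(self):
        # Assumption: callbacks fire once, then the list is discarded.
        for notifier in self.disconnects:
            notifier()
        self.disconnects = None


def try_register(broker, notifier):
    """Return the error message if registration fails, else None."""
    try:
        broker.notifyOnDisconnect(notifier)
    except AttributeError as e:
        return str(e)
    return None
```

Registering a callback on a live FakeBroker succeeds; registering one after connectionLost() has run fails with the same AttributeError as above, which would leave the caller waiting for a disconnect notification that never arrives.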
When the slave came back up, it was unable to connect. These messages
appeared in the master's log:
{{{
2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] duplicate slave moz2-linux64-slave09 replacing old one
2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] old slave was connected from IPv4Address(TCP, '10.2.90.75', 44417)
2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] new slave is from IPv4Address(TCP, '10.2.90.75', 55878)
2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] disconnecting old slave moz2-linux64-slave09 now
2010-06-09 07:14:14-0700 [Broker,1521,10.2.90.75] waiting for slave to finish disconnecting
}}}
netstat showed several ESTABLISHED connections to this IP.
Via a manhole, I was able to determine that:
- BuildSlave.slave_status.connected == True
- BuildSlave.slave == twisted.spread.pb.RemoteReference instance
- BuildSlave.slave.perspective is None
- the slave does not appear in any Builder.slaves list.
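The four observations above can be combined into a single check, sketched below. Stub is a hypothetical stand-in for the real BuildSlave, RemoteReference and Builder objects, looks_like_zombie is an illustrative name, and comparing the slave object directly against Builder.slaves entries is a simplifying assumption about what that list holds.

```python
class Stub(object):
    """Hypothetical attribute bag standing in for Buildbot/Twisted objects."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def looks_like_zombie(buildslave, builders):
    """True if the master still treats a dead slave as attached."""
    ref = buildslave.slave
    return bool(
        buildslave.slave_status.connected      # master thinks it is connected
        and ref is not None                    # a remote reference is still held
        and ref.perspective is None            # but the reference is stale
        and not any(buildslave in b.slaves for b in builders)  # no builder uses it
    )


# Illustrative state matching the observations above.
zombie = Stub(slave_status=Stub(connected=True), slave=Stub(perspective=None))
builders = [Stub(slaves=[])]
```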
I fixed it by setting BuildSlave.slave_status.connected to False, and
BuildSlave.slave to None, and then reconnecting the slave.
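The manual workaround amounts to two assignments, sketched here against a stand-in object. Stub and force_detach are hypothetical names for illustration; only the attribute names slave_status.connected and slave come from this report, and the sketch assumes nothing else on the master needs to be reset.

```python
class Stub(object):
    """Hypothetical attribute bag standing in for the real BuildSlave."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def force_detach(buildslave):
    """Clear the stale state so a fresh connection can be accepted."""
    # Tell the master the slave is no longer connected.
    buildslave.slave_status.connected = False
    # Drop the stale remote reference.
    buildslave.slave = None


# Stand-in for the stuck slave object as seen from the manhole.
stuck = Stub(slave_status=Stub(connected=True), slave=Stub(perspective=None))
force_detach(stuck)
```

After these assignments the master no longer holds an old connection to wait on, so the slave's next connection attempt should be treated as a fresh attach.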
--
Ticket URL: <http://buildbot.net/trac/ticket/887>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation