[Buildbot-commits] [SPAM] [Buildbot] #664: sporadic connection lost problems
Buildbot
buildbot-devel at lists.sourceforge.net
Mon Dec 21 17:27:43 UTC 2009
#664: sporadic connection lost problems
--------------------+-------------------------------------------------------
Reporter: ddunbar | Owner:
Type: defect | Status: new
Priority: major | Milestone: undecided
Version: | Keywords:
--------------------+-------------------------------------------------------
I have problems with buildbot intermittently dropping slave connections
during a very long-running build. I have seen the same problem on slaves
connected over the Internet (particularly Windows machines), but I am now
seeing this in a very closed environment.
The exception I see is this:
--
remoteFailed: [Failure instance: Traceback (failure with no frames):
<class 'twisted.spread.pb.PBConnectionLost'>: [Failure instance: Traceback
(failure with no frames): <class 'twisted.internet.error.ConnectionDone'>:
Connection was closed cleanly.
]
]
--
The slave's twistd.log contains this (mostly unrelated, I believe):
--
2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
2009-12-21 08:02:20-0800 [Broker,client] lost remote
2009-12-21 08:02:20-0800 [Broker,client] lost remote
2009-12-21 08:02:20-0800 [Broker,client] lost remote
2009-12-21 08:02:20-0800 [Broker,client] lost remote step
2009-12-21 08:02:20-0800 [Broker,client] stopCommand: halting current command <buildbot.slave.commands.SlaveShellCommand instance at 0x10188c170>
2009-12-21 08:02:20-0800 [Broker,client] command interrupted, killing pid 78808
2009-12-21 08:02:20-0800 [Broker,client] trying os.kill(-pid, 9)
2009-12-21 08:02:20-0800 [Broker,client] trying process.signalProcess('KILL')
2009-12-21 08:02:20-0800 [Broker,client] signalProcess/os.kill failed both times
2009-12-21 08:02:20-0800 [Broker,client] <twisted.internet.tcp.Connector instance at 0x10069dcf8> will retry in 2 seconds
2009-12-21 08:02:20-0800 [Broker,client] Stopping factory <buildbot.slave.bot.BotFactory instance at 0x1014fb3b0>
2009-12-21 08:02:20-0800 [-] command finished with signal 1, exit code None, elapsedTime: 0.014126
2009-12-21 08:02:20-0800 [-] SlaveBuilder.commandComplete None
2009-12-21 08:02:23-0800 [-] Starting factory <buildbot.slave.bot.BotFactory instance at 0x1014fb3b0>
2009-12-21 08:02:31-0800 [Broker,client] message from master: attached
2009-12-21 08:02:31-0800 [Broker,client] Peer will receive following PB traceback:
2009-12-21 08:02:31-0800 [Broker,client] Unhandled Error
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/banana.py", line 146, in gotItem
    self.callExpressionReceived(item)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/banana.py", line 111, in callExpressionReceived
    self.expressionReceived(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/pb.py", line 514, in expressionReceived
    method(*sexp[1:])
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/pb.py", line 826, in proto_message
    self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
--- <exception caught here> ---
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/pb.py", line 840, in _recvMessage
    netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/flavors.py", line 112, in remoteMessageReceived
    raise NoSuchMethod("No such method: remote_%s" % (message,))
twisted.spread.flavors.NoSuchMethod: No such method: remote_getVersion
--
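For reference, the NoSuchMethod traceback at the end of the log looks like
ordinary PB dispatch rather than part of the disconnect: after reattaching,
the master sends a "getVersion" message and the slave-side object has no
remote_getVersion method to handle it. The sketch below is a simplified,
stdlib-only model of that lookup, not Twisted's actual code. The
Referenceable and NoSuchMethod classes here are stand-ins, and OldSlaveBot
is a hypothetical object used only for illustration.

```python
class NoSuchMethod(AttributeError):
    """Stand-in for twisted.spread.flavors.NoSuchMethod."""


class Referenceable(object):
    """Simplified model of PB dispatch: message 'foo' maps to remote_foo."""

    def remoteMessageReceived(self, message, args, kwargs):
        # Look up the handler by name; a missing handler is reported
        # back to the peer as an error instead of being silently ignored.
        method = getattr(self, "remote_%s" % message, None)
        if method is None:
            raise NoSuchMethod("No such method: remote_%s" % (message,))
        return method(*args, **kwargs)


class OldSlaveBot(Referenceable):
    """Hypothetical slave-side object that does not define getVersion."""

    def remote_print(self, text):
        return "message from master: %s" % text


bot = OldSlaveBot()

# A message with a matching remote_ method is handled normally.
attached = bot.remoteMessageReceived("print", ("attached",), {})

# A message without one raises, which is what the traceback shows.
try:
    bot.remoteMessageReceived("getVersion", (), {})
    error = None
except NoSuchMethod as exc:
    error = str(exc)
```

If this model is right, the error is harmless by itself and would be
expected whenever the master asks for a method the slave does not provide,
for example when the two run different versions (an assumption, not
something I have confirmed here).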
For some reason, I only see this problem if I have multiple builders
running on the slave. If so, then it occurs almost every build (the build
takes ~10h). If not, I have never noticed it happening.
I am trying to debug this further, but filed this to organize information
& track progress.
--
Ticket URL: <http://buildbot.net/trac/ticket/664>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation