[Buildbot-commits] [Buildbot] #664: sporadic connection lost problems

Buildbot buildbot-devel at lists.sourceforge.net
Mon Dec 21 17:27:43 UTC 2009


#664: sporadic connection lost problems
--------------------+-------------------------------------------------------
Reporter:  ddunbar  |       Owner:           
    Type:  defect   |      Status:  new      
Priority:  major    |   Milestone:  undecided
 Version:           |    Keywords:           
--------------------+-------------------------------------------------------
 I have problems with buildbot intermittently dropping slave connections
 during a very long-running build. I have seen the same problem on slaves
 connected over the Internet (particularly Windows machines), but I am now
 seeing it in a very closed environment.

 The exception I see is this:
 --
 remoteFailed: [Failure instance: Traceback (failure with no frames):
 <class 'twisted.spread.pb.PBConnectionLost'>: [Failure instance: Traceback
 (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>:
 Connection was closed cleanly.
 ]
 ]
 --
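
 The remote failure is nested: the outer PBConnectionLost wraps a
 twisted.internet.error.ConnectionDone, which is Twisted's exception for a
 connection that was closed cleanly (as opposed to ConnectionLost, used
 when the connection is lost in a non-clean fashion). When collecting these
 from many builds, it may help to pull out that chain automatically. This is
 a minimal sketch that assumes the failure text is logged in the
 `<class '...'>` form shown above. `failure_chain` and `closed_cleanly` are
 hypothetical helpers, not Buildbot or Twisted APIs.

```python
import re

# Hypothetical helper: matches the class names printed inside a Failure
# repr, e.g. <class 'twisted.spread.pb.PBConnectionLost'>.
# \s+ tolerates a line wrap between "<class" and the quoted name.
_CLASS_RE = re.compile(r"<class\s+'([\w.]+)'>")


def failure_chain(text):
    """Exception class names in order of appearance (outermost first)."""
    return _CLASS_RE.findall(text)


def closed_cleanly(text):
    """True if the innermost failure is ConnectionDone (a clean close)."""
    chain = failure_chain(text)
    return bool(chain) and chain[-1].endswith(".ConnectionDone")


# The remoteFailed text from this ticket, wrapped as it appeared.
SAMPLE = """remoteFailed: [Failure instance: Traceback (failure with no frames):
<class 'twisted.spread.pb.PBConnectionLost'>: [Failure instance: Traceback
(failure with no frames): <class 'twisted.internet.error.ConnectionDone'>:
Connection was closed cleanly.
]
]"""

if __name__ == "__main__":
    print(failure_chain(SAMPLE))
    # ['twisted.spread.pb.PBConnectionLost', 'twisted.internet.error.ConnectionDone']
    print(closed_cleanly(SAMPLE))  # True
```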

 The slave's twistd.log contains this (mostly unrelated, I believe):
 --
 2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
 2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
 2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
 2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
 2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
 2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
 2009-12-21 08:02:20-0800 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
 2009-12-21 08:02:20-0800 [Broker,client] lost remote
 2009-12-21 08:02:20-0800 [Broker,client] lost remote
 2009-12-21 08:02:20-0800 [Broker,client] lost remote
 2009-12-21 08:02:20-0800 [Broker,client] lost remote step
 2009-12-21 08:02:20-0800 [Broker,client] stopCommand: halting current command <buildbot.slave.commands.SlaveShellCommand instance at 0x10188c170>
 2009-12-21 08:02:20-0800 [Broker,client] command interrupted, killing pid 78808
 2009-12-21 08:02:20-0800 [Broker,client] trying os.kill(-pid, 9)
 2009-12-21 08:02:20-0800 [Broker,client] trying process.signalProcess('KILL')
 2009-12-21 08:02:20-0800 [Broker,client] signalProcess/os.kill failed both times
 2009-12-21 08:02:20-0800 [Broker,client] <twisted.internet.tcp.Connector instance at 0x10069dcf8> will retry in 2 seconds
 2009-12-21 08:02:20-0800 [Broker,client] Stopping factory <buildbot.slave.bot.BotFactory instance at 0x1014fb3b0>
 2009-12-21 08:02:20-0800 [-] command finished with signal 1, exit code None, elapsedTime: 0.014126
 2009-12-21 08:02:20-0800 [-] SlaveBuilder.commandComplete None
 2009-12-21 08:02:23-0800 [-] Starting factory <buildbot.slave.bot.BotFactory instance at 0x1014fb3b0>
 2009-12-21 08:02:31-0800 [Broker,client] message from master: attached
 2009-12-21 08:02:31-0800 [Broker,client] Peer will receive following PB traceback:
 2009-12-21 08:02:31-0800 [Broker,client] Unhandled Error
         Traceback (most recent call last):
           File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/banana.py", line 146, in gotItem
             self.callExpressionReceived(item)
           File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/banana.py", line 111, in callExpressionReceived
             self.expressionReceived(obj)
           File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/pb.py", line 514, in expressionReceived
             method(*sexp[1:])
           File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/pb.py", line 826, in proto_message
             self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
         --- <exception caught here> ---
           File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/pb.py", line 840, in _recvMessage
             netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
           File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/spread/flavors.py", line 112, in remoteMessageReceived
             raise NoSuchMethod("No such method: remote_%s" % (message,))
         twisted.spread.flavors.NoSuchMethod: No such method: remote_getVersion

 --
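
 The slave log shows a sequence of events. Seven update acks fail, the
 remote references are lost, the running command is interrupted, and the
 connector schedules a retry. The master then reattaches about 11 seconds
 later (08:02:20 to 08:02:31). To compare many such incidents (for example,
 builds with one builder versus several), a small parser over twistd.log
 could extract these timings. This is a minimal sketch. It assumes every
 entry starts with the `YYYY-MM-DD HH:MM:SS+-HHMM [context] message` prefix
 seen above and that un-prefixed lines (wraps, tracebacks) belong to the
 preceding entry. `parse_log`, `count_messages` and `reconnect_gap` are
 hypothetical helpers, not part of Buildbot.

```python
import re
from datetime import datetime

# Assumed twistd log prefix: "2009-12-21 08:02:20-0800 [Broker,client] msg"
_LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[([^\]]*)\] (.*)$"
)


def parse_log(text):
    """Return (timestamp, context, message) tuples, one per log entry.

    Lines without a timestamp prefix (wrapped messages, traceback lines)
    are appended to the message of the preceding entry.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        m = _LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S%z")
            entries.append([ts, m.group(2), m.group(3)])
        elif entries and line:
            entries[-1][2] += " " + line
    return [tuple(e) for e in entries]


def count_messages(entries, needle):
    """Number of entries whose message contains `needle`."""
    return sum(1 for _, _, msg in entries if needle in msg)


def reconnect_gap(entries):
    """Seconds from the first 'lost remote' to the next
    'message from master: attached', or None if either is missing."""
    lost = next((ts for ts, _, msg in entries
                 if msg.startswith("lost remote")), None)
    if lost is None:
        return None
    attached = next((ts for ts, _, msg in entries
                     if ts >= lost
                     and msg.startswith("message from master: attached")), None)
    return None if attached is None else (attached - lost).total_seconds()
```

 Applied to the excerpt above, `count_messages(entries, "_ackFailed")`
 should give 7 and `reconnect_gap(entries)` should give 11.0.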

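 The traceback shows PB turning the incoming message name `getVersion` into
 a lookup of a `remote_getVersion` method on the target object, and raising
 NoSuchMethod when that method is absent. Because this happens right after
 the slave reattaches, the master's `getVersion` call appears to have reached
 a slave-side object that does not define `remote_getVersion`. The lookup
 pattern can be sketched in plain Python. This is a simplified illustration
 of what the flavors.py frame shows, not Twisted's implementation.
 `RemoteDispatcher`, `OldSlaveObject` and `NewSlaveObject` are hypothetical,
 and `NoSuchMethod` here is a local stand-in for
 twisted.spread.flavors.NoSuchMethod.

```python
class NoSuchMethod(Exception):
    """Local stand-in for twisted.spread.flavors.NoSuchMethod."""


class RemoteDispatcher:
    """Hypothetical base class: maps a message name to a remote_<name>
    method, mirroring the lookup shown in the flavors.py frame."""

    def remote_message_received(self, message, *args, **kwargs):
        method = getattr(self, "remote_%s" % (message,), None)
        if method is None:
            raise NoSuchMethod("No such method: remote_%s" % (message,))
        return method(*args, **kwargs)


class OldSlaveObject(RemoteDispatcher):
    """Hypothetical slave-side object that does not expose getVersion."""

    def remote_print(self, text):
        return "printed: " + text


class NewSlaveObject(OldSlaveObject):
    """Hypothetical slave-side object that does expose getVersion."""

    def remote_getVersion(self):
        return "hypothetical-version"


if __name__ == "__main__":
    print(NewSlaveObject().remote_message_received("getVersion"))
    try:
        OldSlaveObject().remote_message_received("getVersion")
    except NoSuchMethod as exc:
        print(exc)  # No such method: remote_getVersion
```

 Requiring the `remote_` prefix means the peer can invoke only methods
 deliberately exposed that way. Any other name, including one the callee
 simply lacks, ends in NoSuchMethod.
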
 For some reason, I only see this problem if I have multiple builders
 running on the slave. In that case it occurs on almost every build (the
 build takes ~10h); with a single builder I have never noticed it.

 I am trying to debug this further, but filed this to organize information
 & track progress.

-- 
Ticket URL: <http://buildbot.net/trac/ticket/664>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

