[Buildbot-devel] Log polling errors

Jean-Paul Calderone exarkun at divmod.com
Sat Jul 29 19:20:27 UTC 2006


I recently started seeing this one a build slave:

2006/07/28 15:15 CDT [-] Unhandled error in Deferred:
2006/07/28 15:15 CDT [-] Traceback (most recent call last):
          File "/usr/lib/python2.2/site-packages/twisted/application/app.py", line 128, in runReactorWithLogging
            reactor.run()
          File "/usr/lib/python2.2/site-packages/twisted/internet/posixbase.py", line 200, in run
            self.mainLoop()
          File "/usr/lib/python2.2/site-packages/twisted/internet/posixbase.py", line 208, in mainLoop
            self.runUntilCurrent()
          File "/usr/lib/python2.2/site-packages/twisted/internet/base.py", line 533, in runUntilCurrent
            call.func(*call.args, **call.kw)
        --- <exception caught here> ---
          File "/usr/lib/python2.2/site-packages/twisted/internet/task.py", line 84, in __call__
            self.f(*self.a, **self.kw)
          File "buildbot/slave/commands.py", line 171, in poll
            self.f = open(self.logfile, "rb")
        exceptions.IOError: [Errno 2] No such file or directory: 'path/to/log/file'

2006/07/28 15:47 CDT [-] command finished with signal None, exit code 1
2006/07/28 15:47 CDT [-] Traceback (most recent call last):
          File "/usr/lib/python2.2/site-packages/twisted/internet/base.py", line 510, in runUntilCurrent
            f(*a, **kw)
          File "/usr/lib/python2.2/site-packages/twisted/internet/process.py", line 44, in reapAllProcesses
            process.reapProcess()
          File "/usr/lib/python2.2/site-packages/twisted/internet/process.py", line 752, in reapProcess
            self.processEnded(status)
          File "/usr/lib/python2.2/site-packages/twisted/internet/process.py", line 774, in processEnded
            self.maybeCallProcessEnded()
        --- <exception caught here> ---
          File "/usr/lib/python2.2/site-packages/twisted/internet/process.py", line 810, in maybeCallProcessEnded
            self.proto.processEnded(failure.Failure(e))
          File "buildbot/slave/commands.py", line 134, in processEnded
            self.command.finished(sig, rc)
          File "buildbot/slave/commands.py", line 381, in finished
            w.stop()
          File "buildbot/slave/commands.py", line 158, in stop
            self.poller.stop()
          File "/usr/lib/python2.2/site-packages/twisted/internet/task.py", line 68, in stop
            assert self.running
        exceptions.AssertionError:

2006/07/28 16:07 CDT [-] command timed out: 1200 seconds without output, killing pid 16724
2006/07/28 16:07 CDT [-] trying os.kill(-pid, 9)
2006/07/28 16:07 CDT [-] trying process.signalProcess('KILL')
2006/07/28 16:07 CDT [-] signalProcess/os.kill failed both times
2006/07/28 16:07 CDT [-] we tried to kill the process, and it wouldn't die.. finish anyway
2006/07/28 16:07 CDT [-] ShellCommand.failed: command failed: SIGKILL failed to kill process
2006/07/28 16:07 CDT [-] SlaveBuilder.commandFailed <buildbot.slave.commands.SlaveShellCommand instance at 0x84a012c>
2006/07/28 16:07 CDT [-] Traceback (most recent call last):
        Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process


I don't know why the file being polled does not exist on this one slave; I suppose it is a race of some sort.  However, since the exception prevents the LoopingCall used by the poller from being started, when the loop is eventually stopped, another exception occurs.  This seems to confuse the buildmaster and the slave becomes indefinitely wedged.

This change seems to help things move along:

--- commands.orig       2006-07-29 14:19:50.000000000 -0500
+++ commands.py 2006-07-29 14:19:26.000000000 -0500
@@ -151,11 +151,16 @@
         self.poller = task.LoopingCall(self.poll)
 
     def start(self):
-        self.poller.start(self.POLL_INTERVAL)
+        self.poller.start(self.POLL_INTERVAL).addErrback(self._cleanupPoll)
+
+    def _cleanupPoll(self, err):
+        log.err(err, msg="Polling error")
+        self.poller = None
 
     def stop(self):
         self.poll()
-        self.poller.stop()
+        if self.poller is not None:
+            self.poller.stop()
 
     def statFile(self):
         if os.path.exists(self.logfile):

Jean-Paul





More information about the devel mailing list