[Buildbot-devel] Slave not recognizing process end

"Martin v. Löwis" martin at v.loewis.de
Sat Jan 2 18:51:02 UTC 2010


We have a slave whose test command is configured with
a timeout of 1800 s. The command completed before the
timeout expired, yet the slave still believed it had to kill it.

The log shows the following lines:

2010-01-01 15:31:43-0500 [-] command timed out: 1800 seconds without output
2010-01-01 15:31:43-0500 [-] self.process has no pid
2010-01-01 15:31:43-0500 [-] trying process.signalProcess('KILL')
2010-01-01 15:31:43-0500 [-] Unhandled Error
[...]
"/usr/lib/python2.6/site-packages/twisted/internet/process.py", line 312, in signalProcess
	    if os.WIFEXITED(status):
	twisted.internet.error.ProcessExitedAlready:

As a consequence, the slave still believes that the process
is running, and every attempt to cancel it from the master
leads to the same exception (ProcessExitedAlready).

Restarting the slave works around the problem, but since this
happens often, I would like to fix it for good.
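
To illustrate the behaviour I would expect instead, here is a minimal
sketch. It is not Buildbot's actual code: CommandState and
FakeExitedTransport are made-up names for illustration, and the fallback
exception class is only a stand-in for when Twisted is not installed. The
idea is simply that ProcessExitedAlready from signalProcess('KILL') should
be treated as "the process is gone" rather than left unhandled.

```python
try:
    # The real exception class raised by Twisted's signalProcess().
    from twisted.internet.error import ProcessExitedAlready
except ImportError:
    class ProcessExitedAlready(Exception):
        """Stand-in used only when Twisted is not installed."""


class FakeExitedTransport:
    """Hypothetical stand-in for a process transport whose child has exited."""

    def signalProcess(self, signal_name):
        # Mimics what the traceback shows: the child is already gone.
        raise ProcessExitedAlready()


class CommandState:
    """Hypothetical holder for the slave's view of one running command."""

    def __init__(self, transport):
        self.transport = transport
        self.finished = False

    def kill(self):
        """Try to kill the process; return True if it is known to be gone."""
        try:
            self.transport.signalProcess('KILL')
        except ProcessExitedAlready:
            # Nothing left to kill, so record the command as finished
            # instead of letting the error propagate.
            self.finished = True
        return self.finished


state = CommandState(FakeExitedTransport())
state.kill()
```

With something along these lines, a cancel request from the master would
mark the command as finished instead of failing with the same exception
each time. Whether the slave code is structured this way is an assumption
on my part.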

Where should I look?

Regards,
Martin



