[Buildbot-devel] Failed shell command and failed to kill the process
Li Chao
lchao at idengines.com
Fri Apr 21 03:44:28 UTC 2006
Thank you for the reply.
I am using NetBSD. The shell command does not fail every time; it
fails randomly. Once it has failed, it keeps failing until I stop the
buildbot slave and restart it.
I ran the unit tests and some of them failed on this machine. You can
see the attached log file. Is there a configuration error on my system?
I set 'debug=True' in buildbot.slave.commands.ShellCommandPP and got
the following output:
2006/04/20 17:51 PDT [Broker,client] startCommand:shell [id 1]
2006/04/20 17:51 PDT [Broker,client] ShellCommand._startCommand
2006/04/20 17:51 PDT [Broker,client] ls -al
2006/04/20 17:51 PDT [Broker,client] in dir /amd/xserve-nfs/Volumes/RAID0/Home/releng/buildbot/slave1_ebbtide_ebbtide/test/build (timeout 30 secs)
2006/04/20 17:51 PDT [Broker,client] argv: ['ls', '-al']
2006/04/20 17:51 PDT [Broker,client] environment: {....
......
.....}
2006/04/20 17:51 PDT [Broker,client] ShellCommandPP.connectionMade
2006/04/20 17:51 PDT [Broker,client] assigning self.command.process: <PTYProcess pid=16947 status=-1>
2006/04/20 17:51 PDT [Broker,client] closing stdin
2006/04/20 17:51 PDT [-] ShellCommandPP.outReceived
2006/04/20 17:52 PDT [-] command timed out: 30 seconds without output, killing pid 16947
2006/04/20 17:52 PDT [-] trying os.kill(-pid, 9)
2006/04/20 17:52 PDT [-] signal 9 sent successfully
2006/04/20 17:52 PDT [-] we tried to kill the process, and it wouldn't die.. finish anyway
2006/04/20 17:52 PDT [-] ShellCommand.failed: command failed: SIGKILL failed to kill process
2006/04/20 17:52 PDT [-] SlaveBuilder.commandFailed <buildbot.slave.commands.SlaveShellCommand instance at 0xd0ef80>
2006/04/20 17:52 PDT [-] Traceback (most recent call last):
Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
I cannot work out what is happening here.
Thank you very much for your help.
Li Chao
-----Original Message-----
From: Brian Warner [mailto:warner-buildbot at lothar.com]
Sent: Thursday, April 20, 2006 10:23 AM
To: Li Chao
Cc: buildbot-devel at lists.sourceforge.net
Subject: Re: [Buildbot-devel] Failed shell command and failed to kill the process
> Anyone has any idea what happened here?
That's really really weird.
> I found that the command "ls" looks like " 6211 ttyr5- ZW
> 0:00.00 (ls)". There are brackets surrounding "ls", which means the
> "ls" has already finished.
The "Z" means that it has finished and is waiting for the parent process
(in this case the buildslave) to reap it. Processes in this state are
called "zombies", because they're dead, but just won't go away until the
parent reads their exit status.
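
To make "reaping" concrete, here is a minimal Python sketch (not taken
from buildbot or Twisted; the function name spawn_and_reap is made up for
illustration). It forks a child that exits immediately, reads its exit
status with os.waitpid(), and then checks that a second waitpid() fails
because the entry is gone once the status has been read.

```python
import os


def spawn_and_reap(exit_code=3):
    """Fork a child that exits at once, then reap it from the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate immediately without running any cleanup handlers.
        os._exit(exit_code)

    # Parent: until this call returns, the dead child is kept around as a
    # zombie so that its exit status is not lost.
    _, status = os.waitpid(pid, 0)

    # The status has been read, so the entry is gone and a second wait on
    # the same pid fails with ECHILD (ChildProcessError in Python 3).
    try:
        os.waitpid(pid, 0)
        reaped = False
    except ChildProcessError:
        reaped = True

    return os.WEXITSTATUS(status), reaped
```

If the parent never reaches the waitpid() call, the child stays in the
"Z" state shown by ps.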
> command timed out: 30 seconds without output, killing pid 6116
This suggests that the buildslave process never saw the SIGCHLD which
indicated that a child process has terminated and needs to be reaped.
> SIGKILL failed to kill process
This might be the normal thing that happens when you try to kill a
zombie.. not sure.
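
One way to check this guess is a small experiment outside buildbot. The
sketch below is an assumption-laden illustration, not buildbot code: the
function name sigkill_zombie is invented, and it signals the plain pid
rather than the process group (-pid) that the slave log shows. It creates
a child that has already exited, sends it SIGKILL, probes that the pid
still exists, and only then reaps it.

```python
import os
import signal


def sigkill_zombie(exit_code=7):
    """Send SIGKILL to an already-exited child, then reap it."""
    # The pipe tells the parent that the child has exited without reaping
    # it: read() returns EOF once the child's write end is closed, which
    # happens as part of process exit.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        os._exit(exit_code)          # child exits at once
    os.close(w)
    os.read(r, 1)                    # blocks until EOF from the child
    os.close(r)

    os.kill(pid, signal.SIGKILL)     # raises if the signal cannot be sent
    os.kill(pid, 0)                  # signal 0: raises if the pid is gone

    # Only reading the exit status removes the entry.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status), os.WEXITSTATUS(status)
```

On the systems I am aware of, both os.kill() calls succeed and the
reaped status should still report a normal exit, which would be
consistent with "signal 9 sent successfully" followed by "it wouldn't
die" in the log. Behaviour on NetBSD in particular is untested here.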
Huh, I'm not sure where to go. It sounds like either SIGCHLD is not
being delivered properly, or it is getting ignored or eaten somewhere.
I'd want to add some instrumentation to
twisted.internet.process.reapAllProcesses() and Process.reapProcess() and
PTYProcess.reapProcess() to do a log.msg() each time they're called, so
we'd know when/if SIGCHLD was being delivered. I'd also set 'debug=True'
in buildbot.slave.commands.ShellCommandPP, which will turn on additional
logging when twisted thinks the process has ended. (If the process is
still a zombie, then ShellCommandPP.processEnded will probably not have
been called, but it's still worth turning on the debug messages). I'd
also try running the buildslave under strace to see if the signal is
being delivered or not.
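
A rough sketch of that instrumentation, assuming it is acceptable to
monkeypatch rather than edit the Twisted source. The helper name
log_calls is hypothetical; the Twisted names in the trailing comments are
the ones mentioned above, and the patching itself is untested against any
particular Twisted version.

```python
import functools


def log_calls(owner, name, log):
    """Replace owner.<name> with a wrapper that logs each call.

    owner may be a module or a class; log is any callable that accepts
    a single string. Returns the original callable so it can be restored.
    """
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        log("%s called" % name)
        return original(*args, **kwargs)

    setattr(owner, name, wrapper)
    return original


# Possible use in the buildslave, before the reactor starts (assumption:
# callers look these names up on the module/class at call time):
#
#   from twisted.internet import process
#   from twisted.python import log
#   log_calls(process, "reapAllProcesses", log.msg)
#   log_calls(process.Process, "reapProcess", log.msg)
#   log_calls(process.PTYProcess, "reapProcess", log.msg)
```

Code that imported reapAllProcesses by name before the patch was applied
would keep calling the unwrapped function, so no log output does not by
itself prove that SIGCHLD was never delivered.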
What version of Twisted are you using? Which unix are you running on? Do
the buildbot unit tests pass on your box? (from the top of the buildbot
source tree, do 'PYTHONPATH=. trial buildbot.test' to run them).
puzzled,
-Brian
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testresult.txt
URL: <http://buildbot.net/pipermail/devel/attachments/20060420/6a486898/attachment.txt>