[Buildbot-devel] Failed shell command and failed to kill the process

Li Chao lchao at idengines.com
Fri Apr 21 03:44:28 UTC 2006


Thank you for the reply.

I am using NetBSD. The shell command does not fail every time; it fails
randomly. Once it has failed, it keeps failing until I stop the buildbot
slave and restart it.

I ran the unit tests, and some of them failed on this machine; see the
attached log file. Is there a configuration error on my system?

I set 'debug=True' in buildbot.slave.commands.ShellCommandPP and got the
following output:



2006/04/20 17:51 PDT [Broker,client]  startCommand:shell [id 1]
2006/04/20 17:51 PDT [Broker,client] ShellCommand._startCommand
2006/04/20 17:51 PDT [Broker,client]  ls -al
2006/04/20 17:51 PDT [Broker,client]   in dir /amd/xserve-nfs/Volumes/RAID0/Home/releng/buildbot/slave1_ebbtide_ebbtide/test/build (timeout 30 secs)
2006/04/20 17:51 PDT [Broker,client]   argv: ['ls', '-al']
2006/04/20 17:51 PDT [Broker,client]   environment: {....
......
.....}
2006/04/20 17:51 PDT [Broker,client] ShellCommandPP.connectionMade
2006/04/20 17:51 PDT [Broker,client]  assigning self.command.process: <PTYProcess pid=16947 status=-1>
2006/04/20 17:51 PDT [Broker,client]  closing stdin
2006/04/20 17:51 PDT [-] ShellCommandPP.outReceived
2006/04/20 17:52 PDT [-] command timed out: 30 seconds without output, killing pid 16947
2006/04/20 17:52 PDT [-] trying os.kill(-pid, 9)
2006/04/20 17:52 PDT [-]  signal 9 sent successfully
2006/04/20 17:52 PDT [-] we tried to kill the process, and it wouldn't die.. finish anyway
2006/04/20 17:52 PDT [-] ShellCommand.failed: command failed: SIGKILL failed to kill process
2006/04/20 17:52 PDT [-] SlaveBuilder.commandFailed <buildbot.slave.commands.SlaveShellCommand instance at 0xd0ef80>
2006/04/20 17:52 PDT [-] Traceback (most recent call last):
      Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process

 

I cannot figure out what is happening here.

 

Thank you very much for your help.

Li Chao 

 

 

 

-----Original Message-----
From: Brian Warner [mailto:warner-buildbot at lothar.com] 
Sent: Thursday, April 20, 2006 10:23 AM
To: Li Chao
Cc: buildbot-devel at lists.sourceforge.net
Subject: Re: [Buildbot-devel] Failed shell command and failed to kill
the process

 

> Anyone has any idea what happened here? 

 

That's really really weird.

 

> I found that the command "ls" looks like " 6211 ttyr5-  ZW
> 0:00.00 (ls)". There are brackets around "ls", which means the "ls" has
> already finished.

 

The "Z" means that it has finished and is waiting for the parent process (in
this case the buildslave) to reap it. Processes in this state are called
"zombies", because they're dead, but just won't go away until the parent
reads their exit status.
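
To make the zombie state concrete, here is a minimal Python sketch that is
not taken from buildbot or Twisted. The helper names make_zombie, reap and
is_present are made up for illustration. It assumes a Unix system with
os.fork: the child exits at once, stays in the process table until the
parent calls waitpid, and disappears afterwards.

```python
import os


def make_zombie():
    """Fork a child that exits immediately; return its pid without reaping it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping Python cleanup
    return pid


def reap(pid):
    """Collect the child's exit status, which removes the zombie entry."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def is_present(pid):
    """True if the pid still names a process table entry (zombies included)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    return True
```

Between make_zombie() and reap(), `ps` would show the child with a "Z"
state once it has exited, much like the "(ls)" entry quoted above.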

 

> command timed out: 30 seconds without output, killing pid 6116

 

This suggests that the buildslave process never saw the SIGCHLD which
indicated that a child process has terminated and needs to be reaped.

 

> SIGKILL failed to kill process

 

This might be the normal thing that happens when you try to kill a zombie..
not sure.
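
One way to check that guess is the small experiment below. It is only a
sketch: kill_zombie_and_reap is a made-up name, os.waitid with WNOWAIT is
assumed to be available (it is on Linux), and other Unix systems such as
NetBSD may behave differently. The child exits on its own, the parent sends
SIGKILL to the resulting zombie, and only the later waitpid removes it.

```python
import os
import signal


def kill_zombie_and_reap():
    """Send SIGKILL to an already-exited child, then reap it.

    Returns (kill_succeeded, exit_code, killed_by_signal).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with a recognisable code

    # Block until the child has exited, but leave it unreaped (a zombie).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    try:
        os.kill(pid, signal.SIGKILL)
        kill_succeeded = True
    except ProcessLookupError:
        kill_succeeded = False

    # Only this call removes the zombie; the recorded status is unchanged.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return kill_succeeded, exit_code, os.WIFSIGNALED(status)
```

If the status still reports a normal exit with code 7, the SIGKILL had no
effect on the zombie, which would fit "signal 9 sent successfully" followed
by a process that never goes away because nobody reaps it.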

 

 

Huh, I'm not sure where to go. It sounds like either SIGCHLD is not being
delivered properly, or it is getting ignored or eaten somewhere.

 

I'd want to add some instrumentation to
twisted.internet.process.reapAllProcesses() and Process.reapProcess() and
PTYProcess.reapProcess() to do a log.msg() each time they're called, so we'd
know when/if SIGCHLD was being delivered. I'd also set 'debug=True' in
buildbot.slave.commands.ShellCommandPP, which will turn on additional logging
when twisted thinks the process has ended. (If the process is still a zombie,
then ShellCommandPP.processEnded will probably not have been called, but it's
still worth turning on the debug messages). I'd also try running the
buildslave under strace to see if the signal is being delivered or not.
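
The instrumentation could be done without editing the Twisted source, by
wrapping the functions at startup. The sketch below is an assumption about
how one might do that, not something from Twisted itself: log_calls is a
made-up helper, and the commented-out lines assume the names mentioned above
exist in the installed Twisted version.

```python
import functools


def log_calls(func, log):
    """Wrap func so that every call is reported through log() first."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log("called %s" % func.__name__)
        return func(*args, **kwargs)
    return wrapper


# Hypothetical application (assumes Twisted is importable and that these
# attributes exist in the installed version):
#
#   from twisted.python import log
#   from twisted.internet import process
#   process.reapAllProcesses = log_calls(process.reapAllProcesses, log.msg)
#   process.Process.reapProcess = log_calls(
#       process.Process.reapProcess, log.msg)
#   process.PTYProcess.reapProcess = log_calls(
#       process.PTYProcess.reapProcess, log.msg)
```

Code that grabbed a reference to reapAllProcesses before the patch was
applied would keep calling the unwrapped function, so editing the source
directly may still be the more reliable route.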

 

What version of Twisted are you using? Which unix are you running on? Do the
buildbot unit tests pass on your box? (from the top of the buildbot source
tree, do 'PYTHONPATH=. trial buildbot.test' to run them).

 

 

puzzled,

 -Brian

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testresult.txt
URL: <http://buildbot.net/pipermail/devel/attachments/20060420/6a486898/attachment.txt>

