[Buildbot-devel] zombie processes on Solaris

Dustin J. Mitchell dustin at zmanda.com
Wed Jun 13 19:17:33 UTC 2007


I'm having some trouble with builds hanging on a new Solaris 10 buildslave I
just put together.  The step in question is:

f.addStep(my.steps.TarballSourceNFS)
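# where, in my/steps.py:
import os
import zmanda.base          # our own helper module
from buildbot.steps.source import Source
from buildbot.process.buildstep import RemoteShellCommand

class TarballSourceNFS(Source):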
    """
    Get a tarball from $HOME, as left there by KeepTarballNFS.
    """
    def __init__(self, workdir, **kwargs):
        self.workdir = workdir
        Source.__init__(self, workdir, **kwargs)

    def startVC(self, branch, revision, patch):
        branch = zmanda.base.branch_to_str(self.getProperty("branch"))
        revision = self.getProperty("revision")
        cmd = """
        gunzip -c $HOME/dist/tarballs/amanda-%(branch)s-%(revision)s.tar.gz |
            tar -xf - &&
        rm -rf %(workdir)s &&
        mv amanda* %(workdir)s""" % {
            'branch' : branch,
            'revision' : revision,
            'workdir' : os.path.basename(self.workdir),
        }

        # run the command in the *parent* of the workdir
        rsc = RemoteShellCommand(os.path.dirname(self.workdir), cmd)
        self.startCommand(rsc)

The class basically just writes a shell command and hands it off to the parent
class.  I realize it's a hack, but I think that's immaterial to this question.
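To see exactly what the slave is asked to run, the string formatting in startVC can be reproduced on its own. The helper below is a hypothetical stand-alone copy (build_unpack_command is not a Buildbot API). It assumes workdir='build', which matches the "rm -rf build" in the slave log further down. Because the command is a plain string, it reaches the slave wrapped as /bin/sh -c, as the argv line in that log shows.

```python
import os

# Hypothetical stand-alone helper mirroring the formatting in
# TarballSourceNFS.startVC; it does not use Buildbot at all.
def build_unpack_command(branch, revision, workdir):
    """Return (parent_dir, argv) roughly as the step hands them to the slave."""
    cmd = """
    gunzip -c $HOME/dist/tarballs/amanda-%(branch)s-%(revision)s.tar.gz |
        tar -xf - &&
    rm -rf %(workdir)s &&
    mv amanda* %(workdir)s""" % {
        'branch': branch,
        'revision': revision,
        'workdir': os.path.basename(workdir),
    }
    # A string command ends up run through the shell, as in the slave log:
    # argv ['/bin/sh', '-c', ...]
    return os.path.dirname(workdir), ['/bin/sh', '-c', cmd]

# Assumption: workdir is the relative path 'build'.
parent, argv = build_unpack_command('trunk', '6674', 'build')
print(repr(parent))   # '' : the step runs in the builder's base directory
print(argv[2])
```

With a relative workdir like 'build', os.path.dirname() returns an empty string, so the command runs in the builder directory itself, which lines up with the "in dir /tmp/buildslave-9989/archtest-sparc-solaris-10/" entry in the log.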
When it runs, the command itself succeeds, as the contents of workdir on the
slave show, but the step then hangs.  On the slave, I also see (ps -ef):
     UID   PID  PPID   C    STIME TTY         TIME CMD
  buildsla 23580 17211   0 09:17:02 ?           0:00 python /home/buildslave/bin/twistd --no_save -y buildbot.tac
  buildsla 26817 23580   0        - ?           0:01 <defunct>
So it looks like Twisted's spawnProcess isn't catching its child's exit?
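The <defunct> entry is what POSIX calls a zombie: the child has exited, but its parent has not yet collected the exit status with waitpid(). Here the zombie's PPID is 23580, the twistd process, so twistd is the one that would have to reap it. A minimal POSIX-only sketch of that lifecycle follows (make_and_reap_zombie is a hypothetical name, not Buildbot or Twisted code):

```python
import os
import time

# POSIX-only sketch. A child that has exited stays in the process table as
# a zombie (shown by ps as <defunct>) until its parent collects the exit
# status with waitpid().
def make_and_reap_zombie():
    pid = os.fork()
    if pid == 0:
        os._exit(3)                       # child: exit at once with status 3
    time.sleep(0.5)                       # parent: child is now a zombie; nobody has waited
    reaped, status = os.waitpid(pid, 0)   # reap it; the <defunct> entry goes away
    try:
        os.waitpid(pid, 0)                # a second wait fails: pid is no longer our child
        gone = False
    except ChildProcessError:
        gone = True
    return reaped == pid, os.WEXITSTATUS(status), gone

print(make_and_reap_zombie())             # (True, 3, True)
```

Between the child's exit and the waitpid() call, ps would show the child as <defunct>, just like pid 26817 above.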

I have also seen this step actually succeed, but the subsequent step
(configure) failed, and hung the same way (including the defunct process).

I see in the slave's logs:
2007/06/13 11:52 -0700 [Broker,client] <SlaveBuilder 'archtest-sparc-solaris-10' at 5573728>.startBuild
2007/06/13 11:52 -0700 [Broker,client]  startCommand:shell [id 2]
2007/06/13 11:52 -0700 [Broker,client] ShellCommand._startCommand
2007/06/13 11:52 -0700 [Broker,client]  /bin/sh -c 
                        gunzip -c $HOME/dist/tarballs/amanda-trunk-6674.tar.gz |
                                tar -xf - &&
                        rm -rf build &&
                        mv amanda* build
2007/06/13 11:52 -0700 [Broker,client]   in dir /tmp/buildslave-9989/archtest-sparc-solaris-10/ (timeout 1200 secs)
2007/06/13 11:52 -0700 [Broker,client]   watching logfiles {}
2007/06/13 11:52 -0700 [Broker,client]   argv: ['/bin/sh', '-c', '\n\t\tgunzip -c $HOME/dist/tarballs/amanda-trunk-6674.tar.gz |\n\t\t\ttar -xf - &&\n\t\trm -rf build &&\n\t\tmv amanda* build']
2007/06/13 11:52 -0700 [Broker,client]   environment: { ... }
2007/06/13 11:55 -0700 [-] sending app-level keepalive
2007/06/13 12:05 -0700 [-] sending app-level keepalive
2007/06/13 12:12 -0700 [-] command timed out: 1200 seconds without output, killing pid 26817
2007/06/13 12:12 -0700 [-] trying os.kill(-pid, 9)
2007/06/13 12:12 -0700 [-]  signal 9 sent successfully
2007/06/13 12:12 -0700 [-] we tried to kill the process, and it wouldn't die.. finish anyway
2007/06/13 12:12 -0700 [-] ShellCommand.failed: command failed: SIGKILL failed to kill process
2007/06/13 12:12 -0700 [-] SlaveBuilder.commandFailed <buildbot.slave.commands.SlaveShellCommand instance at 0x553440>
2007/06/13 12:12 -0700 [-] Unhandled Error
        Traceback (most recent call last):
        Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
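"signal 9 sent successfully" followed by "it wouldn't die" fits the zombie picture. A zombie has already exited, so kill() can succeed while having no effect, and the entry stays until the parent calls waitpid(). The sketch below shows this on a single pid. To keep things simple, it uses os.kill(pid, ...) rather than the process-group form os.kill(-pid, 9) that appears in the log. kill_a_zombie is a hypothetical name. Whether kill() on a zombie reports success may vary by platform, so the sketch handles both outcomes.

```python
import os
import signal
import time

# POSIX-only sketch. Sending SIGKILL to a zombie is normally accepted
# (kill() reports success), but it changes nothing: the process has
# already exited, and only waitpid() in the parent removes it.
def kill_a_zombie():
    pid = os.fork()
    if pid == 0:
        os._exit(0)                      # child: exit normally
    time.sleep(0.5)                      # child is now a zombie
    try:
        os.kill(pid, signal.SIGKILL)     # plain pid, not the -pid group form
        kill_reported_ok = True
    except ProcessLookupError:           # some platforms may reject it instead
        kill_reported_ok = False
    reaped, status = os.waitpid(pid, 0)  # returns at once: the child was already dead
    # The recorded status is still the normal exit, not death by SIGKILL.
    return kill_reported_ok, reaped == pid, os.WIFEXITED(status), os.WIFSIGNALED(status)

print(kill_a_zombie())
```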

Pinging the builder after the failure works, so the Twisted event loop itself
appears to be running normally.  I don't see the debugging message from
'processEnded' anywhere in the logs.

I'm running:
  Python 2.3.5 (#1, Nov 30 2005, 10:43:26) [C] on sunos5
  Twisted-2.5.0
  zope.interface-3.3.0
  buildbot-0.7.5

Any suggestions?

Dustin

--
        Dustin J. Mitchell
        Storage Software Engineer, Zmanda, Inc.
        http://www.zmanda.com/



