[Buildbot-devel] "commandFailed" error on Windows build slave

Wed Jun 22 23:14:57 UTC 2005

From: Gerald Combs <gerald at ethereal.com>
Subject: Re: [Buildbot-devel] "commandFailed" error on Windows build slave
Date: Wed, 22 Jun 2005 14:44:56 -0500

> In trying to track down this error I noticed that the "SlaveBuilder" class
> in slave/bot.py doesn't define a "commandFailed" method. Should it?

Nope.. commandFailed belongs on the commands.Command subclass (including
ShellCommand). The log message in SlaveBuilder.commandComplete is misleading
and should be changed.

Now, the fact that commands.ShellCommand doesn't implement commandFailed is a
bug (and a fairly recent one, I think). That bug will only be triggered when
the shell command takes too long and is killed, or when the connection to the
master is lost and the command is abandoned. There are a couple of other bugs
surrounding this one.. basically the error-recovery code is broken and the
slave can get left in a state where it thinks a command is still running, so
when the next build starts, it wants to kill off the "leftover" command (but
it can't, because there isn't really one running). The exception that is
raised when the interrupt fails seems to prevent the new build from starting
(which would correct the problem).
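
To make that failure mode concrete, here is a toy model of it. This is only a sketch: ToySlave, its attributes and start_build are hypothetical names invented for illustration, not the real code in slave/bot.py. It shows stale "still running" bookkeeping causing the interrupt to raise, which keeps the next build from starting and resetting the state.

```python
class ToySlave:
    """Hypothetical toy model of the stuck-slave state; not buildbot code."""

    def __init__(self):
        self.running = None  # command the slave *believes* is running
        self.alive = False   # whether a process really exists

    def interrupt(self):
        # Killing a command that is not really there fails.
        if not self.alive:
            raise RuntimeError("no process to interrupt")
        self.alive = False
        self.running = None

    def start_build(self, name):
        # A leftover command must be interrupted first. If that raises,
        # the lines below never run and the stale state is never reset.
        if self.running is not None:
            self.interrupt()
        self.running = name
        self.alive = True


slave = ToySlave()
slave.start_build("build-1")
slave.alive = False  # command was abandoned, but bookkeeping was not cleared

try:
    slave.start_build("build-2")
    started = True
except RuntimeError:
    started = False
```

In this model the second build never starts and the slave still records "build-1" as running, which is the stuck state described above.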

I think the missing method needs to do:

def commandFailed(self, why):
    self.interrupted = True
    self.deferred.errback(why)

but I don't have the time today to write a proper test case for it. (It
shouldn't be too hard to write, but it will depend upon os.kill and thus
might not work under Windows.)
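
For what it's worth, the behaviour of the proposed method can be exercised without Twisted or os.kill at all. The sketch below is only an illustration under stated assumptions: FakeDeferred is a hypothetical stand-in for a Twisted Deferred (it only records the errback), and SketchCommand is a hypothetical stripped-down class, not the real commands.ShellCommand.

```python
class FakeDeferred:
    """Hypothetical stand-in for a Twisted Deferred; records one errback."""

    def __init__(self):
        self.called = False
        self.failure = None

    def errback(self, why):
        # Like a real Deferred, refuse to fire twice.
        if self.called:
            raise RuntimeError("deferred already fired")
        self.called = True
        self.failure = why


class SketchCommand:
    """Hypothetical minimal command; not the real ShellCommand."""

    def __init__(self):
        self.interrupted = False
        self.deferred = FakeDeferred()

    def commandFailed(self, why):
        # Mark the command as interrupted, then report the failure
        # to whoever is waiting on the deferred.
        self.interrupted = True
        self.deferred.errback(why)


cmd = SketchCommand()
cmd.commandFailed(RuntimeError("command timed out"))
```

After the call, cmd.interrupted is True and the failure has been passed to the deferred. A real test would still need to kill an actual child process, which is where the os.kill dependency comes in.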

If the problem is still present when I get back next week, let me know and
I'll investigate it properly then.

cheers,
 -Brian