[Buildbot-devel] Extending ShellCommand

Sat Jul 10 23:35:19 UTC 2004

> I just started playing with BuildBot, so want to apologize in advance 
> for potential FAQ.

Howdy! Great to have more users!

> I am trying to extend ShellCommand to run unittests and update status 
> nicely. Running unit tests works just fine, but updating status throws 
> an exception I can't decipher.

> class RunUnitTests(ShellCommand):

>     def finishStatus(self, result):
>         verb = "unittests"
>         if type(result) == types.TupleType:
>             result, text_ignored = result
>         if result == FAILURE:
>             self.updateCurrentActivity(color="red", text=[verb, "failed"])
>         else:
>             self.updateCurrentActivity(color="green", text=[verb])
>         self.finishStatusSummary()
>         self.finishCurrentActivity()
> 
> 
> What happens is startStatus() updates status to yellow 'testing' as 
> expected, log link works just fine, but when step is done, whole build 
> gets marked as successful, while unittests step still remains in yellow 
> 'testing' mode. Hitting on log link at this point genereates following 
> traceback:


Now that is interesting. I think it involves some mistakes I made w.r.t. the
python method-resolution-order in the unnecessarily complex internal
subclasses of buildbot.process.step . Here's what's supposed to happen:

 the slave sends a "the child process has finished" message
 self.remoteComplete() is invoked, i.e. InternalShellCommand.remoteComplete
  this closes the logfile and calls self.finished
 self.finished() is invoked, i.e. ShellCommand.finished
  this does some processing of the output, and calls self.stepComplete
 self.stepComplete() is invoked, i.e. InternalBuildStep.stepComplete, or
                                      BuildStep.stepComplete

  the base class (InternalBuildStep) is responsible for firing the callback
  that tells the overall Build that this step is complete, so that it can
  begin the next BuildStep

  the subclass (BuildStep) is additionally responsible for updating the
  status event (by calling .finishStatus), and telling the progress-tracker
  to wrap up (by recording the amount of time required by the now-complete
  step, so it can provide an ETA for the next build). It then upcalls so that
  the Build moves to the next step

 finishStatus() is a BuildStep-specific method (not present in
 InternalBuildStep) which manage status events (with methods like
 .updateCurrentActivity and .finishCurrentActivity). It calls
 .finishStatusSummary to add any "files" to the step's event (links to
 processed log files, like the 'problems', 'warnings', and 'log' files that
 the Twisted buildbot's RunUnitTest step produces). It finally calls
 .finishCurrentActivity to let the status-handling code know that this event
 is over and done with (so any external status clients are notified).


InternalShellCommand is intended for use as a sub-step by other BuildSteps
which manage their own status information. For example, a hypothetical
build-from-cvs autoconf step could use separate InternalShellCommands to run
the "automake", "autoconf", and "./configure" tools. This AutoconfStep might
combine the output from all three commands into a single event, with a single
logfile.

ShellCommand, in contrast, is for BuildSteps which are completely defined by
a single shell command, so all the status-management is done for you (except
that you have to subclass startStatus/finishStatus, just like you did, to
influence the text being displayed).

InternalBuildStep and BuildStep are distinct for the same reason.

The inheritance tree of your RunUnitTests results from the following set of
class declarations:

 class RunUnitTests(ShellCommand):
 class ShellCommand(InternalShellCommand, BuildStep):
 class InternalShellCommand(InternalBuildStep):
 class BuildStep(InternalBuildStep):


I think the problem is in ShellCommand (and subclasses), where .finished
performs this call:

        return self.stepComplete([rc, output])

The python MRO (depth-first) resolves .stepComplete into
InternalBuildStep.stepComplete, which doesn't do the status-updating. As a
result, .finishStatus is never called, and the event box get stuck in the
in-progress "testing" state.

I'm surprised I haven't seen this before, because it seems like a pretty
glaring bug. From a quick glance at the Twisted buildbot's logs, it seems
like .finishStatus is indeed being run, which either means there is a second
bug which exactly compensates for this one, or I've screwed up my analysis
somehow. It's possible that all of the twisted-specific BuildStep subclasses
override something or another, masking the bug from my usual test suite :).

I'll look further into this tonight or tomorrow. For now you could implement
your own .stepComplete method which explicitly calls BuildStep.stepComplete
instead of letting inheritance find it (and thus find the wrong one).

The medium-term fix is probably to add such a method to ShellCommand. The
long-term fix is to rewrite that whole mess, maybe with a mixin class, or
perhaps a single class-level flag that says "log for me" or not.


> "/usr/lib/python2.3/site-packages/buildbot/status/event.py", line 384, 
> in render
>             data += self.content(self.getEntries(), asText)
>           File 
> "/usr/lib/python2.3/site-packages/buildbot/status/event.py", line 356, 
> in content
>             for type, entry in entries:
>         exceptions.TypeError: iteration over non-sequence

This is a straightforward bug, patch included below and now checked into CVS.
(I'm using self.entries=None to mean "we've swapped out the data, go fetch it
from disk" and self.entries=[] to mean "the log is empty, nothing has
happened yet", and I was incorrectly using 'if self.entries:' instead of 'if
self.entries is not None:'). However, the fact that it is being triggered is
suspicious. Either you are looking at the event log during the tiny startup
window before anything has been added to the log, your run-tests step is not
producing any output, or there is another problem (probably the MRO again)
which is preventing command output from being sent to the log altogether.


> On a related note, buildbot did not like a couple of our unittests. The 
> whole unittesting process was dying saying that it caught signal 1. The 
> very same tests run just fin on the same machine, run by the same user 
> from the same folder. On closer examination it turned out that gnuplot 
> related tests were to blame. They did not do anything special, just 
> tested usage of Gnuplot.py, which basically spawns gnuplot process and 
> pipes some data into it. Any ideas why tests may run fine from shell, 
> but not from buildbot?

One possibility is running the tests under a PTY. Try flipping the "usePTY"
switch around line 174 of slavecommand.py and see if it has any effect.

The reason we run tests under a PTY whenever possible is because it creates a
"process group", which allows us to kill off all the descendants of the
testing process with a single kill() syscall (using a negative number as an
argument). This is very useful when you need to automatically kill a hanging
test, and the test script has spawned multiple grandchildren to execute the
test.

However, not all operating systems implement process groups in the same way
(solaris in particlar seems to behave differently). Some don't even have the
concept (usePTY is set to 0 on non-posix). And I've seen weird things happen
with programs which are vaguely interactive (which gnuplot sort of falls
into). I assume that they are varying their behavior depending upon whether
they are run under a TTY or not, but I've never really investigated it enough
to know for sure.

The usePTY switch really needs to be turned into a mktap-time configuration
parameter, rather than asking people to hack the source when they encounter a
problem.


thanks,
 -Brian


Index: buildbot/status/event.py
===================================================================
RCS file: /cvsroot/buildbot/buildbot/buildbot/status/event.py,v
retrieving revision 1.12
diff -u -r1.12 event.py

--- buildbot/status/event.py	1 May 2004 01:03:27 -0000	1.12
+++ buildbot/status/event.py	10 Jul 2004 23:01:20 -0000
@@ -311,7 +311,7 @@
                 self.swappable = None
             
     def getEntries(self):
-        if self.entries:
+        if self.entries is not None:
             return self.entries
         elif self.doSwap:
             try: