[Buildbot-devel] Severe issues with buildbot 0.7.4

Kenneth Lareau Ken.Lareau at nominum.com
Thu Apr 19 19:30:34 UTC 2007


This just hit critical mass, so hopefully someone can respond to
this quickly; we are unable to run our current builds without a
fix to this issue.

Software:
    buildbot 0.7.4
    Twisted 2.5.0
    zope.interface 3.3.0

We have a system set up for builds that seems to be causing the
build master process to stop logging all information.  The problem
is happening when a step is run that's supposed to copy a file to
a directory.  The step itself is:

    %(workdir)s/tare/head/nightly/copy_package %(workdir)s dhcptest-package %(platform)s

When this command attempts to run, the following traceback is seen
on the client:


2007/04/19 12:08 -0700 [Broker,client] error in ShellCommand._startCommand
2007/04/19 12:08 -0700 [Broker,client] Unhandled Error
         Traceback (most recent call last):
           File "/usr/local/lib/python2.4/site-packages/buildbot/slave/bot.py", line 169, in remote_startCommand
             d = self.command.doStart()
           File "/usr/local/lib/python2.4/site-packages/buildbot/slave/commands.py", line 614, in doStart
             d = defer.maybeDeferred(self.start)
           File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 107, in maybeDeferred
             result = f(*args, **kw)
           File "/usr/local/lib/python2.4/site-packages/buildbot/slave/commands.py", line 725, in start
             d = self.command.start()
         --- <exception caught here> ---
           File "/usr/local/lib/python2.4/site-packages/buildbot/slave/commands.py", line 292, in start
             self._startCommand()
           File "/usr/local/lib/python2.4/site-packages/buildbot/slave/commands.py", line 322, in _startCommand
             argv = (" ".join(self.command) % os.environ).split()
           File "/usr/local/lib/python2.4/UserDict.py", line 17, in __getitem__
             def __getitem__(self, key): return self.data[key]
         exceptions.KeyError: 'workdir'

2007/04/19 12:08 -0700 [Broker,client] SlaveBuilder.commandFailed <buildbot.slave.commands.SlaveShellCommand instance at 0x855b52c>
2007/04/19 12:08 -0700 [Broker,client] Unhandled Error
         Traceback (most recent call last):
         Failure: buildbot.slave.commands.AbandonChain: -1
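
For reference, the KeyError in the first traceback can be reproduced
outside of buildbot.  The sketch below only mimics the expression
shown at commands.py line 322; the function name expand_command and
the sample values for workdir and platform are hypothetical, and it
makes no claim about how the master is meant to fill in those keys.

```python
def expand_command(command, env):
    """Mimic the expression from the traceback:
    (" ".join(self.command) % os.environ).split()
    """
    return (" ".join(command) % env).split()


# The step exactly as it appears in the master log.
command = [
    "%(workdir)s/tare/head/nightly/copy_package",
    "%(workdir)s",
    "dhcptest-package",
    "%(platform)s",
]

# A mapping without a 'workdir' key (as os.environ was on the client)
# makes the % operator raise KeyError for the first missing name.
missing = None
try:
    expand_command(command, {"PATH": "/usr/bin"})
except KeyError as exc:
    missing = exc.args[0]
print("missing key:", missing)

# With both keys present (hypothetical values), the expansion works.
argv = expand_command(command, {"workdir": "/b", "platform": "linux"})
print(argv)
```

So the placeholders are reaching the client unexpanded, and the
client then tries to resolve them against its own environment.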


At the same time, the following is seen on the master buildbot
process:


2007/04/19 12:08 -0700 [-] ShellCommand.start using log <buildbot.status.builder.LogFile instance at 0xb3e7658c>
2007/04/19 12:08 -0700 [-]  for cmd <RemoteShellCommand '['%(workdir)s/tare/head/nightly/copy_package', '%(workdir)s', 'dhcptest-package', '%(platform)s']'>
2007/04/19 12:08 -0700 [-] <RemoteShellCommand '['%(workdir)s/tare/head/nightly/copy_package', '%(workdir)s', 'dhcptest-package', '%(platform)s']'>: RemoteCommand.run [6]
2007/04/19 12:08 -0700 [-] command '['%(workdir)s/tare/head/nightly/copy_package', '%(workdir)s', 'dhcptest-package', '%(platform)s']' in dir 'build'
2007/04/19 12:08 -0700 [-] LoggedRemoteCommand.start
2007/04/19 12:08 -0700 [-] BuildStep.failed, traceback follows


At this point buildbot stops logging completely on the master,
though it still seems to be responsive to other builders being
run.  This is highly undesirable, and what makes this even more
frustrating is that this only began to fail last night, and the
only change in that timeframe was a minor addition of several
new builders across several of the platforms (these are
auto-generated, so there are no typos; in this case, I added a
single (valid) branch name, and these new builders succeeded just
fine when I tested them yesterday).

Can anyone help at this point?  I will need to take the failing
system offline until I find a solution, and hope that other
systems won't cause the same issue or all of our builds will be
put on hold, which for us is a Very Bad Thing (TM).

Thanks for any assistance you can give.


Ken Lareau



