[Buildbot-devel] Dependent schedulers not being triggered
Kenneth Lareau
Ken.Lareau at nominum.com
Fri Feb 16 01:06:21 UTC 2007
This is a follow-up to the "Unable to use 'manhole' module" thread;
as I mentioned in my last message in that thread, the whole reason
for me wanting manhole to work was to try and debug an issue where
we have a set of dependent schedulers that apparently aren't being
triggered, which causes a set of builders to not be run during our
nightly builds.
More specifically, we have several builders which are only to be
run if certain other builders succeed. Until around three weeks
ago, this worked just fine. Somewhere around January 16th, they
stopped working, but there have been no changes to the configuration
and the builders that are depended upon have been successful; hence
my attempt to use the 'manhole' module for debugging. However,
after spending nearly a day trying to work my way through the
buildbot code, I have been unsuccessful with the aforementioned
module in assisting me with this issue, so I turn to the mailing
list for help.
What little I do have currently is in the form of some log comparisons
that a coworker did while trying to track down the issue. Before the
breakage, the following was seen in the logs when a builder run from
a dependent scheduler was triggered:
2007/01/02 05:20 PST [-] releaseLocks(<step.NOMStep instance at
0xb19bfbac>): []
2007/01/02 05:20 PST [-] step 'build' complete: success
2007/01/02 05:20 PST [-] <Build all head rhel-4-x86-64-0>: build finished
2007/01/02 05:20 PST [-] setting expectations for next time
2007/01/02 05:20 PST [-] new expectations: 609.034803145 seconds
2007/01/02 05:20 PST [-]
maybeStartBuild:[<buildbot.process.base.BuildRequest instance at
0xb1ce1b8c>] [<buildbot.process.builder.SlaveBuilder instance at
0xb32e9f8c>]
2007/01/02 05:20 PST [-] starting build <Build benchmark ans-dist
benchmarking-0>.. pinging the slave
Here's the equivalent log information from after the breakage:
2007/02/08 04:18 PST [-] <Build all head rhel-4-x86-64-0>: build finished
2007/02/08 04:18 PST [-] setting expectations for next time
2007/02/08 04:18 PST [-] new expectations: 489.935475789 seconds
2007/02/08 04:18 PST [-] maybeStartBuild: []
[<buildbot.process.builder.SlaveBuilder instance at 0xb4c39ac>]
2007/02/08 04:18 PST [-] releaseLocks(<Build all head rhel-4-x86-64-0>):
[<SlaveLock(rhel-4-x86-64-0 kelpie lock)[rhel-4-x86-64-0] -1302008020>]
2007/02/08 04:18 PST [-] <SlaveLock(rhel-4-x86-64-0 kelpie
lock)[rhel-4-x86-64-0] -1302008020> release(<Build all head
rhel-4-x86-64-0>)
2007/02/08 04:18 PST [-] <SlaveLock(rhel-4-x86-64-0 kelpie
lock)[rhel-4-x86-64-0] -1302008020> nowAvailable
The one thing that was noticed was the change between the
maybeStartBuild lines:
2007/01/02 05:20 PST [-]
maybeStartBuild:[<buildbot.process.base.BuildRequest instance at
0xb1ce1b8c>] [<buildbot.process.builder.SlaveBuilder instance at
0xb32e9f8c>]
to
2007/02/08 04:18 PST [-] maybeStartBuild: []
[<buildbot.process.builder.SlaveBuilder instance at 0xb4c39ac>]
Note that in the second case the first list, which held a BuildRequest
instance before the breakage, is empty; this is the part where I'm
uncertain what is breaking. Regrettably, full logs from before the
breakage have been deleted, so it will be difficult to get more
information from before the failures began to occur.
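For anyone wanting to check a master's twistd.log for the same symptom,
something along these lines might help. It is only a sketch: it assumes
each log entry sits on a single line (the excerpts above were wrapped
for mail), that the two bracketed lists always follow "maybeStartBuild:"
as shown, and the function name find_empty_request_lines is made up for
illustration rather than being part of buildbot.

```python
import re

# Assumed layout, taken from the excerpts above:
#   ... maybeStartBuild: [<requests>] [<slave builders>]
# The instance reprs shown contain no "]" so a simple pattern suffices.
PATTERN = re.compile(r"maybeStartBuild:\s*\[([^\]]*)\]\s*\[([^\]]*)\]")


def find_empty_request_lines(lines):
    """Return (line_number, text) pairs whose first list is empty."""
    hits = []
    for index, line in enumerate(lines):
        match = PATTERN.search(line)
        # group(1) is the contents of the first list (the build requests)
        if match and not match.group(1).strip():
            hits.append((index + 1, line.rstrip("\n")))
    return hits
```

Passing an open log file (or any list of lines) to the function returns
the entries where maybeStartBuild was called with no pending request,
which at least narrows down when the behaviour changed.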
As for our configuration: yes, we have 186 builders (with more to
be added soon) and our master.cfg is currently 600 lines long (and
will get longer once I clean it up and comment it more). If being
able to see the configuration will help, I will check to see what
from it I can publicly post, as there's some potentially
company-sensitive information in it that might need to be excised/replaced,
but I'm sure that can be resolved.
If any further information is needed, please don't hesitate to
ask; oh, a quick rundown once more of the OS/software involved
with the master buildbot server:
RHEL 3 (i386)
buildbot 0.7.4
Twisted 2.5.0
Python 2.4.3
Thanks for any help that can be given!
Ken Lareau
Nominum, Inc.