[Buildbot-devel] bug in 0.7.6 with 'buildbot reconfig'?
warner-buildbot at lothar.com
Wed Nov 21 10:02:39 UTC 2007
On Sat, 17 Nov 2007 16:29:51 +1100
John Pye <john.pye at anu.edu.au> wrote:
> Hi all
> Ben Hearsum wrote:
> > Greg Ward wrote:
> >> On Nov 14, 2007 8:47 AM, John Pye <john.pye at anu.edu.au> wrote:
> >>> I have what seems to be a bug with the 'buildbot reconfig'
> >>> command in Buildbot 0.7.6. When I do a 'buildbot restart .'
> >>> everything works fine. When I do a 'buildbot reconfig .' it also
> >>> works fine, but only if I do it once. If I make further config
> >>> changes then again attempt 'buildbot reconfig .' then I get a
> >>> page with debug output as follows.
> >> [...]
> >>> web.Server Traceback (most recent call last):
> >>> exceptions.AttributeError: 'NoneType' object has no attribute
> >>> 'getStatus'
> >> Yes, I have also seen this with 0.7.6. I have not been able to
> >> nail it down as precisely as you have. Hang on a second and I'll
> >> try ...
> >> ... nope. I just did "buildbot restart" and then "buildbot
> >> reconfig" four times in a row, and no error. Of course, I didn't
> >> make any config changes for those reconfig runs ...
> >> ... nope. I just did several "reconfig" runs with an actual
> >> configuration change between each one, and it's still fine.
> > Did you make any kind of changes to the config file when you did
> > this? I think it has something to do with the way the new Waterfall
> > consumes the soul of its predecessor.
FWIW, only Builders consume the soul of their predecessor. The WebStatus and
Waterfall objects are stateless, so they have no soul to be consumed :).
It feels like you've got a TCP port providing access to a Waterfall object
which is no longer active (i.e. it's been de-parented, so that
self.parent.getStatus() throws an exception). I can't think of why this could
be happening, but to provide background for anyone else looking into this,
here's how config changes work:
1: the buildmaster has a Waterfall instance as one of its children. Let's
call this Instance A. This is active, with a self.parent, and a TCP
Listener that points to it.
2: 'buildbot reconfig' causes the config file to be executed, causing
everything inside it to be instantiated, creating a Waterfall instance
that we'll call Instance B. This Instance B is not active: it has no
self.parent, and is not yet running a TCP Listener.
3: The buildmaster compares the existing Instance A with the proposed
Instance B for equality. Each config file object defines __cmp__ to
specify what it means to be "equal", usually by using ComparableMixin
and listing the attributes that should be compared. The general idea is
that everything you can set in the constructor arguments should be
compared for equality.
4: if A and B are equal, the buildmaster ignores B and keeps using A. B
gets forgotten and garbage collected.
5: if A and B are not equal, the buildmaster needs to replace the old one
with the new one. It does the following:
5a: detach the old Instance A by calling A.disownServiceParent(). The act of
deparenting it causes it to be shut down (i.e. A.stopService() is
called), causing the TCP Listener to be terminated. This takes a while
to run (i.e. it returns a Deferred).
5b: once A is shut down, B is attached with B.setServiceParent(buildmaster).
This causes B to be started up, and the first thing it does is to
set up a TCP Listener so the outside world can get to it.
6: all done
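As a rough illustration of steps 1 through 6, here is a minimal, dependency-free sketch. Everything in it is hypothetical: ComparableSketch, WaterfallSketch, MasterSketch, and the http_port attribute are stand-ins invented for this example, not Buildbot or Twisted classes. It also uses __eq__ where the real code defines __cmp__, and it runs synchronously, whereas the real shutdown in step 5a returns a Deferred.

```python
class ComparableSketch:
    """Hypothetical stand-in for ComparableMixin: equality over listed attributes."""
    compare_attrs = ()

    def _key(self):
        return (type(self),) + tuple(getattr(self, name) for name in self.compare_attrs)

    def __eq__(self, other):
        return isinstance(other, ComparableSketch) and self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class WaterfallSketch(ComparableSketch):
    """Hypothetical status target; 'listening' models the TCP Listener."""
    compare_attrs = ("http_port",)  # assumed constructor argument

    def __init__(self, http_port):
        self.http_port = http_port
        self.parent = None        # step 2: a fresh instance has no parent
        self.listening = False    # ...and no listener yet

    def set_parent(self, master):
        # step 5b: attaching starts the instance and opens its listener
        self.parent = master
        self.listening = True

    def disown_parent(self):
        # step 5a: detaching shuts the instance down and closes its listener
        self.listening = False
        self.parent = None


class MasterSketch:
    """Hypothetical buildmaster holding a single status target."""

    def __init__(self, status):
        self.status = status
        status.set_parent(self)   # step 1: Instance A is active

    def reconfig(self, proposed):
        current = self.status
        if current == proposed:   # steps 3 and 4: equal, keep A and forget B
            return current
        current.disown_parent()   # step 5a
        proposed.set_parent(self) # step 5b
        self.status = proposed
        return proposed           # step 6
```

In this sketch, reconfiguring with an equal instance returns the original object untouched, while a changed http_port swaps the instances and leaves the old one with no parent and no listener.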
So it feels like the old Instance A is being deparented (disownServiceParent
has managed to erase self.parent) without being shut down (stopService didn't
manage to turn off the TCP Listener). Perhaps an exception occurred in
between these two steps which prevented stopService from finishing?
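To make that hypothesis concrete, here is a small sketch of how an instance could end up deparented but still listening. It is only a model of the suspected failure, not a diagnosis: StaleWaterfallSketch, ParentSketch, and the fail_in_stop flag are hypothetical names, and the assumption that the shutdown error is captured rather than propagated (roughly the way a failed Deferred would hold it) is mine.

```python
class ParentSketch:
    """Hypothetical parent that can answer getStatus()."""

    def getStatus(self):
        return "status"


class StaleWaterfallSketch:
    """Hypothetical instance whose shutdown may fail before the listener closes."""

    def __init__(self, parent, fail_in_stop):
        self.parent = parent
        self.listening = True
        self.fail_in_stop = fail_in_stop

    def stop(self):
        if self.fail_in_stop:
            # assumed failure point: raised before the listener is closed
            raise RuntimeError("hypothetical error during shutdown")
        self.listening = False

    def disown_parent(self):
        error = None
        try:
            self.stop()
        except RuntimeError as exc:
            error = exc           # captured, not propagated (assumption)
        self.parent = None        # the parent is erased either way
        return error

    def render(self):
        # what an incoming web request would trigger
        return self.parent.getStatus()


stale = StaleWaterfallSketch(ParentSketch(), fail_in_stop=True)
captured = stale.disown_parent()
# The listener is still up, so requests still reach the parentless instance.
try:
    stale.render()
    message = None
except AttributeError as exc:
    message = str(exc)
```

With fail_in_stop=True the instance keeps listening after its parent is gone, and render() fails with the same kind of AttributeError on getStatus that was reported.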
Are there any "Unhandled Error in Deferred" or other exceptions in your
twistd.log at the time you do the reconfig or shortly thereafter?
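If it helps, a short script along these lines can pull the interesting lines out of the log. This is only a sketch: find_suspect_lines is a hypothetical helper, the second marker string is my own addition (the usual Python traceback header), and the sample lines are made up for illustration rather than taken from a real twistd.log.

```python
def find_suspect_lines(lines, markers=("Unhandled Error in Deferred",
                                       "Traceback (most recent call last)")):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


# Made-up sample input; a real run would read the file instead, e.g.
#   with open("twistd.log") as log: hits = find_suspect_lines(log)
sample = [
    "loading configuration\n",
    "Unhandled Error in Deferred:\n",
    "Traceback (most recent call last):\n",
    "configuration update complete\n",
]
hits = find_suspect_lines(sample)
```

Comparing the line numbers of any hits against the point where the reconfig was logged should show whether an error landed during the shutdown of the old instance.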