[Buildbot-devel] buildbot-0.6.2 released

Mon Dec 13 09:04:14 UTC 2004

I've just finished pushing 0.6.2 out the door. The release is signed, as
usual, by my GPG key 0x1514A7BD. Release notes (the NEWS file) are attached
to this message. The release is available from the SourceForge download page:

 http://sourceforge.net/project/showfiles.php?group_id=73177

MD5 checksum is as follows:

 37196d78e276b8f508c2aa4a584abee0  dist/buildbot-0.6.2.tar.gz

0.6.2 is a minor bugfix release, which improves the behavior of both the
buildslave and buildmaster when the connection between them is lost. The
slave will now terminate any running child processes, while the master will
mark the current BuildStep as having failed rather than let it hang forever.
In addition, it is now possible to interrupt a running build, through both a
web page button and a command to the IRC status bot. Note that you will need
to upgrade both the master and all slaves for the "Stop Build" feature to
work properly.

I'm hopeful that this will address the most egregious symptoms of the
dynamic-IP problem. If it still misbehaves, please let me know. There are
more fixes to come in this area (pinging the builder before starting the
build, basically).

Projects currently in the pipeline:

 exarkun is rewriting the HTML layout using Nevow. We do not intend to add
 any new dependencies to BuildBot yet: any required Nevow code will be added
 to the distribution and installed along with the rest of buildbot. This
 should make it easier to build web page interfaces, including the simplified
 "one page" layout which forsakes the historical information to just give a
 summary of the current build status.

 Somebody (I forgot who, sorry) is working on moving the docs to texinfo
 format, from which we will be able to generate sensible man pages and such.

 The 'buildbot' command (the one that gets installed in /usr/bin/) will be
 enhanced to function as a simple text-mode status client. The idea would be
 to put a .buildbot file in the top of your source tree which contains the
 location of the buildmaster (hostname and port number). With that, running
 'buildbot watch' or something would contact the master and let you know what
 builds are currently running, and emit the results when they complete.

 I'm still thinking about how to make 'try' work. Most of the pieces are now
 in place, but the sticking point is how to securely get a patch up to
 the buildmaster. I suspect it will involve ssh, and configuring the
 buildmaster to accept patches in a specific master-local directory (so you
 can use ssh authorized_keys and unix filesystem permissions to control who
 gets to start such a build). The 'buildbot' command will be the entry point,
 so that all you have to do is run 'buildbot try' from within your working
 directory and then wait for the results.

 Locks and Dependencies finally need to be implemented.

 Buildslave classes, to let builds be distributed across multiple
 (identically configured) buildslaves.

And of course the usual collection of bugfixes, etc.

Please report any problems, questions, etc, to this buildbot-devel list as
usual. Thanks!

Have an it's a wonderful life day,
 -Brian

* Release 0.6.2 (13 Dec 2004)

** new features

It is now possible to interrupt a running build. Both the web page and the
IRC bot feature 'stop build' commands, which can be used to interrupt the
current BuildStep and accelerate the termination of the overall Build. The
status reporting for these still leaves something to be desired (an
'interrupt' event is pushed into the column, and the reason for the interrupt
is added to a pseudo-logfile for the step that was stopped, but if you only
look at the top-level status it appears that the build failed on its own).

Builds are also halted if the connection to the buildslave is lost. On the
slave side, any active commands are halted if the connection to the
buildmaster is lost.

** minor new features

The IRC log bot now reports ETA times in an MMSS format like "2m45s" instead
of the clunky "165 seconds".
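
As a rough illustration of the new format, here is a minimal sketch of such
a conversion. The helper name format_eta is hypothetical, and the handling of
durations under a minute is an assumption, not necessarily what the IRC bot
does.

```python
def format_eta(seconds):
    """Render a duration in seconds as e.g. '2m45s'.

    Hypothetical helper for illustration only. Durations shorter than a
    minute are rendered as plain seconds, which is an assumption.
    """
    seconds = max(0, int(round(seconds)))
    minutes, secs = divmod(seconds, 60)
    if minutes:
        return "%dm%02ds" % (minutes, secs)
    return "%ds" % secs


if __name__ == "__main__":
    print(format_eta(165))  # the example from the release notes
```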

** bug fixes

*** Slave Disconnect

Slave disconnects should be handled better now: the current build should be
abandoned properly. Earlier versions could get into weird states where the
build failed to finish, clogging the builder forever (or at least until the
buildmaster was restarted).

In addition, there are weird network conditions which could cause a
buildslave to attempt to connect twice to the same buildmaster. This can
happen when the slave is sending large logfiles over a slow link, while using
short keepalive timeouts. The buildmaster has been fixed to allow the second
connection attempt to take precedence over the first, so that the older
connection is jettisoned to make way for the newer one.
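
The "newer connection wins" policy can be sketched roughly as follows. This
is not the buildmaster's actual code: SlaveRegistry, FakeConnection, and all
method names are hypothetical stand-ins used only to show the idea.

```python
class SlaveRegistry:
    """Hypothetical map from slave name to its single live connection."""

    def __init__(self):
        self._connections = {}

    def attach(self, name, connection):
        # If the same slave is already connected, drop the older
        # connection so the newer one takes precedence.
        old = self._connections.get(name)
        if old is not None and old is not connection:
            old.close()
        self._connections[name] = connection
        return old

    def detach(self, name, connection):
        # A late disconnect notice from an already-replaced connection
        # must not remove the newer one.
        if self._connections.get(name) is connection:
            del self._connections[name]

    def current(self, name):
        return self._connections.get(name)


class FakeConnection:
    """Stand-in for a real network connection, for demonstration only."""

    def __init__(self, label):
        self.label = label
        self.closed = False

    def close(self):
        self.closed = True
```

The check in detach matters in this sketch: the old connection's loss is
usually noticed after the new one has attached, and it should not evict the
replacement.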

In addition, the buildslave has been fixed to be less twitchy about timeouts.
There are now two parameters: keepaliveInterval (which is controlled by the
mktap 'keepalive' argument), and keepaliveTimeout (which requires editing the
.py source to change from the default of 30 seconds). The slave expects to
see *something* from the master at least once every keepaliveInterval
seconds, and will try to provoke a response (by sending a keepalive request)
'keepaliveTimeout' seconds before the end of this interval just in case there
was no regular traffic. Any kind of traffic will qualify, including
acknowledgements of normal build-status updates.

The net result is that, as long as any given PB message can be sent over the
wire in less than 'keepaliveTimeout' seconds, the slave should not mistakenly
disconnect because of a timeout. There will be traffic on the wire at least
every 'keepaliveInterval' seconds, which is what you want to pay attention to
if you're trying to keep an intervening NAT box from dropping what it thinks
is an abandoned connection. A quiet loss of connection will be detected
within 'keepaliveInterval' seconds.
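
To make the timing concrete, here is a small sketch of the schedule described
above. The function name is hypothetical, the 600-second interval in the
usage line is only an example value, and rejecting a timeout that is not
shorter than the interval is a choice made for this sketch rather than
documented behavior.

```python
def keepalive_schedule(last_traffic, keepalive_interval, keepalive_timeout=30):
    """Return (ping_at, give_up_at) for the slave's keepalive logic.

    last_traffic is the time anything was last heard from the master.
    The slave sends a keepalive request keepalive_timeout seconds before
    the interval ends, and treats the connection as lost if nothing has
    arrived by the end of the interval.
    """
    if not 0 < keepalive_timeout < keepalive_interval:
        # Assumption of this sketch: the timeout must fit in the interval.
        raise ValueError("keepalive_timeout must be shorter than the interval")
    ping_at = last_traffic + keepalive_interval - keepalive_timeout
    give_up_at = last_traffic + keepalive_interval
    return ping_at, give_up_at


if __name__ == "__main__":
    # Example values only: with a 600 second interval and the default
    # 30 second timeout, the ping goes out 570 seconds after the last traffic.
    print(keepalive_schedule(0, 600))
```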

*** Large Logfiles

The web page rendering code has been fixed to deliver large logfiles in
pieces, using a producer/consumer apparatus. This avoids the large spike in
memory consumption that occurred when the log file body was linearized into a
single string and then buffered in the socket's application-side transmit
buffer. This
should also avoid the 640k single-string limit for web.distrib servers that
could be hit by large (>640k) logfiles.
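
The general idea of piecewise delivery can be sketched in plain Python as
below. This does not use Twisted and omits the flow control that a real
producer/consumer arrangement provides. The names iter_chunks, deliver, and
CountingConsumer, and the 64k chunk size, are all assumptions made for
illustration.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from buildbot


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield the file's contents in pieces of at most chunk_size bytes."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


class CountingConsumer:
    """Stand-in for a socket: records how much it was asked to write."""

    def __init__(self):
        self.total = 0
        self.largest = 0
        self.writes = 0

    def write(self, data):
        self.total += len(data)
        self.largest = max(self.largest, len(data))
        self.writes += 1


def deliver(fileobj, consumer, chunk_size=CHUNK_SIZE):
    # Only one chunk is held in memory at a time, rather than the
    # whole logfile body as a single string.
    for chunk in iter_chunks(fileobj, chunk_size):
        consumer.write(chunk)


if __name__ == "__main__":
    log = io.BytesIO(b"x" * 1000000)
    consumer = CountingConsumer()
    deliver(log, consumer)
    print(consumer.total, consumer.largest, consumer.writes)
```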