[Buildbot-devel] Notes from Thursday's Meeting - 0.8.1 plans

Dustin J. Mitchell dustin at zmanda.com
Sun May 2 20:03:26 UTC 2010


We had a few technical difficulties. I learned one thing: use a laptop
that can connect to IRC (really SSH in this case).  It turns out I was
unable to see half of the conversation!  John O'Duinn will be
uploading a video of the meeting, although apparently the video system
died a few times, so it may be partial.  Also, the overhead
microphones were not working, so it was difficult to hear a lot of the
people in the room.  So I will try to summarize here.  The notes I
displayed onscreen are here:

  http://buildbot.net/trac/wiki/Meeting29April2010

We began by talking about the major improvements to Buildbot in the
0.8.0 release, as a basis for where Buildbot stands and what projects
are in motion.  This provoked little discussion.

Then we moved on to the proposed work for 0.8.1, considering both
*whether* to do it and *how* to do it.

== Web UI as a first-class citizen ==

The web interface used to be a simple status display, a "peer" to
other status displays.  Over the years, it's become a significant
piece of Buildbot, and its official position should probably be
updated.  There are some proposed enhancements to the existing web
status, which saw no opposition.  We agreed that it should be possible
to add other, more sophisticated web frontends to Buildbot, but didn't
really discuss how to go about making that pluggable.

We talked quite a bit about the various web services interfaces that
Buildbot now sports: HTTP push, HTTP/HTML, XMLRPC, and JSON.  It was
proposed to add a REST API, and I suggested that at least one other
API should be removed.  So far, it looks like XMLRPC is on the
chopping block (this is an open mailing-list thread right now).  As
far as I know, nobody has stepped up to take on this task.  If someone
does, I would like to see all of the APIs sport the same set of
methods, parameters, and results, with a common implementation and
documentation.

== Remote Server Shutdown ==

This got added to the list during the meeting, with discussion focusing
on adding an API method to shut down the master gracefully.  There is
already code to support this, although it is not merged.
Authentication is obviously important here!  Brian suggested that
authentication here should be identical to "who can access the
buildmaster from the shell", and thus SSH and a command-line interface
should be adequate.  Others pointed out that in a distributed
environment SSH can be difficult to script.

This brought up the question of control via web services APIs - that
is, doing more than just reading status.  The web UI already has a
number of control features built in, and their implementation is
pretty natural, so I don't have significant objections to adding these
features to the web services APIs - as long as they are properly
authenticated and authorized.

== Build status in DB ==

There was a lot of discussion here of writing to the status db via a
status listener, rather than writing to the DB directly from the
buildbot core.  This was seen as useful for those (hi, Mozilla)
creating a highly distributed buildmaster, because each buildmaster
could feed status to a single status-writer.  This would simplify the
core, since any status events would simply be calls to a notification
system, but it's not clear how the core would deal with events created
while the listener was not available.  Likewise, it would preclude the
core from using any data in the history, e.g. building only builders
whose most recent build failed.
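
One way to handle events created while the listener is unavailable is
to queue them on the sending side and flush the queue once the writer
comes back.  The following is only a minimal sketch of that idea, not
Buildbot code: the class name, the `connect` callable, and the `builds`
table layout are all assumptions made for illustration.

```python
import sqlite3
from collections import deque


class BufferingStatusListener:
    """Hypothetical listener that queues status events until the
    status DB can be reached, then writes them in one transaction."""

    CREATE = ("CREATE TABLE IF NOT EXISTS builds "
              "(builder TEXT, number INTEGER, result TEXT)")
    INSERT = "INSERT INTO builds VALUES (?, ?, ?)"

    def __init__(self, connect):
        # connect: callable returning a sqlite3 connection; it is
        # expected to raise sqlite3.OperationalError when the writer
        # is not available.
        self._connect = connect
        self._pending = deque()

    def notify(self, builder, number, result):
        """Record one build result and try to write everything queued."""
        self._pending.append((builder, number, result))
        return self.flush()

    def flush(self):
        try:
            conn = self._connect()
        except sqlite3.OperationalError:
            return False  # writer unavailable: keep events queued
        with conn:  # commits on success, rolls back on error
            conn.execute(self.CREATE)
            conn.executemany(self.INSERT, list(self._pending))
        self._pending.clear()
        return True
```

Note that the queue here lives in memory, so events would still be
lost if the master itself restarted while the writer was down.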

There was some brief discussion of support for other databases -
Murali suggested Berkeley DB, but there are no Python bindings beyond
the simple key/value interface.

Mozilla will be working on this project, so we'll see how it turns out
soon enough.

== Latent-Slave Support ==

Nobody objected to this idea.  Chris Atlee mentioned wanting the
ability to generate an arbitrary number of slaves without naming
them all individually: currently every EC2 slave must have a distinct
AMI that knows, at a minimum, its unique slave name.

== Multi-project / Multi-repository support ==

We now have the capacity to build multiple projects in Buildbot, even
from different repositories.  There are lots of optimizations to make
to this support, mostly in the Source steps.  But we also need to
think about how to support interaction *between* projects - bundling
projects together, projects that are dependencies of other projects,
and so on.  There was general agreement that some sort of "aggregate"
SourceStamp is a good idea, the hard part being *building* the
SourceStamp in the scheduler and then *interpreting* it in the Source
steps.
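
To make the "aggregate" idea concrete, here is a rough sketch of what
such a structure might look like.  Nothing here was agreed at the
meeting or exists in Buildbot: `RepoStamp`, `AggregateSourceStamp`,
and their fields are hypothetical names chosen for illustration.  The
scheduler would be the side calling `add`, and each Source step the
side calling `for_project`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class RepoStamp:
    """What to check out from a single repository (hypothetical)."""
    repository: str
    branch: Optional[str] = None
    revision: Optional[str] = None  # None is taken to mean "latest"


@dataclass
class AggregateSourceStamp:
    """A bundle of per-project stamps describing one combined build."""
    stamps: Dict[str, RepoStamp] = field(default_factory=dict)

    def add(self, project, stamp):
        # Scheduler side: record which revision of a project to use.
        self.stamps[project] = stamp

    def for_project(self, project):
        # Source-step side: look up the stamp for the project this
        # step is responsible for checking out.
        return self.stamps[project]
```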

VC support for submodules / externals came up as an alternative, but
these options are not useful in all cases.

Zmanda will be working on this.

== Build Coordination ==

Similar to the multi-project / multi-repository support, we need more
expressive tools to describe the relationship of builds to one
another.  We have Dependent and Triggerable schedulers, but these
offer limited flexibility.  There have been discussions of describing
builds as DAGs, or inventing a domain-specific language for the
purpose.
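
As an illustration of the DAG idea, the sketch below takes a mapping
from each build to the builds it depends on and returns an order in
which they could run, using a plain topological sort (Kahn's
algorithm).  The function name and the build names in the usage note
are made up; this is not how the Dependent or Triggerable schedulers
work.

```python
from collections import deque


def build_order(deps):
    """Return build names ordered so that every build comes after the
    builds it depends on.  deps maps a build name to the set of build
    names it requires.  Raises ValueError on a dependency cycle."""
    pending = {name: set(requires) for name, requires in deps.items()}
    # Builds that are only mentioned as dependencies are nodes too.
    for requires in deps.values():
        for name in requires:
            pending.setdefault(name, set())

    order = []
    ready = deque(sorted(n for n, r in pending.items() if not r))
    while ready:
        name = ready.popleft()
        order.append(name)
        del pending[name]
        # Release any build whose last unmet dependency was `name`.
        for other in sorted(pending):
            if name in pending[other]:
                pending[other].discard(name)
                if not pending[other]:
                    ready.append(other)

    if pending:
        raise ValueError("dependency cycle among: "
                         + ", ".join(sorted(pending)))
    return order
```

For example, with `{"package": {"compile", "test"}, "test":
{"compile"}}` the result is `["compile", "test", "package"]`.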

Murali brought up the idea of providing Buildbot configuration in some
structured format other than Python, giving the relationship between
builds that are required.  He also mentioned putting a web-based,
drag-and-drop GUI on top of this format.  I suggested that it's
already quite possible for users to generate Buildbot configuration
from structured data (by parsing that data in the master.cfg).  I also
said that a configuration GUI would be interesting, but will not be a
part of Buildbot, because it would of necessity reduce the flexibility
of the configuration.  There are already some configuration-generation
apps out there (notably Loki), and certainly within more narrow
environments it may even make business sense to provide such a
frontend.
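
Generating configuration from structured data might look something
like the sketch below, which turns a JSON description into a list of
builder dictionaries.  The JSON layout and the `builders_from_json`
helper are assumptions for illustration; the dictionary keys mirror
the builder dictionaries used in master.cfg, and the build factory is
left out because constructing it needs Buildbot itself.

```python
import json

# Hypothetical structured description of the projects to build.
PROJECTS_JSON = """
[
  {"name": "libfoo", "slaves": ["slave1", "slave2"]},
  {"name": "app",    "slaves": ["slave1"]}
]
"""


def builders_from_json(text):
    """Translate the JSON project list into builder dictionaries."""
    builders = []
    for project in json.loads(text):
        builders.append({
            "name": project["name"],
            "slavenames": list(project["slaves"]),
            "builddir": project["name"],
            # "factory": a real master.cfg would create a build
            # factory for the project here.
        })
    return builders
```

In a master.cfg, the result would then be assigned along the lines of
`c['builders'] = builders_from_json(PROJECTS_JSON)`, once a factory has
been added to each entry.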

== Windows Compatibility ==

Buildbot needs a Windows coordinator - someone who can judge the
sanity of a patch, test it out locally, and so on.  We also need some
Windows buildslaves.  I heard a volunteer for this who turns out not
to be the person I thought -- who was that?  Hopefully Mozilla will be
supplying a dozen or so buildslaves, so we'll be able to run Windows
on some of those in a controlled fashion.

== Source Step Mode Cleanup ==

I think that everyone wants to see this happen, but nobody has yet
stepped up to the challenge.

An interesting tangent came up here: moving the VC smarts from the
slave to the master.  There are several possible approaches here:
continue to use buildbot on the slave side, with the master telling
the slave to run 'git this' and 'svn that'; or replace the PB
connection with a simpler, non-Python-specific protocol that could be
implemented with minimal slave-side requirements (e.g.,
http://github.com/djmitche/remsh/).  It would be possible to implement
the latter in a "transitional" fashion, so that remsh slaves could act
just like regular buildbot slaves, without running
buildbot/twisted/python on the slave side.

We briefly touched on the idea of "splitting" buildbot into a master
(requiring Jinja, SQLite, etc.) and a slave (with minimal
requirements).

== Expanded Notification Framework ==

It would be nice to see the various status listeners use a common
filtering mechanism to decide which changes are important, and perhaps
even a common formatting mechanism to describe those changes.  Nice,
but nobody spoke up to work on this.

== Logging ==

Chris has already added a good deal of optimization to the
log-handling code, and while some were surprised and excited to find
out about this, I didn't hear any suggestions of new features to be
added here.

== Community Supplied Slaves ==

A few ideas were suggested to make it easier for a buildmaster admin
to manage a set of community-supplied slaves (see above).  Nobody
spoke up to work on these ideas, though.  As I understand it, most of
the organizations using Buildbot supply their own slaves.

== Tests ==

I failed utterly to mention this during the meeting, but Buildbot's
tests are lacking, and it's my fault.  So I'll fix those up.

= Releases =

Going into this meeting, I hoped to get a small scope set for 0.8.1,
and then make the release when those features were done.  I was
convinced, instead, to plan a release when each feature gets merged.
John O'Duinn referred to this as a "short cadence" - a phrase I like!
The list of adopted projects is, roughly and with no authority:

 * master shutdown - mozilla
 * build database - mozilla
 * build coordination - zmanda
 * multi-project - zmanda
 * windows - ?? + mozilla slaves
 * tests - dustin

0.8.1 will be released when any one of these (or anything else,
really) is completed.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com



