[Buildbot-devel] Running build slaves on the sourceforge compile farm
olly at survex.com
Thu Aug 26 20:24:52 UTC 2004
On Thu, Aug 26, 2004 at 11:16:38AM -0700, Brian Warner wrote:
> Wow. That environment is even more challenging than I thought.
It's a lot better than it was - they didn't use to allow direct ssh, so
you had to go via a hard-to-automate menu. I had to rely on cronjobs and
mail forwarding back then.
> You're correct
> that the normal slave arrangement isn't going to work. If you had more disk
> space to work with, and the compile farm machines could talk to each other,
> I'd suggest running both the slaves and the master within the farm, then use
> proxy tricks to get the change notification in and the status information
I'd not thought of this. I think you actually can talk *between*
machines within the farm, and the master could run on the shell
server.
Can you cascade masters (or combine their results)? Or would this
scheme require a separate buildbot setup for other buildslaves?
Overall, the "proxy-slave" approach seems better anyway. Less to
install and less to run on the compile farm machines.
> Yeah, go with the proxy-buildslave idea. You'll need to write BuildSteps
> which do things like 'ssh cf-shell.sf.net ssh ppc-osx1 make all', but
> fortunately the stdout/exit-code reporting should work exactly the same. The
> source checkout phase will be, uh, interesting, as the buildbot design
> doesn't really expect to have slaves related in that fashion.
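A proxy BuildStep along those lines would mostly be a matter of wrapping the real command in a double ssh hop. A minimal sketch of just the command construction (the helper name is hypothetical, not buildbot API; ssh passes the remote command as a single string, so the inner hop is joined):

```python
def proxy_command(farm_host, inner_command, shell_host="cf-shell.sf.net"):
    """Build the argv for running a command on a compile-farm host
    reached via the shared shell server, e.g. the 'ssh cf-shell.sf.net
    ssh ppc-osx1 make all' step quoted above."""
    # The second hop is one string: ssh joins its remote-command
    # arguments, so quoting stays simple as long as inner_command
    # contains no shell metacharacters.
    return ["ssh", shell_host, "ssh %s %s" % (farm_host, inner_command)]
```

The stdout and exit code of the inner command propagate back through both ssh hops, which is why the usual BuildStep reporting should still work.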
It may be rather non-standard anyway. What I currently do is have one
machine which runs a cronjob to bootstrap tarballs from CVS checkouts.
These tarballs are what the buildslaves start from.
The benefits: I only need to install known-good versions of all
the tools required to do this (autoconf, automake, libtool, bison,
doxygen, tex, and probably others) in one place; if the tree is too
broken to bootstrap we don't waste much effort on it; and the build time
for the slaves is reduced - especially good as they're a shared resource
and some are rather old and slow. I can also run release tarballs
through the same process, so the exact release version has been tested.
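The bootstrap cronjob amounts to a fixed fail-fast pipeline. A rough sketch (the exact commands are illustrative - the tool list comes from the message, but the wrapper and step order are assumptions):

```python
# Ordered bootstrap pipeline: each tool must succeed before the
# tarball is cut, so a broken tree fails fast on the one machine
# that has known-good tool versions installed.
BOOTSTRAP_STEPS = [
    "cvs -q update -dP",        # refresh the checkout
    "libtoolize --copy",
    "aclocal",
    "autoconf",
    "automake --add-missing",
    "./configure",
    "make dist",                # produces the tarball the slaves start from
]

def bootstrap(run=print):
    """Run each step in order via the supplied runner; stop at the
    first failure so a tree too broken to bootstrap wastes no slave
    time.  run() should return 0/None on success."""
    for step in BOOTSTRAP_STEPS:
        if run(step) not in (None, 0):
            return False
    return True
```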
> You'll have to create a special "source" Builder which does nothing
> but get the sources into the shared directory, then fire an Interlock
> which the real Builders are waiting upon. That will prevent the builds
> from running until the sources are available.
That makes sense. I can rsync across ssh as I currently do.
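The Interlock described above is essentially a gate the real builders block on until the source step has populated the shared directory. A generic sketch of the idea using a threading.Event (this illustrates the concept only, it is not the buildbot Interlock API):

```python
import threading

sources_ready = threading.Event()

def source_builder(fetch):
    """The 'source' builder: get the tree into the shared directory
    (e.g. rsync over ssh), then release the waiting builders."""
    fetch()
    sources_ready.set()

def real_builder(build, timeout=None):
    """A real builder: block until the sources are available, then
    run the actual build."""
    if not sources_ready.wait(timeout):
        raise RuntimeError("sources never arrived")
    return build()
```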
> This isn't ideal, and will break when a second change comes along
> while the builds are running (to prevent the "source" builder from updating
> the sources before the real builders have finished compiling would require
> some kind of bidirectional interlock).
I don't think this is a show-stopper - it probably just means the odd
build gets run against slightly stale sources.
> But if your builds tend to run faster than your changes take
> place, it should probably work well enough.
The change rate varies a lot, but I can probably enforce a minimum delay
between sending one source update and the next. Or maybe a bit of sly
communication behind the back of buildbot.
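That minimum delay could sit entirely outside buildbot, on the machine that pushes the source updates. A minimal sketch (the class, its name, and the delay value are illustrative assumptions):

```python
import time

class UpdateThrottle:
    """Refuse to push a new source update until at least min_delay
    seconds have passed since the previous one was sent."""

    def __init__(self, min_delay=1800, clock=time.monotonic):
        self.min_delay = min_delay
        self.clock = clock          # injectable for testing
        self.last_sent = None

    def try_send(self, send):
        """Call send() if enough time has passed; return whether
        the update was actually dispatched."""
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.min_delay:
            return False            # too soon; skip this update
        self.last_sent = now
        send()
        return True
```

Skipped updates are simply dropped here; the next build after the delay picks up the accumulated changes anyway.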
> And as you pointed out, the proxy-buildslave approach will mean you don't
> have to install twisted (or even python) on the real buildslaves, just on the
> proxies. (and you'll probably run all the proxies on the same host as the
> buildmaster anyway).
I think most (maybe all) have python installed, but probably all at
different versions.
> If you get this running, please let me know: being able to use the multiple
> architectures of the sourceforge compile farm is just the sort of thing the
> BuildBot is made for.
I certainly will. Hopefully other people will find it useful to be able
to do the same (but hopefully not too many, or the compile farm will die
under the load!)
> It's a pity that you have to go through such hoops. I'd
> love it if the farm were more buildbot-friendly: having python+twisted in
> /usr/lib/ on all machines and letting them make connections to some internal
> machine where you could run a buildmaster would probably be enough.
You could try approaching them about it, though I get the impression
that they're understaffed and overworked - it generally seems to take
nearly a week just to get a cf machine reset when its yp gets upset and
stops anyone connecting.