[Buildbot-devel] RFC: Crowd-sourcing CI using volunteer computing

Tue Mar 12 01:55:38 UTC 2013

This is an awesome idea, and one that I would *love* to see
implemented.  I think that this is an untapped opportunity for OSS CI,
and that Buildbot's in a good spot to take it.  That said, it's
complicated!  So please don't take any of the below as saying "can't
be done" or "don't bother".  Think of it as challenges I see.  It took
me some time to digest the proposal.

The proposal to enumerate build steps into a sequence that can be
executed apart from the master precludes any kind of logic on the
master.  For example, the master can't use properties or doStepIf, or
contain custom steps.  Master-centric logic is pretty fundamental to
Buildbot's operation (more so now that we've moved slave steps to the
master).  From what I can tell, you propose this to give slaves
better control over what they do and when they do it, and to de-couple
slaves from tight network connections to the master.  The latter
problem is probably relatively minor, with the common cases covered by
the current RETRY logic.  If a slave disconnects in the middle of a
job, just re-queue that job.  Both cases are best handled in a
separate, existing project to redefine the master/slave protocol,
including a means for a slave to say "hey, wait, I'm not ready to run
another step yet".  This new protocol will also allow slaves to
continue in the absence of a connection to the master -- at least
until they finish a step.  I think that the protocol project is almost
orthogonal to your proposal.
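
To make the "just re-queue that job" case concrete, here is a minimal
sketch of the idea.  It is not Buildbot's actual RETRY implementation;
JobQueue and its method names are hypothetical, made up for
illustration only.

```python
from collections import deque


class JobQueue:
    """Hypothetical master-side queue; not part of Buildbot's API."""

    def __init__(self, jobs):
        self.pending = deque(jobs)  # jobs waiting for a slave
        self.running = {}           # slave name -> job it is running

    def assign(self, slave):
        """Hand the next pending job to a slave, or None if idle."""
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.running[slave] = job
        return job

    def finished(self, slave):
        """The slave completed its job; forget about it."""
        return self.running.pop(slave, None)

    def disconnected(self, slave):
        """The slave vanished mid-job: put the job back at the front."""
        job = self.running.pop(slave, None)
        if job is not None:
            self.pending.appendleft(job)
        return job


queue = JobQueue(["build-1", "build-2"])
queue.assign("slave-a")        # slave-a starts build-1
queue.disconnected("slave-a")  # connection drops, build-1 is re-queued
retried = queue.assign("slave-b")
```

Putting the job back at the front of the queue is a choice made for
this sketch, so that an interrupted build doesn't wait behind newer
ones.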

Embedded here is the idea of "push" slaves, a la Tinderbox.  The way
other tools (I'm thinking of CC and Jenkins in particular) solve this
is to essentially create a build without any actions, and just record
the status the master receives in that build.  Buildbot could do that,
too, but again I think that's orthogonal to your proposal.

For all but the simplest applications I think that defining slave
capabilities generically is going to be impossible.  Rather, my first
thought is that this is 2013 and we can just rely on virtualization
and cloud for that purpose: define some shared AMIs, Vagrantfiles,
etc. for common requirements (Python app, Perl lib, etc.), with the
ability for project owners to define other AMIs specific to their
needs.  For example, back when I was building Amanda, it had a pretty
peculiar set of requirements, but I could easily have packaged those
into an AMI and shipped it.  As a fallback, we might have a means for
a project owner to publish a spec and tag it with a uuid, then have
users who believe they have implemented that spec flag their slave
with that uuid.  The project owner could define any verification steps
as part of the build itself.
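
A rough sketch of the uuid fallback, assuming a spec is nothing more
than an opaque identifier that slave owners opt in to.  The function
and variable names here are hypothetical, and the master just trusts
the slave's claim; verification is left to the build itself.

```python
import uuid

# The project owner publishes a spec and tags it with a fresh uuid.
amanda_like_spec = uuid.uuid4()
unrelated_spec = uuid.uuid4()

# Each slave owner flags the slave with the specs they believe it
# implements.
slave_specs = {
    "slave-a": {amanda_like_spec},
    "slave-b": {unrelated_spec},
    "slave-c": {amanda_like_spec, unrelated_spec},
}


def eligible_slaves(required_spec, slaves):
    """Return the names of slaves flagged with the required spec."""
    return sorted(
        name for name, specs in slaves.items() if required_spec in specs
    )


candidates = eligible_slaves(amanda_like_spec, slave_specs)
```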

User security is pretty important, too.  I'm not sure what kind of
vetting BOINC projects go through, but I wouldn't run those on any
host that had any of my secrets (like my password) on it.
Virtualization is a big help here, too.

Most projects that would use a service like this aren't starved for
slave resources.  Rather, they'd like to have 20 slaves at once a few
times per day or week.  So slaves should be able to do work for other
projects in the interim.

An important part of this project would be user convenience.  BOINC is
super-easy: install and go.  Donating time to your favorite project
should be similar, and that will probably require some sort of central
registry where masters and slaves can connect to one another.  I think
some creative use of crypto could help here: slaves get configured
with the public keys of projects they can build, and masters are
configured with private keys.  Then no hacking of the central registry
can be used to inject completely foreign code into a slave.
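
One way to picture the key idea: the master signs each job with the
project's private key, and the slave runs a job only if the signature
verifies against a public key it was configured with.  The sketch
below uses toy textbook RSA so that it runs on the standard library
alone; the key size, the lack of padding, and all of the names are
assumptions for illustration.  It is not secure, and a real
implementation would use a vetted crypto library.

```python
import hashlib

# Toy textbook-RSA key built from two Mersenne primes.  Far too small
# and unpadded for real use.  pow(E, -1, m) needs Python 3.8+.
P, Q = 2**89 - 1, 2**107 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, master only


def digest(payload, n):
    """Hash the job payload and reduce it into the RSA modulus."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % n


def sign(payload, d=D, n=N):
    """Master side: sign a job with the project's private key."""
    return pow(digest(payload, n), d, n)


def slave_accepts(pinned_keys, project, payload, signature):
    """Slave side: accept only jobs that verify against a pinned key."""
    key = pinned_keys.get(project)
    if key is None:
        return False  # the slave never opted in to this project
    n, e = key
    return pow(signature, e, n) == digest(payload, n)


# The slave is configured with the public key of one project.
pinned = {"example-project": (N, E)}
job = b"make check"
signature = sign(job)
```

Because the check is made against keys stored on the slave, a
compromised registry can swap the payload or invent a project, but
either way the job fails verification.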

Let's keep this conversation going -- I'd love to see this come to pass.

Dustin