[Buildbot-devel] RFC: Crowd-sourcing CI using volunteer computing

Stefan Seefeld stefan at seefeld.name
Fri Mar 8 16:41:50 UTC 2013


Introduction
------------

Buildbot's current operation model consists of a fixed set of remote
processes executing builds, controlled by a central master process. The
build logic is encoded as a set of build steps in the master process,
and then marshalled to the slave processes for remote execution. The
master process fully controls the build slaves. All slave activity is
triggered from the master, and progress is closely tracked during builds.

This model relies on a tight connection between master and slaves, which
means the slaves need to be fully authenticated and connected during
operation.

The current proposal describes a different model which enables
contributors to donate compute resources more dynamically. In this
"volunteer-computing" (http://en.wikipedia.org/wiki/Volunteer_computing)
scenario contributors retain more control over the resources they
dedicate to a project. Instead of relying on a central master process
pushing work to slave processes, a contributor's computer would pull
tasks from the master process, and send back results. Well-known
instances of this model are SETI@home
(http://en.wikipedia.org/wiki/SETI@home) and Folding@home
(http://en.wikipedia.org/wiki/Folding@home).
The goal of this proposal is to make this processing model accessible to
Open-Source projects using a buildbot instance for CI, which may notably
increase test coverage by allowing more people to contribute their
computing resources.

Use-cases
---------

* As in current buildbot setups, a project admin defines a set of
builders, including a set of build steps. However, as the machines these
builders run on aren't known in advance, the builders also contain
enough meta-information to allow contributors to check whether their
machine(s) match the requirements of any particular builder (hardware
type, available third-party software needed for the build, etc.)

* The project admin generates self-contained packages that include
everything a contributor needs to run the builder, including preliminary
platform tests, the actual build sequence, as well as a means to report
back any outcome (generated files, test results, etc.)

* Contributors download the above package, then install and configure
it. During configuration, certain tests are run to validate the
platform. The configuration also allows contributors to limit the
resources donated to the build. This may be a one-off build, or a setup
for a recurrent build.

* Upon completion of a build (which may happen entirely offline) or at
any later point in time, the accumulated results of the build are sent
back to the master process. This includes enough meta-data to identify
the platform and build configuration to allow the master to aggregate
the results in a suitable form (e.g. test report matrix).

* The build master process can now publish results contributed from many
different places in a suitable way (e.g.
http://www.boost.org/development/tests/release/developer/summary.html)


High-Level Design
-----------------

In the current model, a builder is a composite object consisting of
build steps, which run on both the master and the slave processes.
Commands are defined on the master, but are marshalled to and executed
on the slaves.
The build step implementations can be changed to locally record the
executed commands, generating a self-contained _build script_, which can
then be transferred to a different computer and run locally. This
retains the build master "frontend", but substitutes the execution model
to give execution control to the owners of the slave processes. (In
fact,  the term "slave" doesn't capture this new relationship between
the different processes any longer.)

Moving to a more slave-centric model reduces the need for slave
processes to fully authenticate with the master. In fact, there are only
two points at which some form of authentication is required:

1) the slave needs to be able to authenticate the package before running
a build. This can easily be achieved with a checksum or a public-key
signature.

2) the master may need to authenticate the uploaded results if it is to
post-process the data or even publish them in raw form.


Development
-----------

The above may be implemented as multiple self-contained tasks that add
functionality to the existing buildbot code without changing existing
logic. Here is a possible breakdown into tasks, each of which may
benefit the buildbot project individually:

1) Implement code (a BuildFactory subclass?) that generates a
self-contained build script from a list of build steps.
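A minimal sketch of this task, assuming a hypothetical ScriptFactory
that records (description, command) pairs; a real implementation would
instead subclass buildbot's BuildFactory and walk its ShellCommand
steps:

```python
class ScriptFactory:
    """Record build steps and emit them as a self-contained shell script."""

    def __init__(self):
        self.steps = []  # list of (description, command) pairs

    def add_step(self, description, command):
        self.steps.append((description, command))

    def generate_script(self):
        # "set -e" aborts the script on the first failing step, mirroring
        # how a buildbot build stops at a failed step
        lines = ["#!/bin/sh", "set -e"]
        for description, command in self.steps:
            lines.append('echo "=== %s ==="' % description)
            lines.append(command)
        return "\n".join(lines) + "\n"
```

The generated script could then be shipped to a contributor's machine
and run without any live connection to the master.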

2) Define a set of meta-data as well as some validation mechanism that
would be run to validate a given platform against a specific set of
(project-specific) meta-data.
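A minimal sketch of such a validation mechanism; the meta-data keys
("os", "tools") are illustrative assumptions, not a defined schema:

```python
import platform
import shutil

def validate_platform(requirements):
    """Match builder requirement meta-data against the local machine.

    Returns a list of unmet requirements; an empty list means the
    machine qualifies for the builder.
    """
    problems = []
    want_os = requirements.get("os")
    if want_os and platform.system() != want_os:
        problems.append("need OS %s, have %s" % (want_os, platform.system()))
    # check that required third-party tools are on the PATH
    for tool in requirements.get("tools", []):
        if shutil.which(tool) is None:
            problems.append("missing tool: %s" % tool)
    return problems
```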

3) Define a container layout to be used to send back (project-specific)
results together with enough meta-data to identify / authenticate the
platform and allow the master to aggregate and post-process the data.

4) Design a package layout suitable for contributors to download,
consisting of everything needed to set up a one-time or recurrent
builder, including an (interactive) installer. Ideally, such packages
can be generated automatically from a given master configuration.


The above may be achieved in multiple iterations. I expect a first
iteration to result in a functional prototype that can be implemented
within a GSoC project.


-- 

      ...I still have a suitcase in Berlin...



