[Buildbot-devel] Re: [Python-Dev] buildbot

Wed Jan 11 09:24:29 UTC 2006

> The reason I want static pages is for security concerns. It is not
> easy to tell whether buildbot can be trusted to have no security
> flaws, which might allow people to start new processes on the master,
> or (perhaps worse) on any of the slaves.

These are excellent points. While it would take a complete security audit to
satisfy the kind of paranoia *I* tend to carry around, I can mention a couple
of points about the buildbot's security design that might help you make some
useful decisions about it:

 The buildmaster "owns" (spelled "pwns" or "0wnz0red" these days, according
 to my "leet-speak phrasebook" :) the buildslaves. It can make them run
 whatever shell commands it likes, therefore it has full control of the
 buildslave accounts. It is appropriate to give the buildslaves their own
 account, with limited privileges.

 The codebase "owns" the buildslaves too: most build commands will wind up
 running './configure' or 'make' or something which executes commands that
 are provided by the checked out source tree.

 Nobody is supposed to "own" the buildmaster: it performs build scheduling
 and status reporting according to its own design and configuration file. A
 compromised codebase cannot make the buildmaster do anything unusual, nor
 can a compromised buildslave. The worst that a buildslave can do is cause a
 DoS attack by sending overly large command-status messages, which can
 prevent the buildmaster from doing anything useful (in the worst case
 causing it to run out of memory), but cannot make it do anything it isn't
 supposed to.

 The top-level IRC functions can be audited by inspecting the command_
 methods, as you've already seen.

 The HTTP status page can be audited similarly, once you know how Twisted-Web
 works: there is a hierarchy of Resource objects, each component of the URL
 path is passed to Resource.getChild to obtain the child node in this tree,
 and once the final child is retrieved its 'render' method is called to
 produce HTML. The Waterfall resource and all its children get their capabilities
 from two objects: Status (which provides read-only status information about
 all builds) and Control (which is the piece that allows things like "force
 build"). The knob that disables the "Force Build" button does so by creating
 the Waterfall instance with control=None. If you can verify that the code
 doesn't go out of its way to acquire a Control reference through some
 private-use-only attribute, then you can be reasonably confident that it
 isn't possible to make the web server do anything to trigger a build. It's
 not restricted-execution mode or anything, but it's designed with
 capability-based security in mind, and that may help someone who wishes to
 audit it.

 The PBListener status interface is similar: PB guarantees that only remote_*
 methods can be invoked by a remote client, and the PBListener object only
 has a reference to the top-level Status object.

 The slave->master connection (via the 'slaveport') uses PB, so it can be
 audited the same way. Only the remote_* (and perspective_*) methods of
 objects which are provided to the buildslave can be invoked. The buildslaves
 are allowed to do two things to the top-level buildmaster: force a build
 that is run on their own machine, and invoke an empty 'keepalive' method.
 During a build, they can send remote_update and remote_complete messages to
 the current BuildStep: this is how they deliver status information
 (specifically the output of shell commands). By inspecting
 buildbot.process.step.RemoteCommand.remote_update, you can verify that the
 update is appended to a logfile and nothing else.

 PB's serialization is designed specifically to be safe (in explicit contrast
 to pickle). Arbitrary classes cannot be sent over the wire. The worst-case
 attack is DoS, specifically memory exhaustion.
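
The remote_* convention mentioned in the PBListener and slaveport points
above can be modelled with a few lines of plain Python. This is only a toy
sketch of the idea, not PB's actual dispatch code: the Status class, its
method names, and the dispatch() helper are all made up for illustration.

```python
class RemoteError(Exception):
    """Raised when a client asks for something that is not exposed."""


class Status:
    """Hypothetical read-only view of build results."""

    def __init__(self, results):
        self._results = dict(results)

    def remote_getLastResult(self, builder):
        # Exposed: the name carries the remote_ prefix.
        return self._results.get(builder, "unknown")

    def purge(self):
        # Not exposed: no remote_ prefix, so dispatch() can never reach it.
        self._results.clear()


def dispatch(target, message, *args):
    """Call target.remote_<message>(*args) and nothing else."""
    if not isinstance(message, str):
        raise RemoteError("message name must be a string")
    # The prefix is always prepended, so the client cannot name an
    # arbitrary attribute of the target object.
    method = getattr(target, "remote_" + message, None)
    if not callable(method):
        raise RemoteError("no such remote method: %r" % (message,))
    return method(*args)
```

With this shape, auditing what a remote peer can do reduces to two
questions: which objects does it hold a reference to, and which remote_*
methods do those objects define.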

Any application which can talk to the outside world is a security concern.
The tools that we have to ensure that these applications only do what we
intended them to do are not as fully developed as we would like (I hang out
with the developers of E, and would love to implement the buildbot in a
capability-secure form of Python, but such a beast is not available right
now, and I'm spending too much time writing Buildbot code to get around to
writing a more securable language too). So we write our programs in as clear
a way as possible, and take advantage of tools that have been developed or
inspected by people we respect.

These days my paranoia tells me to trust a webserver written in Python more
than one written in C. Buffer overruns are the obvious thing, but another
important distinction is how Twisted's web server architecture treats the URL
as a path of edges in a tree of Resource instances rather than as a pathname
to a file on the disk. I don't need to worry about what kind of URLs might
give access to the master.cfg file (which could contain debugging passwords
or something), as long as I can tell that none of the Resource instances give
access to it. This also makes preventing things like
http://foo/../../oops.txt much much easier.
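
To make the tree-of-Resources idea concrete, here is a small self-contained
model in plain Python. It borrows the getChild/render names from the
description above but is not Twisted's implementation. The Resource and
Waterfall classes and the lookup() helper are simplified stand-ins, and the
status value is just a string.

```python
class Resource:
    """Toy stand-in for a web resource node (not Twisted's class)."""

    def __init__(self):
        self.children = {}

    def putChild(self, name, child):
        self.children[name] = child

    def getChild(self, name):
        # A URL segment is only ever a dictionary key, never a filename.
        return self.children.get(name)

    def render(self):
        return "<html>index</html>"


class Waterfall(Resource):
    def __init__(self, status, control=None):
        super().__init__()
        self.status = status
        self.control = control  # None means: no "Force Build" capability

    def render(self):
        html = "<h1>%s</h1>" % self.status
        if self.control is not None:
            html += "<button>Force Build</button>"
        return html


def lookup(root, url_path):
    """Walk the tree one segment at a time; return HTML, or None for 404."""
    node = root
    for segment in url_path.strip("/").split("/"):
        if segment == "":
            continue
        node = node.getChild(segment)
        if node is None:
            return None
    return node.render()
```

In this model a request for /../../master.cfg simply asks the root for a
child named "..", which does not exist, so the lookup fails without ever
touching the filesystem. A Waterfall built with control=None has nothing to
hand to its render method that could start a build.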

Preferring a Twisted web server over Apache reveals my bias in favor of both
Python and the developers of Twisted and Twisted's web server, and I quite
understand if you don't share that bias. I think it would be quite possible
to create a 'StaticWaterfall' status class, which would write HTML to a tree
of files each time something changed. There are a number of status event
delivery APIs in the buildbot which could cause a method to be called each
time a Step was started or finished, and these could just write new HTML to a
file. It would consume a bit more disk space, but would allow an external
webserver to provide truly read-only access to build status. If you'd like me
to spend some cycles on this, please let me know. Perhaps others would
prefer this style of status delivery too.
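
A rough sketch of what such a 'StaticWaterfall' could look like follows. It
assumes a status receiver with stepStarted/stepFinished callbacks that take
plain strings. Those signatures, the one-file-per-builder layout, and the
HTML are all hypothetical and are not taken from the buildbot's existing
status APIs.

```python
import html
import os
import tempfile


class StaticWaterfall:
    """Hypothetical status receiver that mirrors step events into files."""

    def __init__(self, outdir):
        self.outdir = outdir
        self.steps = {}  # builder name -> list of [step name, state]

    def stepStarted(self, builder, step):
        self.steps.setdefault(builder, []).append([step, "running"])
        self._write(builder)

    def stepFinished(self, builder, step, result):
        for entry in self.steps.get(builder, []):
            if entry[0] == step and entry[1] == "running":
                entry[1] = result
        self._write(builder)

    def _write(self, builder):
        rows = "".join(
            "<tr><td>%s</td><td>%s</td></tr>" % (html.escape(n), html.escape(s))
            for n, s in self.steps.get(builder, [])
        )
        page = "<html><body><table>%s</table></body></html>\n" % rows
        os.makedirs(self.outdir, exist_ok=True)
        # Write to a temporary file and rename it into place, so the
        # external web server never serves a half-written page.
        fd, tmp = tempfile.mkstemp(dir=self.outdir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.write(page)
        # mkstemp creates the file owner-only; make it world-readable so a
        # web server running under another account can serve it.
        os.chmod(tmp, 0o644)
        # Builder names are assumed to come from the trusted master config.
        os.replace(tmp, os.path.join(self.outdir, builder + ".html"))
```

The external web server then only needs read access to the output
directory, and nothing it serves can reach back into the buildmaster
process.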

cheers,
 -Brian