[Buildbot-devel] Prevent running multiple things on the same machine
Brian Warner
warner-buildbot at lothar.com
Wed Jul 6 21:04:53 UTC 2005
> I didn't realise that the same slave would run multiple builders
> simultaneously. Is there a way to prevent that? It seems like the
> interlocks are being used as dependencies rather than a mutex, which
> is nice, but there is now no mutex functionality. Would I create a
> mutual dependency or something?
[brian catches up on mail, sorry for the overlap with other messages]
Correct, the 0.6.6 "Interlock" is a poorly-thought-out combination of
dependency and mutex, which is why they're being split. For 0.6.6, the only
solution (and it isn't a very good one) is to use an Interlock and change the
"upstream"/"feeder" build to always succeed (by clearing the failOnFailure
flag from all the steps). That way the "downstream" builder will always run,
but will run *after* the upstream builder finishes. You could also hack the
Interlock class to ignore failure of the upstream build.
The next release's Locks will solve this more cleanly. I'll attach the
section of the user's manual where I describe how to use Locks. Could you
take a look at it and tell me what you think?
thanks,
-Brian
6.2 Interlocks
==============
For various reasons, you may want to prevent certain Steps (or perhaps
entire Builds) from running simultaneously. Limited CPU speed or
network bandwidth to the VC server, problems with simultaneous access
to a database server used by unit tests, or multiple Builds which
access shared state may all require some kind of interlock to prevent
corruption, confusion, or resource overload.
`Locks' are the mechanism used to express these kinds of
constraints on when Builds or Steps can be run. There are two kinds of
`Locks', each with their own scope: `SlaveLock's are scoped to a
single buildslave, while `MasterLock' instances are scoped to the
buildbot as a whole. Each `Lock' is created with a unique name.
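The difference in scope can be pictured with a small toy model. This is only a sketch and not the buildbot's own code: the `resolve_lock' helper and the `_real_locks' table are hypothetical names invented for illustration, and plain `threading.Lock' objects stand in for whatever the buildmaster really uses. A master-scoped name maps to one lock for everybody, while a slave-scoped name maps to one lock per buildslave.

```python
import threading

# Hypothetical registry: maps a scope key to the one underlying lock object.
_real_locks = {}

def resolve_lock(kind, name, slavename):
    """Return the lock that a build running on `slavename` contends for.

    kind is "master" (one lock per name for the whole buildbot) or
    "slave" (one lock per name *per buildslave*).
    """
    if kind == "master":
        key = ("master", name)
    elif kind == "slave":
        key = ("slave", name, slavename)
    else:
        raise ValueError("unknown lock kind: %r" % (kind,))
    return _real_locks.setdefault(key, threading.Lock())

if __name__ == "__main__":
    # Two slaves share the single master-scoped "database" lock ...
    print(resolve_lock("master", "database", "bot-1")
          is resolve_lock("master", "database", "bot-2"))
    # ... but each slave gets its own slave-scoped "cpu" lock.
    print(resolve_lock("slave", "cpu", "bot-1")
          is resolve_lock("slave", "cpu", "bot-2"))
```

In this model, builds on different slaves never wait on each other for a slave-scoped lock, which is the behaviour described for `SlaveLock's above.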
To use a lock, simply include it in the `locks=' argument of the
`BuildStep' or `Build' object that should obtain the lock before it
runs. These arguments accept a list of `Lock' objects: the Step or
Build will acquire all of them before it runs. The `BuildFactory'
also accepts `locks=', and simply passes it on to the `Build' that it
creates.
Note that there are no partial-acquire or partial-release
semantics: this prevents deadlocks caused by two Steps each waiting
for a lock held by the other. This also means that waiting to acquire
a `Lock' can take an arbitrarily long time: if the buildmaster is
very busy, a Step or Build which requires only one `Lock' may starve
another that is waiting for that `Lock' plus some others. (1)
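The all-or-nothing rule can be sketched with ordinary Python locks. This is an illustration of the idea only, assuming `threading.Lock' objects and a hypothetical helper named `try_acquire_all'; it is not how the buildmaster is implemented. The caller either ends up holding every lock in the list or none of them, so it never sits on one lock while waiting for another.

```python
import threading

def try_acquire_all(locks):
    """Acquire every lock in `locks`, or none of them.

    Returns True if all locks are now held by the caller. If any lock is
    busy, the locks taken so far are released again and False is returned,
    so the caller never holds a partial set.
    """
    acquired = []
    for lock in locks:
        if lock.acquire(blocking=False):
            acquired.append(lock)
        else:
            # Back out: give up everything we grabbed during this attempt.
            for held in reversed(acquired):
                held.release()
            return False
    return True

def release_all(locks):
    """Release every lock obtained by a successful try_acquire_all()."""
    for lock in reversed(locks):
        lock.release()

if __name__ == "__main__":
    db, cpu = threading.Lock(), threading.Lock()
    cpu.acquire()                       # someone else is using the cpu lock
    print(try_acquire_all([db, cpu]))   # attempt fails, db is not kept
    cpu.release()
    print(try_acquire_all([db, cpu]))   # now both are obtained together
    release_all([db, cpu])
```

The sketch also shows where the starvation comes from: a caller that wants both locks has to find both free at the same moment, while a caller that wants only one of them can slip in whenever that one is free.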
In the following example, we run the same build on three different
platforms. The unit-test steps of these builds all use a common
database server, and would interfere with each other if allowed to run
simultaneously. The `Lock' prevents more than one of these builds
from running its test step at the same time.
    from buildbot import locks
    from buildbot.process import s, step, factory

    # One buildmaster-wide lock, shared by the "make test" step of every builder.
    db_lock = locks.MasterLock("database")
    steps = [s(step.SVN, svnurl="http://example.org/svn/Trunk"),
             s(step.ShellCommand, command="make all"),
             s(step.ShellCommand, command="make test", locks=[db_lock]),
             ]
    f = factory.BuildFactory(steps)
    b1 = {'name': 'full1', 'slavename': 'bot-1', 'builddir': 'f1', 'factory': f}
    b2 = {'name': 'full2', 'slavename': 'bot-2', 'builddir': 'f2', 'factory': f}
    b3 = {'name': 'full3', 'slavename': 'bot-3', 'builddir': 'f3', 'factory': f}
    c['builders'] = [b1, b2, b3]
In the next example, we have one buildslave hosting three separate
Builders (each running tests against a different version of Python).
The machine which hosts this buildslave is not particularly fast, so
we want to prevent the builds from all happening at the same time. We
use a `SlaveLock' because the builds happening on the slow slave do
not affect builds running on other slaves, and we use the lock on the
build as a whole because the slave is so slow that even multiple SVN
checkouts would be taxing.
    from buildbot import locks
    from buildbot.process import s, step, factory

    # Scoped to the slave: only builds on the same buildslave wait on each other.
    slow_lock = locks.SlaveLock("cpu")
    source = s(step.SVN, svnurl="http://example.org/svn/Trunk")
    # The lock is given to the factory, so each Build holds it from start to end.
    f22 = factory.Trial(source, trialpython=["python2.2"], locks=[slow_lock])
    f23 = factory.Trial(source, trialpython=["python2.3"], locks=[slow_lock])
    f24 = factory.Trial(source, trialpython=["python2.4"], locks=[slow_lock])
    b1 = {'name': 'p22', 'slavename': 'bot-1', 'builddir': 'p22', 'factory': f22}
    b2 = {'name': 'p23', 'slavename': 'bot-1', 'builddir': 'p23', 'factory': f23}
    b3 = {'name': 'p24', 'slavename': 'bot-1', 'builddir': 'p24', 'factory': f24}
    c['builders'] = [b1, b2, b3]
In the last example, we use two Locks at the same time. In this
case, we're concerned about both of the previous constraints, but
we'll say that only the tests are computationally intensive, and that
they have been split into those which use the database and those
which do not. In addition, two of the Builds run on a fast machine
which does not need to worry about the cpu lock, but which still must
be prevented from simultaneous database access.
    from buildbot import locks
    from buildbot.process import s, step, factory

    db_lock = locks.MasterLock("database")
    cpu_lock = locks.SlaveLock("cpu")
    # Slow slave: the heavy steps take the cpu lock, and db-test takes both.
    slow_steps = [s(step.SVN, svnurl="http://example.org/svn/Trunk"),
                  s(step.ShellCommand, command="make all", locks=[cpu_lock]),
                  s(step.ShellCommand, command="make test", locks=[cpu_lock]),
                  s(step.ShellCommand, command="make db-test",
                    locks=[db_lock, cpu_lock]),
                  ]
    slow_factory = factory.BuildFactory(slow_steps)
    # Fast slave: only the shared database needs protecting.
    fast_steps = [s(step.SVN, svnurl="http://example.org/svn/Trunk"),
                  s(step.ShellCommand, command="make all", locks=[]),
                  s(step.ShellCommand, command="make test", locks=[]),
                  s(step.ShellCommand, command="make db-test",
                    locks=[db_lock]),
                  ]
    fast_factory = factory.BuildFactory(fast_steps)
    b1 = {'name': 'full1', 'slavename': 'bot-slow', 'builddir': 'full1',
          'factory': slow_factory}
    b2 = {'name': 'full2', 'slavename': 'bot-slow', 'builddir': 'full2',
          'factory': slow_factory}
    b3 = {'name': 'full3', 'slavename': 'bot-fast', 'builddir': 'full3',
          'factory': fast_factory}
    b4 = {'name': 'full4', 'slavename': 'bot-fast', 'builddir': 'full4',
          'factory': fast_factory}
    c['builders'] = [b1, b2, b3, b4]
As a final note, remember that a unit test system which breaks when
multiple people run it at the same time is fragile and should be
fixed. Asking your human developers to serialize themselves when
running unit tests will just discourage them from running the unit
tests at all. Find a way to fix this: change the database tests to
create a new (uniquely-named) user or table for each test run, and
don't use fixed listening TCP ports for network tests (instead, listen
on port 0 to let the kernel choose a port for you, then query the
socket to find out which port was allocated). `MasterLock's can be
used to accommodate broken test systems like this, but are really
intended for other purposes: build processes that store or retrieve
products in shared directories, or which do things that human
developers would not (or which might slow down or break in ways that
require human attention to deal with).
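The two test-side fixes suggested above can be sketched in a few lines of standard-library Python. The helper names `listen_on_free_port' and `unique_table_name' are made up for this illustration, and using a random UUID for the table name is just one assumed way of getting uniqueness; adapt both to whatever your test harness and database actually expect.

```python
import socket
import uuid

def listen_on_free_port(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks an unused port, then report it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))          # port 0 means "any free port"
    sock.listen(1)
    port = sock.getsockname()[1]  # ask the socket which port it was given
    return sock, port

def unique_table_name(prefix="test"):
    """Build a table name that will not collide with another test run."""
    return "%s_%s" % (prefix, uuid.uuid4().hex)

if __name__ == "__main__":
    sock, port = listen_on_free_port()
    print("listening on port", port)
    print("using table", unique_table_name())
    sock.close()
```

Tests written this way can run side by side on one machine without any `Lock' at all, since neither the port nor the table is shared between runs.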
---------- Footnotes ----------
(1) Also note that a clever buildmaster admin could still create
the opportunity for deadlock: Build A obtains Lock 1, inside which
Step A.two tries to acquire Lock 2 at the Step level. Meanwhile Build
B obtains Lock 2, and has a Step B.two which wants to acquire Lock 1
at the Step level. Don't Do That.