[Buildbot-commits] [Buildbot] #624: Add Latent BuildSlave for DRMAA supporting systems
Buildbot trac
trac at buildbot.net
Mon Oct 21 16:36:25 UTC 2013
#624: Add Latent BuildSlave for DRMAA supporting systems
---------------------------+----------------------
 Reporter:  smackware      |       Owner:
     Type:  enhancement    |      Status:  closed
 Priority:  major          |   Milestone:  0.8.+
  Version:                 |  Resolution:  wontfix
 Keywords:  virtualization |
---------------------------+----------------------
Comment (by mvpel):
Yeah, let's reopen, why not?
I got the attached code working with one small change. It lacked a delay
and status-checking mechanism, so the master did not wait for the slave to
be scheduled and dispatched; it gave up and reported that the slave failed
to substantiate. I'll provide an updated file later.
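For illustration only, the kind of wait loop described above might look
like the sketch below. It is not the attached code: get_status is a
hypothetical callable standing in for whatever the scheduler binding
reports, and the state names, timeout and poll interval are assumptions.

```python
import time


def wait_until_running(get_status, timeout=600.0, poll_interval=5.0,
                       sleep=time.sleep, clock=time.time):
    """Poll get_status() until the job runs, fails, or the timeout expires.

    get_status is a hypothetical callable that returns "queued",
    "running" or "failed". Returns True only if the job reached
    "running" before the deadline.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "running":
            return True          # slave job was dispatched to an exec host
        if status == "failed":
            return False         # scheduler rejected or lost the job
        if clock() >= deadline:
            return False         # still queued; give up on substantiation
        sleep(poll_interval)     # the delay the original code was missing
```

A blocking loop like this would stall the master's reactor if called
directly, so in a real latent slave it would presumably need to run in a
worker thread (for example through Twisted's deferToThread).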
I also adapted the sgebuildslave.py into an htcondorbuildslave.py, though
my lack of familiarity with Python is tripping me up a bit - I need to
figure out how to pass arguments to the buildslave_setup_command, or set
environment variables, since I need to provide it with the slave name.
I've got an ugly little hack in there at the moment.
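As a rough sketch of the two options mentioned here, a command can be
given the slave name both as a trailing argument and as an environment
variable. The helper name run_setup_command and the variable name
BUILDSLAVE_NAME are made up for this example, and the command is run
locally with subprocess rather than submitted through the scheduler. The
drmaa Python binding's job template appears to offer args and
jobEnvironment attributes for the same purpose, but that is not shown
here.

```python
import os
import subprocess


def run_setup_command(command, slave_name, extra_env=None):
    """Run a setup command, passing the slave name two ways.

    The name is appended as the last command-line argument and is also
    exported as BUILDSLAVE_NAME (a hypothetical variable name).
    Returns the command's standard output as bytes.
    """
    env = dict(os.environ)               # start from the current environment
    env["BUILDSLAVE_NAME"] = slave_name  # option 1: environment variable
    env.update(extra_env or {})
    argv = list(command) + [slave_name]  # option 2: positional argument
    return subprocess.check_output(argv, env=env)
```

The setup script can then read whichever of the two it finds convenient,
for example "$1" or "$BUILDSLAVE_NAME" in a shell script.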
For the slave names, I'm using "LatentSlave01" through "LatentSlave16" (we
have several different builds), rather than host names (hence my need for
a setup-command argument), since a given latent slave could wind up
running on any of the exec hosts in the pool (we'll have 42 when
finished), and it's preferable to avoid having to update the slave list
every time an exec host is added or removed.
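A fixed pool of names like this can be generated once on the master side
instead of being typed out. The helper below is only an illustration; its
name and defaults are assumptions, not part of the attached code.

```python
def latent_slave_names(count=16, prefix="LatentSlave"):
    """Return zero-padded slave names, e.g. LatentSlave01..LatentSlave16."""
    return ["%s%02d" % (prefix, i) for i in range(1, count + 1)]
```

Because the names are independent of host names, the list stays the same
when exec hosts join or leave the pool.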
The slave is created fresh by the buildslave_setup_command script each
time a latent slave starts. The setup command runs
"buildslave create-slave" using the HTCondor-managed scratch directory,
and then execs the buildslave in there. HTCondor takes care of deleting
that directory when the job exits or is terminated. I also have a bit of
code that creates the info/host file so you can tell which exec host the
slave wound up on.
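The steps above could be sketched roughly as follows. This is not the
actual setup script: the function names, the "buildslave" subdirectory
and the master address format are assumptions, and the --nodaemon flag
should be checked against the installed buildslave version.

```python
import os
import socket


def build_commands(slave_name, master, password, scratch_dir):
    """Return (basedir, create_cmd, start_cmd) for a throwaway slave.

    scratch_dir is expected to be the job's HTCondor scratch directory
    (HTCondor exports it to the job as _CONDOR_SCRATCH_DIR).
    """
    basedir = os.path.join(scratch_dir, "buildslave")
    # buildslave create-slave <basedir> <master> <name> <password>
    create = ["buildslave", "create-slave", basedir, master,
              slave_name, password]
    # Assumed flag: keep the slave in the foreground so the job stays alive.
    start = ["buildslave", "start", "--nodaemon", basedir]
    return basedir, create, start


def write_host_info(basedir, hostname=None):
    """Record the exec host name in <basedir>/info/host; return the path."""
    info_dir = os.path.join(basedir, "info")
    if not os.path.isdir(info_dir):
        os.makedirs(info_dir)
    path = os.path.join(info_dir, "host")
    with open(path, "w") as handle:
        handle.write((hostname or socket.gethostname()) + "\n")
    return path
```

A wrapper would run the create command with subprocess.check_call, call
write_host_info, and then replace itself with the slave via
os.execvp(start[0], start), leaving cleanup of the directory to HTCondor.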
I've noticed that when the slave terminates, it's marked as "removed" in
the HTCondor history. I'd prefer to have the slave shut itself down
gracefully rather than being killed off through the scheduler, so that
HTCondor will see it as "completed," rather than "removed."
I'm also trying to figure out if it's possible to have the slave do the
checkout and build in the buildslave's HTCondor scratch directory, and
then use the file transfer for anything that needs to go back to the
master. The catch is that the master won't know the name of that
directory, and in fact it won't be created at all until the slave starts
up, so the master-side checkouts from buildbot.steps.source.svn.SVN may
not play well. I'm not entirely clear on how the checkout mechanism works
yet.
--
Ticket URL: <http://trac.buildbot.net/ticket/624#comment:8>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation