[Buildbot-commits] [Buildbot] #624: Add Latent BuildSlave for DRMAA supporting systems

Buildbot trac trac at buildbot.net
Mon Oct 21 16:36:25 UTC 2013


#624: Add Latent BuildSlave for DRMAA supporting systems
---------------------------+----------------------
Reporter:  smackware       |       Owner:
    Type:  enhancement     |      Status:  closed
Priority:  major           |   Milestone:  0.8.+
 Version:                  |  Resolution:  wontfix
Keywords:  virtualization  |
---------------------------+----------------------

Comment (by mvpel):

 Yeah, let's reopen, why not?

 I got the attached code working with only a small change. It lacked a
 delay and status-checking mechanism, so the master did not wait for the
 slave to be scheduled and dispatched before giving up on it and reporting
 that it had failed to substantiate. I'll provide an updated file later.
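
A minimal sketch of the kind of delay and status-checking loop described above, not the actual fix. The get_status callable and its "queued" / "running" / "failed" return values are hypothetical stand-ins for whatever the scheduler binding reports. Buildbot is built on Twisted, so real master-side code would likely need a non-blocking equivalent rather than this blocking loop.

```python
import time


def wait_until_running(get_status, timeout=600.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the job runs, fails, or the timeout expires.

    get_status is a hypothetical callable returning "queued", "running",
    or "failed". Returns True only if the job reached "running" in time.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "running":
            return True
        if status == "failed":
            return False
        if clock() >= deadline:
            # Still queued after the allowed wait: give up.
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting.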

 I also adapted sgebuildslave.py into an htcondorbuildslave.py, though my
 lack of familiarity with Python is tripping me up a bit. I still need to
 figure out how to pass arguments to the buildslave_setup_command, or set
 environment variables for it, since I need to provide it with the slave
 name. For the moment I've got an ugly little hack in there.
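
For reference, a generic Python sketch of both options, an extra argument and an environment variable. It assumes the setup command is launched as a local subprocess, which may not match how the attached code dispatches it. If the command goes through a scheduler job template, the equivalent would be that template's argument and environment fields. The function name and the BUILDSLAVE_NAME variable are made up for illustration.

```python
import os
import subprocess


def run_setup_command(command, slave_name, extra_env=None):
    """Run a setup command, passing the slave name both ways.

    command is a list such as ["/path/to/setup.sh"]; the slave name is
    appended as the last argument and also exported in the environment.
    """
    env = dict(os.environ)                 # start from the current environment
    env["BUILDSLAVE_NAME"] = slave_name    # hypothetical variable name
    if extra_env:
        env.update(extra_env)
    argv = list(command) + [slave_name]    # slave name as a positional argument
    return subprocess.check_output(argv, env=env)
```

Passing the command as a list avoids shell quoting problems if a slave name ever contains unusual characters.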

 For the slave names, I'm using "LatentSlave01" through "LatentSlave16" (we
 have several different builds), rather than host names (hence my need for
 a setup-command argument), since a given latent slave could wind up
 running on any of the exec hosts in the pool (we'll have 42 when
 finished), and it's preferable to avoid having to update the slave list
 every time an exec host is added or removed.
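
As a small illustration, the sixteen names above can be generated rather than typed out. This is only a sketch; the helper name is hypothetical, and how the names are fed into the master configuration is not shown.

```python
def latent_slave_names(count=16, prefix="LatentSlave"):
    """Return zero-padded slave names: LatentSlave01 .. LatentSlave<count>."""
    return ["%s%02d" % (prefix, n) for n in range(1, count + 1)]


names = latent_slave_names()
```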

 The slave is created fresh by the buildslave_setup_command script each
 time a latent slave starts. The setup command runs
 "buildslave create-slave" using the HTCondor-managed scratch directory, and
 then execs the buildslave in there. HTCondor takes care of deleting that
 directory when
 the job exits or is terminated. I also have a bit of code that creates the
 info/host file so you can tell which exec host the slave wound up on.
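
A rough Python sketch of the steps just described, not the actual setup script. It relies on HTCondor exporting _CONDOR_SCRATCH_DIR to the job. The BB_MASTER, BB_SLAVE_NAME and BB_PASSWORD variables, the "slave" subdirectory, and the function names are all assumptions made for illustration. The --nodaemon flag should be checked against "buildslave start --help" for the installed version.

```python
import os
import socket
import subprocess


def build_setup_commands(scratch_dir, master, slave_name, password):
    """Return (basedir, create_cmd, start_cmd) for a throwaway slave."""
    basedir = os.path.join(scratch_dir, "slave")
    create = ["buildslave", "create-slave", basedir, master, slave_name, password]
    # Stay in the foreground so the scheduler tracks the real process.
    start = ["buildslave", "start", "--nodaemon", basedir]
    return basedir, create, start


def write_host_info(basedir, hostname=None):
    """Record the exec host name in <basedir>/info/host and return the path."""
    info_dir = os.path.join(basedir, "info")
    os.makedirs(info_dir, exist_ok=True)
    path = os.path.join(info_dir, "host")
    with open(path, "w") as f:
        f.write((hostname or socket.gethostname()) + "\n")
    return path


def main():
    scratch = os.environ["_CONDOR_SCRATCH_DIR"]   # set by HTCondor for the job
    basedir, create, start = build_setup_commands(
        scratch,
        os.environ["BB_MASTER"],        # hypothetical, e.g. "host:port"
        os.environ["BB_SLAVE_NAME"],    # hypothetical
        os.environ["BB_PASSWORD"],      # hypothetical
    )
    subprocess.check_call(create)       # create the fresh slave directory
    write_host_info(basedir)            # overwrite the placeholder host info
    os.execvp(start[0], start)          # replace this process with the slave
```

Calling main() from the job's entry point would run the three steps in order. Because os.execvp replaces the process, nothing after it executes.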

 I've noticed that when the slave terminates, it's marked as "removed" in
 the HTCondor history. I'd prefer to have the slave shut itself down
 gracefully rather than being killed off through the scheduler, so that
 HTCondor will see it as "completed," rather than "removed."

 I'm also trying to figure out if it's possible to have the slave do the
 checkout and build in the buildslave's HTCondor scratch directory, and
 then use the file transfer for anything that needs to go back to the
 master. The catch is that the master won't know the name of that
 directory, and in fact it won't be created at all until the slave starts
 up, so the master-side checkouts from buildbot.steps.source.svn.SVN may
 not play well. I'm not entirely clear on how the checkout mechanism works
 yet.

-- 
Ticket URL: <http://trac.buildbot.net/ticket/624#comment:8>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

