[Buildbot-devel] interacting with a batch queueing system (LSF)

Thu Jan 27 17:46:06 UTC 2011

"Mark Richardson (Internal)" <mark.richardson at nag.co.uk> writes:
> one of the steps of a nightly library build is to submit a job to
> test the build result. As it is a parallel resource, the tests will
> be driven by a bsub command.
>
> Typically the command will return the shell control to a user and
> then the user can interrogate later to see if, and how, their job has
> completed. I have to see if this causes the slave to report it is
> finished.

It will.  The step ends as soon as bsub returns, not when the job does.

> I wonder if there already is a method for a slave to poll a job queue
> and inform the master the testing has completed.

I'm not sure about bsub, but I use qsub all the time.

1) Some qsub implementations have a "-sync y" option.  This makes the
submission command block until the job is complete.  Check your manpage;
this is the easy solution.

2) v=$(qsub thescript.pbs | awk -F. '{print $1}')
   while qstat | grep -q "${v}" ; do sleep 30; done

You'll have to adjust for your cluster.  The first line should put the
jobid into the `v' variable (all submission systems I use print out the
jobid when you submit the job script).
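
Option 2 can be wrapped in two small shell functions, which makes it easier
to call from a build step.  This is only a sketch: the names extract_jobid
and wait_for_job are made up for illustration, and it assumes qsub prints an
id of the form "12345.server" and that qstat stops listing a job once it has
finished.  Neither is guaranteed on every cluster.

```shell
#!/bin/sh
# Sketch only: assumes PBS-style output ("12345.server" from qsub, and a
# qstat listing whose first column is the job id while the job is alive).

# extract_jobid: read qsub's output on stdin, print the text before the
# first dot of the first line.
extract_jobid() {
    awk -F. '{ print $1; exit }'
}

# wait_for_job JOBID [INTERVAL]: poll qstat every INTERVAL seconds
# (default 30) until no row has JOBID as the leading part of column one.
# Note: if qstat itself fails and prints nothing, this also returns.
wait_for_job() {
    jobid=$1
    interval=${2:-30}
    while qstat | awk -v id="$jobid" '
            { split($1, part, "."); if (part[1] == id) found = 1 }
            END { exit !found }'; do
        sleep "$interval"
    done
}

# Typical use from a build step (not run here):
#   v=$(qsub thescript.pbs | extract_jobid) && wait_for_job "$v"
```

Comparing the id exactly, rather than with a bare grep, avoids job 123
matching a row for job 1234.  The loop only tells you the job has left the
queue, not whether it passed, so the job script still has to record its own
result somewhere the build step can read.  For LSF specifically, bsub has a
-K option that waits for the job to complete, which plays the role of
option 1.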

> My initial thought was to have it as part of the Factory Steps but
> then there may be other ways with trigger-able schedules.  I also
> discovered that one cannot have the same basedir for different
> builders. This is not too restrictive as the test should probably
> occur in a separate user space anyway.

It actually works fine with some clever '../other-basedir' settings as
part of your workdir= specifications.

-tom