[Buildbot-devel] suggestions on how to deal with batch queues and mpi
Chris Kees
cekees at gmail.com
Fri Mar 9 04:08:12 UTC 2012
Hi Mark and Albert,
Thanks a lot for laying it out for me. If I produce some Python scripts
I'll make them available. The pexpect approaches look pretty promising,
but some of the machines seem to make it harder to get an interactive
job, so polling the queue for running jobs is probably going to be
necessary on those machines.
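
For what it's worth, a minimal sketch of that polling approach in Python
might look like the following. It assumes a PBS-style system where qsub
prints the job id on stdout and "qstat <jobid>" exits non-zero once the job
has left the queue (some sites keep finished jobs visible for a while). The
function names and the output_path argument are hypothetical, and the job
script is assumed to send its stdout/stderr to output_path.

```python
import subprocess
import sys
import time


def submit(script_path, run=subprocess.check_output):
    """Submit a job script with qsub and return the job id it prints."""
    return run(["qsub", script_path]).decode().strip()


def is_queued(job_id, call=subprocess.call):
    """True while qstat still reports the job (assumed: exit status 0)."""
    return call(["qstat", job_id],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL) == 0


def wait_for_job(job_id, interval=30, is_queued=is_queued, sleep=time.sleep):
    """Block until the job has left the queue, polling every `interval` s."""
    while is_queued(job_id):
        sleep(interval)


def run_batch(script_path, output_path):
    """Submit, wait, then copy the job's output to our own stdout."""
    job_id = submit(script_path)
    wait_for_job(job_id)
    with open(output_path) as f:
        sys.stdout.write(f.read())
```

Called from a buildbot shell step, something like
run_batch("job.pbs", "job.out") would keep the step alive until the job is
done. It does not recover the exit status of the tests; that would need
something extra, such as having the job script write the status to a file.
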
Thanks,
Chris
On Wed, Mar 7, 2012 at 3:28 AM, Mark Richardson (Internal) <
mark.richardson at nag.co.uk> wrote:
> hi Chris,
> I am not sure about cool (as I am not confident enough with Python to add
> to buildbot).
> Anyway, I wrote some Bourne shell scripts that effectively write the batch
> job script based on the supplied parameters, for example:
>
> create_job.sh <project> <mode> <platform>
>
> I had to build in all the intelligence to distinguish the project,
> mode and platform (basically lots of case and test structures).
> They build up strings that correspond to the lines of the job script
> I would have written manually. These are then echoed into a file that is
> subsequently submitted to the job queue.
>
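
A Python version of this idea could be sketched as below. This is not Mark's
script (which is confidential); it only illustrates building the job script
text from the three parameters. The LAUNCHERS and WALLTIMES tables, their
contents, and the function name are hypothetical placeholders, and the
directives assume a PBS-style scheduler.

```python
# Hypothetical per-platform MPI launchers; adjust to your own systems.
LAUNCHERS = {
    "cray": "aprun -n 4",
    "cluster": "mpirun -np 4",
}

# Hypothetical per-mode wall-clock limits (HH:MM:SS).
WALLTIMES = {
    "quick": "00:10:00",
    "full": "02:00:00",
}


def create_job_script(project, mode, platform, command):
    """Return the text of a PBS job script for the given parameters."""
    if platform not in LAUNCHERS:
        raise ValueError("unknown platform: %r" % platform)
    if mode not in WALLTIMES:
        raise ValueError("unknown mode: %r" % mode)
    lines = [
        "#!/bin/sh",
        "#PBS -N %s_%s" % (project, mode),         # job name
        "#PBS -l walltime=%s" % WALLTIMES[mode],   # time limit
        "#PBS -j oe",                              # merge stderr into stdout
        "cd $PBS_O_WORKDIR",                       # directory qsub was run from
        "%s %s" % (LAUNCHERS[platform], command),  # run under the MPI launcher
    ]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file and handed to qsub, which
replaces the case/test logic of a shell version with two dictionary lookups.
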
> I invoke that script from a buildbot shell step
> (workdir is the test directory).
>
> Oh, do not forget to build in a method for picking up the job id and
> polling the queue until the job has finished before the build step
> (create_job) script exits.
>
> This last point is important: otherwise your build step will appear to
> finish successfully and quickly (!) while your test sequence is still in
> a queue and could be in any state.
>
> I cannot supply the confidential scripts but I hope I have given you a
> workplan to design one (maybe even a Python version?).
>
> The other script in this toolset will launch several PBS/LSF jobs and wait
> for all to complete and report the state of each. Unfortunately it is not
> skilled in writing job scripts and relies on the developer to include some
> in their test directory :( .
>
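
The "launch several jobs and wait for all of them" part could be sketched in
Python as follows. It assumes the job ids have already been collected from
qsub and, as an assumption about the scheduler, that "qstat <jobid>" exits
non-zero once a job has left the queue (an LSF site would need bjobs
instead). The function names and the two reported states are hypothetical.

```python
import subprocess
import time


def in_queue(job_id):
    """Assumption: qstat exits with status 0 only while the job is known."""
    return subprocess.call(["qstat", job_id],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0


def wait_for_all(job_ids, timeout=3600, interval=30,
                 in_queue=in_queue, sleep=time.sleep):
    """Poll until every job has left the queue or `timeout` seconds pass.

    Returns a dict mapping each job id to "finished" or "timed out".
    """
    pending = set(job_ids)
    waited = 0
    while pending and waited < timeout:
        # Keep only the jobs the scheduler still reports.
        pending = {j for j in pending if in_queue(j)}
        if pending:
            sleep(interval)
            waited += interval
    return {j: ("timed out" if j in pending else "finished")
            for j in job_ids}
```

The returned dictionary only says whether each job left the queue in time;
whether the tests inside it passed still has to be read from the job output.
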
> Also, I am surprised that you need to use the -I switch.
> (you could wait a long time on the Cray I use)
> I guess that is the platform from your aprun command.
>
> Good luck,
>
> Mark
>
>
> On 06/03/2012 23:18, Chris Kees wrote:
>
>> Hi,
>>
>> I have some buildslaves for which any tests have to be run through a
>> batch system, and they must be run via an mpi launcher inside. For
>> example, if I was logged in I would do something like
>>
>> login-node% qsub -I
>> (wait for interactive session to start on compute node of cluster)
>> compute-node-0% cd my_tests; aprun python -c "import nose; nose.run()"
>>
>> I'm wondering if anybody has any cool ways of dealing with this.
>> Unfortunately one of the systems will only take i/o from a terminal when
>> in interactive mode (the "-I" switch), so I haven't been able to wrap
>> something like subprocess.Popen around it. I was thinking I would write
>> a script that would inject the commands I want to run into a standard
>> batch script for a given system and submit it to the queue, but I'm not
>> sure how to get my script to wait until the job runs and then dump the
>> stdout/stderr back to the Shell processes.
>>
>> Thanks,
>> Chris
>>
>> This e-mail has been scanned for all viruses by Star.
>
> --
> Mark Richardson, Ph.D. HECToR CSE, Mobile: 07525 238037
> NAG Manchester, Peter House, Oxford Street, Manchester, M1 5AN
> Head office at:
> Numerical Algorithms Group Ltd, Wilkinson House,
> Jordan Hill Business park, Oxford OX2 8DR
>
> The Numerical Algorithms Group Ltd is a company registered in England
> and Wales with company number 1249803. The registered office is:
> Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.