<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
Hey Pierre, hey David,<br>
<br>
thanks for your input.<br>
<br>
As to why we are using SLURM:<br>
<br>
We have an existing cluster for our research group (we develop
codes for numerical simulation) that already uses SLURM.
Integrating our testing hardware into that cluster lets us
use the hardware for both testing and production.<br>
<br>
Furthermore, we want to do performance testing (and in particular
scalability testing) in the future. To do that, we need fine-grained
control over which cores our processes run on (for example, the
performance of a memory-bandwidth-limited code depends heavily on
the available memory controllers). SLURM offers ways to handle this
kind of thing, while buildbot does not (to my knowledge).<br>
<br>
@David:<br>
Unfortunately, we cannot use that approach. The compile step
definitely cannot run on the head node, as it usually consumes
as many resources as the execution step, or even more.
This brings us back to the problem of combining all build steps into
one SLURM job.<br>
<br>
@Pierre:<br>
Actually, I think I have figured it out by now:<br>
<br>
I will write a SlurmLatentBuildSlave that submits an sbatch file to
SLURM (maybe through a small server running on the head node). Once
SLURM has allocated the resources, the process in that sbatch file
will spin up the build slave.<br>
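<br>
A minimal sketch of the submission side, to make the idea concrete.
This is not an existing buildbot class: the function names, the
parameters and the job options (one exclusive node, a two hour limit)
are my assumptions. It only relies on sbatch reading the script from
stdin and on <tt>sbatch --parsable</tt> printing the job id.<br>

```python
import shlex
import subprocess


def render_sbatch_script(slave_name, password, master, basedir,
                         time_limit="02:00:00"):
    """Return the text of an sbatch script that starts one build slave.

    All parameter names and defaults are illustrative assumptions.
    slave_name is assumed to be a plain identifier (it ends up in the job name).
    """
    q = shlex.quote
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=bb-" + slave_name,
        "#SBATCH --nodes=1",
        "#SBATCH --exclusive",
        "#SBATCH --time=" + time_limit,
        "set -e",
        # Create the slave directory on the allocated node ...
        "buildslave create-slave %s %s %s %s"
        % (q(basedir), q(master), q(slave_name), q(password)),
        # ... and run the slave in the foreground, so that the SLURM job
        # lives exactly as long as the slave does.
        "exec buildslave start --nodaemon %s" % q(basedir),
    ]
    return "\n".join(lines) + "\n"


def parse_job_id(sbatch_output):
    """Extract the job id from `sbatch --parsable` output ("id" or "id;cluster")."""
    return sbatch_output.strip().split(";")[0]


def _run(cmd, stdin_text):
    """Run cmd, feed it stdin_text, return its stdout as text."""
    result = subprocess.run(cmd, input=stdin_text, stdout=subprocess.PIPE,
                            check=True, universal_newlines=True)
    return result.stdout


def submit(script, run=_run):
    """Pipe the script to sbatch and return the SLURM job id as a string."""
    return parse_job_id(run(["sbatch", "--parsable"], script))
```

The latent slave would call <tt>submit()</tt> when it is asked to
start an instance, remember the returned job id, and run
<tt>scancel</tt> with that id when the instance should be stopped.
The <tt>run</tt> argument is only there so the submission can be
replaced by the small server on the head node, or by a fake in tests.<br>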
<br>
In a second step, I would write a DockerSlurmLatentBuildSlave whose
sbatch file spins up a Docker container with the build slave
running inside.<br>
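<br>
For the Docker variant, the sbatch script could look roughly like
what the sketch below generates. Again, these are assumptions on my
side: the function names are made up, the image is assumed to already
contain the buildslave package, and the job user must be allowed to
talk to the Docker daemon on the compute node.<br>

```python
import shlex


def docker_slave_command(image, slave_name, password, master,
                         basedir="/buildslave"):
    """Return one shell line that runs a build slave inside a container.

    Hypothetical helper: `image` is assumed to ship the buildslave package.
    """
    q = shlex.quote
    # Command executed inside the container: create the slave directory,
    # then run the slave in the foreground.
    inner = ("set -e; buildslave create-slave %s %s %s %s; "
             "exec buildslave start --nodaemon %s"
             % (q(basedir), q(master), q(slave_name), q(password), q(basedir)))
    argv = ["docker", "run", "--rm", "--name", "bb-" + slave_name,
            image, "sh", "-c", inner]
    return " ".join(q(arg) for arg in argv)


def render_docker_sbatch_script(image, slave_name, password, master):
    """Return the text of an sbatch script that starts the container."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=bb-docker-" + slave_name,
        "#SBATCH --nodes=1",
        # exec keeps docker as the job's main process, so the job ends
        # when the container (and with it the slave) exits.
        "exec " + docker_slave_command(image, slave_name, password, master),
    ]
    return "\n".join(lines) + "\n"
```

With <tt>--rm</tt> the container is removed once the slave exits, so
every build starts from a clean userland.<br>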
<br>
It seems clear and easy now, but I was really confused last week.
Thanks for your answers, they helped me clear things up.<br>
<br>
Best,<br>
Dominic<br>
<br>
<br>
<div class="moz-cite-prefix">On 14.11.2015 22:27, David Strubbe
wrote:<br>
</div>
<blockquote
cite="mid:CAENVOuWe1j1NVHwL5UiNwYUE33GeRpP=E_nUhkKAu-pwnh0TFg@mail.gmail.com"
type="cite">
<div dir="ltr">Hi, I am running buildbot with SLURM jobs too. For
example, <a moz-do-not-send="true"
href="http://www.tddft.org/programs/octopus/buildbot">http://www.tddft.org/programs/octopus/buildbot</a>
(specifically the ones called hbar). But we only submit jobs for
the test step, the compilation is run on the head node. You may
find this script I wrote helpful:
<div><br>
</div>
<div><a moz-do-not-send="true"
href="http://web.mit.edu/%7Edstrubbe/www/queue_monitor.pl">http://web.mit.edu/~dstrubbe/www/queue_monitor.pl</a><br>
</div>
<div><br>
</div>
<div>It is BSD-licensed and manages submission of jobs with PBS
or SLURM. It is being used for the Octopus testsuite above, as
well as for another project, BerkeleyGW (BSD-licensed) from
which the attached script comes.
<div>
<div><br>
</div>
<div>David</div>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Nov 13, 2015 at 8:45 AM,
Dominic Kempf <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:dominic.kempf@iwr.uni-heidelberg.de"
target="_blank">dominic.kempf@iwr.uni-heidelberg.de</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Dear
Buildbot list,<br>
<br>
I am currently working on a buildbot setup that wants to run
buildslaves<br>
integrated into a small cluster that is using a SLURM
scheduling<br>
system. I have trouble mapping my requirements to buildbot
concepts in<br>
a suitable way.<br>
<br>
Problems arise from:<br>
* At first, I thought I can have just one buildslave on the
cluster frontend,<br>
that passes all build requests to a queue. But it seems
that I rather need<br>
one such slave on the frontend per job in the queue
(sounds like a job for<br>
a latent slave). Correct?<br>
* I have no clue yet on how to handle separate build steps,
because either<br>
- the job as submitted to SLURM must contain all build
steps at<br>
once - which makes a separation of logs etc. a pain<br>
- every build step must be submitted to SLURM separately,
with the jobs<br>
depending on each other correctly - which is also a
pain, because I cannot<br>
guarantee things running on the same node.<br>
<br>
To further complicate things, I also want to run my builds
in docker containers<br>
that we use to model heterogeneous userlands. Note that in
the above context, this<br>
is different than for example in a DockerLatentBuildSlave:
With the latter, the<br>
slave runs and builds its commands inside a docker
container. In my approach, a<br>
(potentially also dockerized) buildslave submits a job to a
queue, which, when executed<br>
on some node, spins up another docker container there and
runs the job inside that<br>
one.<br>
<br>
I am open to any sort of input and discussion!<br>
Thanks in advance,<br>
<br>
Dominic Kempf<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>