[Buildbot-devel] Limitations using buildbot for building a huge amount of projects

Jean-Paul Calderone exarkun at divmod.com
Thu Apr 24 15:23:40 UTC 2008


On Thu, 24 Apr 2008 16:57:51 +0200, Iago Toral Quiroga <itoral at igalia.com> wrote:
>Hi Jean-Paul!
>
>I added John to the CC. He has been working on this too and will be
>interested.
>
>First, thanks for your comments, I really appreciate your help. I answer
>your questions below...
>
>On Thu, 24-04-2008 at 09:51 -0400, Jean-Paul Calderone wrote:
>> On Thu, 24 Apr 2008 13:33:41 +0200, Iago Toral Quiroga <itoral at igalia.com> wrote:
>> >Hi,
>> >
>> >we are using buildbot to provide continuous integration of the gnome
>> >platform (http://buildbot.gnome.org). Basically, we run several Build
>> >Master instances and provide an html frontend that renders the main page
>> >getting information from each master.
>>
>> Can you elaborate on why you have more than one master?
>>
>
>We need to build around 200 projects, and we want to build each of them
>on several machines too, with a waterfall view of the build results for
>each of the slave machines we use to build the project. For
>example:
>
>http://build.gnome.org/libxml2/
>
>That's project libxml2 building on a Debian/Sid slave machine (first
>builder) and a Red Hat slave machine (second builder).
>
>To achieve this we have one master per project, and one builder per
>slave machine. Then, in each slave machine we have a Build Slave
>instance per project. This way we have a nice waterfall view of the
>build results of each project (each master provides its own, only for
>the project it is building). So, we just added a main page that queries
>the state of each master instance, providing a summary for all the
>projects.
>
>If there is a way to achieve this using a single master instance I'd be
>more than glad to know about that, it would make my life way easier :)

I think you may be able to configure a single master with a number of
different waterfall views (exposed at different URLs, e.g. the current
URLs).

One way to do this is with categories.  A single master can have many
buildbot.status.web.waterfall.WaterfallStatusResources, each at a
different URL and each with different configuration.  That resource
class accepts a `categories´ argument to its initializer.  It will
only include builds from builders which have a `category´ which is
in that list.

For example, here's a snippet from Twisted's buildmaster configuration:

winxp32py25iocp = scmikesWinXP32.copy()
winxp32py25iocp.update({
  'name': "winxp32-py2.5-iocp",
  'builddir': "WXP32-full2.5-iocp",
  'factory': TwistedReactorsBuildFactory(blah blah blah),
  'category': 'unsupported'})
builders.append(winxp32py25iocp)

...

status.putChild(
    "waterfall",
    WaterfallStatusResource(categories=['supported', 'unsupported']))

This gives us a waterfall with all our builders.  We have another one
that only includes "supported".  So, you could create a category for
each project and then a waterfall for each category.  This done, you
should only need one slave process per slave machine, and they can
all talk to the same master.
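
A rough sketch of that layout, in plain Python so it runs without
buildbot installed.  The project and slave names are placeholders, and
the 'factory' entry each builder needs is left out.  In a real
master.cfg, each value of the mapping would be passed as `categories´
to a WaterfallStatusResource registered with putChild, as in the
snippet above.

```python
# Placeholder names, for illustration only.
PROJECTS = ["libxml2", "glib"]
SLAVES = ["debian-sid", "redhat"]


def make_builders(projects, slaves):
    """Return one builder dict per (project, slave) pair.

    Each builder is tagged with its project name as the category, so a
    waterfall restricted to that category shows only that project.
    """
    builders = []
    for project in projects:
        for slave in slaves:
            builders.append({
                "name": "%s-%s" % (project, slave),
                "slavename": slave,
                "builddir": "%s-%s" % (project, slave),
                "category": project,
                # 'factory' omitted: it depends on how each project builds.
            })
    return builders


def waterfall_categories(projects):
    """Map each per-project URL segment to the categories it displays."""
    return dict((project, [project]) for project in projects)


builders = make_builders(PROJECTS, SLAVES)
waterfalls = waterfall_categories(PROJECTS)
```

With 200 projects and two slaves this yields 400 builders on one
master, and each slave machine needs a single slave process.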

> [snip]
>
>> >2.- We would like the solution to be scalable, being able to add more
>> >slaves. Unfortunately, adding more slaves means multiplying the number
>> >of required connections by the number of slaves, which does not seem to
>> >be a scalable solution.
>> >
>> >A mate of mine, John Carr, has tried to address the first problem by
>> >using a Socks proxy, so only the socks server proxy port would be
>> >public. Unfortunately this makes the second problem even worse, for we
>> >need twice as many connections (slave<->proxy and proxy<->master).
>>
>>
>> Twisted easily scales to many thousands of connections.  Are you worried
>> about the ability of the masters to handle all the connections?  The
>> ability of the firewall to?  Something else?
>
>We are getting this error message in twistd.log:
>
>2008/04/24 16:20 +0200 [twisted.spread.pb.PBServerFactory] Could not
>accept new connection (EMFILE)

Hopefully, this will be moot.  However, you can fix it by raising the
"open files" ulimit for the buildmaster and switching to the poll or
epoll reactor.  I'm actually not sure how you specify a reactor to use
for buildbot; it doesn't seem to be an option to `buildbot´.  Maybe the
thing to do is edit `Makefile.buildbot´ (that still gets used, right?)
and add `--reactor epoll´ to the twistd command in the `start´ target.
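
For the ulimit half, here is a minimal sketch that raises the soft
"open files" limit from Python instead of the shell, using only the
standard `resource´ module (Unix only).  The function name and the
default of 4096 are placeholders.  The reactor still matters after
this: select() is bounded by FD_SETSIZE (commonly 1024) whatever the
ulimit says, which is why poll or epoll is needed as well.

```python
import resource


def raise_open_file_limit(wanted=4096):
    """Raise the soft RLIMIT_NOFILE towards `wanted`, never lowering it.

    The request is clamped to the hard limit, since an unprivileged
    process cannot exceed that.  Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = wanted
    if hard != resource.RLIM_INFINITY:
        new_soft = min(wanted, hard)
    # Leave an unlimited or already-sufficient soft limit alone.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft_limit, hard_limit = raise_open_file_limit()
```

This would have to run in the buildmaster process before it starts
accepting connections; running `ulimit -n´ in the shell that launches
twistd has the same effect.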

>> >He also suggested to implement a custom ITransport so that masters and
>> >slaves would not open new TCP connections but use a shared one instead,
>> >which in principle would be a good solution to both problems.
>>
>> One approach here would be to use SSH as the transport and use a channel
>> for each logical connection.
>
>That could be a way to go... depending on the number of channels that
>the SSH transport allows. Otherwise we would be running into the same
>problem again I guess.

I think the maximum number of channels is something like 2 ** 16 - 1, so
it would probably work, but again, hopefully moot.
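
To make the "one shared connection, many logical connections" idea
concrete, here is a toy sketch of channel multiplexing.  It is not SSH
and not a Twisted API: the frame layout (a 32-bit channel id and a
32-bit length, then the payload) and all the names are made up for
illustration.

```python
import struct

# Frame header: channel id and payload length, both unsigned 32-bit,
# network byte order.  This layout is an assumption for the sketch.
HEADER = struct.Struct("!II")


def encode_frame(channel, payload):
    """Wrap `payload` (bytes) in a header naming its logical channel."""
    return HEADER.pack(channel, len(payload)) + payload


class Demultiplexer(object):
    """Split a shared byte stream back into per-channel payloads."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add received bytes; return the complete (channel, payload) frames."""
        self._buffer += data
        frames = []
        while len(self._buffer) >= HEADER.size:
            channel, length = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this frame
            frames.append((channel, self._buffer[HEADER.size:end]))
            self._buffer = self._buffer[end:]
        return frames
```

Each slave<->master conversation would get its own channel id, and the
receiving side would hand each payload to whatever is handling that
channel.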

Jean-Paul



