[Buildbot-devel] Python wedges when buildslave connects

Dustin J. Mitchell dustin at v.igoro.us
Thu Jul 14 07:24:59 UTC 2011


On Wed, Jul 13, 2011 at 7:13 PM, Matthew Morse <matt at apple.com> wrote:
> We have a server that runs several buildbot masters for our test fleet. I recently updated the python installation to 2.7.2 and the buildbot software to 0.8.4p1.  I upgraded all the buildmasters, and all except for one behave as expected. The problematic one starts up properly and I can access its web page, but as soon as the buildslave (on another machine) connects with the buildmaster, the Python process starts to churn, using over 50% of the CPU. The webpage becomes unresponsive--although the build on the slave seems to proceed properly.  From the sample, it seems that Python is getting stuck in a call to PyEval_EvalFrameEx.

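Actually, this is just a normal C stack for a reasonably deeply nested
Python program - PyEval_EvalFrameEx is the interpreter's main bytecode
evaluation loop, so it is basically how Python executes Python.  Seeing
it in the sample doesn't point at anything specific, so we'll need to
do some more digging to figure out what's wrong.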
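
Since the C stack only shows the interpreter loop, the Python-level
stack is usually more telling.  A minimal sketch of one way to get it,
assuming you can add a few lines that run in the master's main thread
at startup (for example from master.cfg, which is itself an
assumption).  The function names and the choice of SIGUSR2 are
hypothetical, not anything Buildbot provides; pick a signal that
nothing else in the process handles:

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return the current Python stack of every thread as one string."""
    # Map thread idents to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (ident, names.get(ident, "unknown")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def dump_stacks(signum, frame):
    """Signal handler: write every thread's Python stack to stderr."""
    sys.stderr.write(format_all_stacks())
    sys.stderr.flush()


def install_stack_dumper(signum=None):
    """Install the handler (Unix only, must run in the main thread).

    SIGUSR2 is an arbitrary default chosen for this sketch.
    """
    if signum is None:
        signum = signal.SIGUSR2
    signal.signal(signum, dump_stacks)
```

With that installed, sending the signal to the process while it is
churning (kill -USR2 <pid>) should print where each thread is in
Python code.  Running it a few times and comparing the dumps would
show whether the same frames keep coming up.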

> I've tried numerous things to try to isolate the problem--recreated the buildmaster from scratch, changed the port number used by the slave, and so on. I've compared its configuration with other buildmasters that are working properly, and I don't see how this particular buildmaster is significantly different from the others. In fact, it is one of the simplest we have. The log shows a normal start-up, and then nothing more.

Ah - this may be the same bug that Derek and Pradeep encountered:
http://trac.buildbot.net/ticket/1992.  It may be different, though -
1992 is a deadlock (which appears to be a BSD kernel bug?), while you
seem to have a livelock: the process is burning CPU rather than
blocking.

Can you use dtruss to figure out what syscalls the Python process is
making when it's churning?  Also, please file a new bug to track work
on this and identify anyone else seeing the same problems.

Dustin



