[Buildbot-devel] Python wedges when buildslave connects
Dustin J. Mitchell
dustin at v.igoro.us
Thu Jul 14 07:24:59 UTC 2011
On Wed, Jul 13, 2011 at 7:13 PM, Matthew Morse <matt at apple.com> wrote:
> We have a server that runs several buildbot masters for our test fleet. I recently updated the python installation to 2.7.2 and the buildbot software to 0.8.4p1. I upgraded all the buildmasters, and all except for one behave as expected. The problematic one starts up properly and I can access its web page, but as soon as the buildslave (on another machine) connects with the buildmaster, the Python process starts to churn, using over 50% of the CPU. The webpage becomes unresponsive--although the build on the slave seems to proceed properly. From the sample, it seems that Python is getting stuck in a call to PyEval_EvalFrameEx.
Actually, this is just a normal C stack for a reasonably deeply nested
Python program: PyEval_EvalFrameEx is the interpreter's main evaluation
loop, so it shows up in the stack of nearly any running Python code.
We'll need to do some more digging to figure out what's wrong.
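
Since the C stack only shows the interpreter loop, a Python-level stack is usually more telling. Below is a minimal sketch, not part of Buildbot, of one way to get one: it assumes you can add a few lines to the process being debugged, and the names format_all_stacks and install_stack_dump_handler are hypothetical. It uses sys._current_frames(), which is CPython-specific, and defaults to SIGUSR1, which exists only on Unix-like systems.

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return the current Python-level stack of every thread as text."""
    # Map thread ids to names so the output is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def install_stack_dump_handler(signum=None):
    """Hypothetical helper: write all thread stacks to stderr on a signal."""
    if signum is None:
        signum = signal.SIGUSR1  # assumption: Unix-like host

    def handler(received_signum, frame):
        sys.stderr.write(format_all_stacks())

    signal.signal(signum, handler)
```

With the handler installed at startup, sending the signal a few times while the process is churning (kill -USR1 <pid>) and comparing the dumps should show which Python functions keep reappearing.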
> I've tried numerous things to try to isolate the problem--recreated the buildmaster from scratch, changed the port number used by the slave, and so on. I've compared its configuration with other buildmasters that are working properly, and I don't see how this particular buildmaster is significantly different from the others. In fact, it is one of the simplest we have. The log shows a normal start-up, and then nothing more.
Ah - this may be the same bug that Derek and Pradeep encountered:
http://trac.buildbot.net/ticket/1992. It may be different, though:
1992 is a deadlock (which appears to be a BSD kernel bug?), while you
seem to have a livelock.
Can you use dtruss to figure out what syscalls the Python process is
making when it's churning? Also, please file a new bug to track work
on this and identify anyone else seeing the same problems.
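
Once you have captured dtruss output to a file, a quick tally of syscall names can make a tight loop stand out. This is a rough sketch, assuming each trace line contains the syscall in the form name(args); the function name tally_syscalls and the sample lines are made up for illustration and are not real output from your master.

```python
import re
from collections import Counter

# Assumption: the syscall appears as "name(" somewhere on each trace line.
SYSCALL_RE = re.compile(r"\b([A-Za-z_]\w*)\(")


def tally_syscalls(lines):
    """Count how often each syscall name appears in trace output."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.search(line)
        if not match:
            continue
        name = match.group(1)
        if name == "SYSCALL":
            continue  # skip the column header line
        counts[name] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "SYSCALL(args) \t\t = return",
    "select(0x0, 0x0, 0x0, 0x0, 0x0)\t\t = 0 0",
    "read(0x5, 0x0, 0x1000)\t\t = -1 Err#35",
    "select(0x0, 0x0, 0x0, 0x0, 0x0)\t\t = 0 0",
]

for name, count in tally_syscalls(sample).most_common():
    print(name, count)
```

If one or two calls dominate the tally by a wide margin, that would be consistent with a busy loop; a process that is truly deadlocked would typically show little or no syscall activity.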
Dustin