[Buildbot-commits] [Buildbot] #2055: buildmaster spins when build starts
Buildbot
nobody at buildbot.net
Mon Jul 18 19:57:19 UTC 2011
#2055: buildmaster spins when build starts
--------------------+-----------------------
 Reporter:  mmorse  |       Owner:
     Type:  defect  |      Status:  new
 Priority:  major   |   Milestone:  undecided
  Version:  0.8.4p2 |    Keywords:
--------------------+-----------------------
My buildmaster starts properly and I can access its web pages. The
buildslave attaches properly. But when I force a build, the buildmaster's
web page stops responding and the CPU usage on the buildmaster host goes
to 70% and stays there. Here is the sequence in more detail, as an
annotated log:
Start buildmaster:
{{{
2011-07-18 12:23:38-0700 [-] Log opened.
2011-07-18 12:23:38-0700 [-] twistd 11.0.0
(/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
2.7.2) starting up.
2011-07-18 12:23:38-0700 [-] reactor class:
twisted.internet.selectreactor.SelectReactor.
2011-07-18 12:23:38-0700 [-] Applying patch for
http://twistedmatrix.com/trac/ticket/5079
2011-07-18 12:23:38-0700 [-] Creating BuildMaster -- buildbot.version:
0.8.4p2
2011-07-18 12:23:38-0700 [-] loading configuration from
/Users/buildmaster/BuildMasters/Bookshelf/master.cfg
2011-07-18 12:23:38-0700 [-] configuration update started
2011-07-18 12:23:38-0700 [-] unable to import dnotify, so Maildir will use
polling instead
2011-07-18 12:23:38-0700 [-] WARNING: the name 'Scheduler' is deprecated;
use SingleBranchScheduler instead
2011-07-18 12:23:38-0700 [-] WARNING: the name 'Scheduler' is deprecated;
use SingleBranchScheduler instead
2011-07-18 12:23:39-0700 [-] applying SQLite workaround from Buildbot bug
#1810
2011-07-18 12:23:39-0700 [-] twisted.spread.pb.PBServerFactory starting on
9987
2011-07-18 12:23:39-0700 [-] Starting factory
<twisted.spread.pb.PBServerFactory instance at 0x10290af38>
2011-07-18 12:23:39-0700 [-] adding new builder Lagos-Setup-Lion for
category None
2011-07-18 12:23:39-0700 [-] trying to load status pickle from
/Users/buildmaster/BuildMasters/Bookshelf/Lagos-Setup-Lion/builder
2011-07-18 12:23:39-0700 [-] no saved status pickle, creating a new one
2011-07-18 12:23:39-0700 [-] added builder Lagos-Setup-Lion in category
None
2011-07-18 12:23:39-0700 [-] adding new builder Bookshelf-Lion for
category None
2011-07-18 12:23:39-0700 [-] trying to load status pickle from
/Users/buildmaster/BuildMasters/Bookshelf/Bookshelf-Lion/builder
2011-07-18 12:23:39-0700 [-] no saved status pickle, creating a new one
2011-07-18 12:23:39-0700 [-] added builder Bookshelf-Lion in category None
2011-07-18 12:23:39-0700 [-] setBuilders._add:
[<buildbot.process.botmaster.BuildRequestDistributor instance at
0x10221d6c8>, <BuildSlave 'BookshelfTester-Lion', current builders: >]
['Lagos-Setup-Lion', 'Bookshelf-Lion']
2011-07-18 12:23:39-0700 [-] adding IStatusReceiver <WebStatus on port
tcp:8013 at 0x10237c200>
2011-07-18 12:23:39-0700 [-] buildbot.status.web.baseweb.RotateLogSite
starting on 8013
2011-07-18 12:23:39-0700 [-] Starting factory
<buildbot.status.web.baseweb.RotateLogSite instance at 0x102915cf8>
2011-07-18 12:23:39-0700 [-] Setting up http.log rotating 10 files of
1000000 bytes each
2011-07-18 12:23:39-0700 [-] WebStatus using
(/Users/buildmaster/BuildMasters/Bookshelf/public_html)
2011-07-18 12:23:39-0700 [-] adding IStatusReceiver
<buildbot.status.mail.MailNotifier instance at 0x1028fb878>
2011-07-18 12:23:39-0700 [-] removing 0 old schedulers, updating 0, and
adding 3
2011-07-18 12:23:39-0700 [-] adding 1 new changesources, removing 0
2011-07-18 12:23:39-0700 [-] configuration update complete
}}}
Slave attaches:
{{{
2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] slave 'BookshelfTester-
Lion' attaching from IPv4Address(TCP, '17.226.12.156', 60215)
2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Starting buildslave
keepalive timer for 'BookshelfTester-Lion'
2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Got slaveinfo from
'BookshelfTester-Lion'
2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] bot attached
2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Buildslave
BookshelfTester-Lion attached to Lagos-Setup-Lion
2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Buildslave
BookshelfTester-Lion attached to Bookshelf-Lion
}}}
I force a build:
{{{
2011-07-18 12:26:28-0700 [HTTPChannel,1,17.226.15.231] web forcebuild of
builder 'Lagos-Setup-Lion', branch='', revision='', repository='',
project='' by user 'mmorse '
2011-07-18 12:26:28-0700 [-] added buildset 2 to database
2011-07-18 12:26:28-0700 [-] starting build <Build Lagos-Setup-Lion> using
slave <SlaveBuilder builder='Lagos-Setup-Lion' slave='BookshelfTester-
Lion'>
2011-07-18 12:26:28-0700 [-] acquireLocks(slave <BuildSlave
'BookshelfTester-Lion', current builders: Lagos-Setup-Lion,Bookshelf-
Lion>, locks [])
2011-07-18 12:26:28-0700 [-] starting build <Build Lagos-Setup-Lion>..
pinging the slave <SlaveBuilder builder='Lagos-Setup-Lion' slave
='BookshelfTester-Lion'>
2011-07-18 12:26:28-0700 [-] sending ping
2011-07-18 12:26:28-0700 [Broker,0,17.226.12.156] ping finished: success
2011-07-18 12:26:28-0700 [-] <Build Lagos-Setup-Lion>.startBuild
2011-07-18 12:26:28-0700 [-] ShellCommand.startCommand(cmd=<RemoteCommand
'git' at 4338127432>)
2011-07-18 12:26:28-0700 [-] cmd.args = {'ignore_ignores': None,
'retry': None, 'branch': 'master', 'reference': None, 'submodules': False,
'shallow': False, 'patch': None, 'repourl': 'ssh://devpubs-
bot at git.apple.com/git/DevPubs/Lagos/lagos-setup', 'workdir': 'build',
'mode': 'clobber', 'timeout': 1200, 'progress': False, 'revision': None}
2011-07-18 12:26:28-0700 [-] Warning: Overwriting old serialized Build at
/Users/buildmaster/BuildMasters/Bookshelf/Lagos-Setup-Lion/0-log-git-stdio
2011-07-18 12:26:28-0700 [-] <RemoteCommand 'git' at 4338127432>:
RemoteCommand.run [0]
2011-07-18 12:26:28-0700 [-] LoggedRemoteCommand.start
2011-07-18 12:26:37-0700 [Broker,0,17.226.12.156] <RemoteCommand 'git' at
4338127432> rc=0
2011-07-18 12:26:37-0700 [-] closing log <buildbot.status.logfile.LogFile
instance at 0x10291c710>
}}}
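To pin down where the master stops logging, the twistd log can be scanned for its last entry and its longest silence. The following is a minimal sketch, not part of Buildbot: the helper names `parse_entries` and `largest_gap` are hypothetical, the line format is inferred only from the excerpts in this report, and the UTC offset is ignored on the assumption that every line carries the same one.

```python
import re
from datetime import datetime

# Start of a twistd log entry as it appears in this report, e.g.
# "2011-07-18 12:26:28-0700 [-] sending ping"
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[+-]\d{4} \[([^\]]*)\] ?(.*)$"
)


def parse_entries(lines):
    """Group raw lines into (timestamp, source, message) tuples.

    Lines that do not start with a timestamp are treated as continuations
    of the previous entry, because the log quoted here is hard-wrapped.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_RE.match(line)
        if match:
            # The UTC offset is dropped; assumed identical on every line.
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append([stamp, match.group(2), match.group(3)])
        elif entries:
            entries[-1][2] += " " + line.strip()
    return [tuple(entry) for entry in entries]


def largest_gap(entries):
    """Return (seconds, entry_before, entry_after) for the longest silence,
    or None when there are fewer than two entries."""
    best = None
    for before, after in zip(entries, entries[1:]):
        seconds = (after[0] - before[0]).total_seconds()
        if best is None or seconds > best[0]:
            best = (seconds, before, after)
    return best


# The last lines of the forced-build log above, as wrapped in this report.
sample = [
    "2011-07-18 12:26:28-0700 [-] LoggedRemoteCommand.start",
    "2011-07-18 12:26:37-0700 [Broker,0,17.226.12.156] <RemoteCommand 'git' at",
    "4338127432> rc=0",
    "2011-07-18 12:26:37-0700 [-] closing log <buildbot.status.logfile.LogFile",
    "instance at 0x10291c710>",
]
entries = parse_entries(sample)
print(entries[-1][2])           # last message the master logged
print(largest_gap(entries)[0])  # longest silence, in seconds
```

On a real log, pass an open file object for twistd.log instead of `sample`. The last entry returned is the final thing the master did before it went quiet, which here is closing the git step's log file.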
-----
At this point, I notice that:
- the buildmaster's CPU usage goes to about 70% and stays there (I
killed the process after 10 minutes; the build should take considerably
less time than that).
- I can no longer load the waterfall page for that buildmaster
(although other buildmasters running on that machine are still
responsive).
- the process that's running out of control is Python itself. I've
attached the sample (PythonSample.txt) to this report.
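A process sample shows which native frames are busy, but not which Python code is running. One way to get a Python-level view is a signal handler that dumps every thread's stack. The sketch below is an assumption-laden illustration, not a Buildbot feature: the function names are hypothetical, SIGUSR2 is an arbitrary choice (check that nothing else in the process uses it), and the handler only fires if the interpreter is still executing bytecode in the main thread, so it will not help if the spin is inside a C call that never returns.

```python
import signal
import sys
import traceback


def format_all_stacks():
    """Return the current Python stack of every thread as one string."""
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s:\n" % thread_id)
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def dump_stacks(signum, frame):
    """Signal handler: write all thread stacks to stderr."""
    sys.stderr.write(format_all_stacks())
    sys.stderr.flush()


def install_stack_dumper(signum=None):
    """Register dump_stacks for the given signal (Unix only).

    Must be called from the main thread. SIGUSR2 is the default picked
    for this sketch, not something Buildbot or Twisted defines.
    """
    if signum is None:
        signum = signal.SIGUSR2
    signal.signal(signum, dump_stacks)
```

Because master.cfg is ordinary Python, one option is to call `install_stack_dumper()` from there and then send the signal with `kill -USR2 <pid>` once the CPU is pegged. Where stderr ends up depends on how twistd was started.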
Killing the slave doesn't cause the master to recover. If I run 'make
stop' on the buildmaster, the twistd.pid file is not removed and the CPU
usage stays pegged. I have to 'kill -9 <pid>' to reset things.
--
Ticket URL: <http://trac.buildbot.net/ticket/2055>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation