[Buildbot-commits] [Buildbot] #2055: buildmaster spins when build starts

Buildbot nobody at buildbot.net
Mon Jul 18 19:57:19 UTC 2011


#2055: buildmaster spins when build starts
--------------------+-----------------------
Reporter:  mmorse   |      Owner:
    Type:  defect   |     Status:  new
Priority:  major    |  Milestone:  undecided
 Version:  0.8.4p2  |   Keywords:
--------------------+-----------------------
 My buildmaster starts properly and I can access its web pages. The
 buildslave attaches properly. But when I force a build, the buildmaster's
 web page stops responding and the CPU usage on the buildmaster host goes
 to 70% and stays there. Here is the problem in more detail, as an
 annotated log:



 Start buildmaster:
 {{{
 2011-07-18 12:23:38-0700 [-] Log opened.
 2011-07-18 12:23:38-0700 [-] twistd 11.0.0
 (/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
 2.7.2) starting up.
 2011-07-18 12:23:38-0700 [-] reactor class:
 twisted.internet.selectreactor.SelectReactor.
 2011-07-18 12:23:38-0700 [-] Applying patch for
 http://twistedmatrix.com/trac/ticket/5079
 2011-07-18 12:23:38-0700 [-] Creating BuildMaster -- buildbot.version:
 0.8.4p2
 2011-07-18 12:23:38-0700 [-] loading configuration from
 /Users/buildmaster/BuildMasters/Bookshelf/master.cfg
 2011-07-18 12:23:38-0700 [-] configuration update started
 2011-07-18 12:23:38-0700 [-] unable to import dnotify, so Maildir will use
 polling instead
 2011-07-18 12:23:38-0700 [-] WARNING: the name 'Scheduler' is deprecated;
 use SingleBranchScheduler instead
 2011-07-18 12:23:38-0700 [-] WARNING: the name 'Scheduler' is deprecated;
 use SingleBranchScheduler instead
 2011-07-18 12:23:39-0700 [-] applying SQLite workaround from Buildbot bug
 #1810
 2011-07-18 12:23:39-0700 [-] twisted.spread.pb.PBServerFactory starting on
 9987
 2011-07-18 12:23:39-0700 [-] Starting factory
 <twisted.spread.pb.PBServerFactory instance at 0x10290af38>
 2011-07-18 12:23:39-0700 [-] adding new builder Lagos-Setup-Lion for
 category None
 2011-07-18 12:23:39-0700 [-] trying to load status pickle from
 /Users/buildmaster/BuildMasters/Bookshelf/Lagos-Setup-Lion/builder
 2011-07-18 12:23:39-0700 [-] no saved status pickle, creating a new one
 2011-07-18 12:23:39-0700 [-] added builder Lagos-Setup-Lion in category
 None
 2011-07-18 12:23:39-0700 [-] adding new builder Bookshelf-Lion for
 category None
 2011-07-18 12:23:39-0700 [-] trying to load status pickle from
 /Users/buildmaster/BuildMasters/Bookshelf/Bookshelf-Lion/builder
 2011-07-18 12:23:39-0700 [-] no saved status pickle, creating a new one
 2011-07-18 12:23:39-0700 [-] added builder Bookshelf-Lion in category None
 2011-07-18 12:23:39-0700 [-] setBuilders._add:
 [<buildbot.process.botmaster.BuildRequestDistributor instance at
 0x10221d6c8>, <BuildSlave 'BookshelfTester-Lion', current builders: >]
 ['Lagos-Setup-Lion', 'Bookshelf-Lion']
 2011-07-18 12:23:39-0700 [-] adding IStatusReceiver <WebStatus on port
 tcp:8013 at 0x10237c200>
 2011-07-18 12:23:39-0700 [-] buildbot.status.web.baseweb.RotateLogSite
 starting on 8013
 2011-07-18 12:23:39-0700 [-] Starting factory
 <buildbot.status.web.baseweb.RotateLogSite instance at 0x102915cf8>
 2011-07-18 12:23:39-0700 [-] Setting up http.log rotating 10 files of
 1000000 bytes each
 2011-07-18 12:23:39-0700 [-] WebStatus using
 (/Users/buildmaster/BuildMasters/Bookshelf/public_html)
 2011-07-18 12:23:39-0700 [-] adding IStatusReceiver
 <buildbot.status.mail.MailNotifier instance at 0x1028fb878>
 2011-07-18 12:23:39-0700 [-] removing 0 old schedulers, updating 0, and
 adding 3
 2011-07-18 12:23:39-0700 [-] adding 1 new changesources, removing 0
 2011-07-18 12:23:39-0700 [-] configuration update complete
 }}}

 Slave attaches:
 {{{

 2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] slave 'BookshelfTester-
 Lion' attaching from IPv4Address(TCP, '17.226.12.156', 60215)
 2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Starting buildslave
 keepalive timer for 'BookshelfTester-Lion'
 2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Got slaveinfo from
 'BookshelfTester-Lion'
 2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] bot attached
 2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Buildslave
 BookshelfTester-Lion attached to Lagos-Setup-Lion
 2011-07-18 12:24:44-0700 [Broker,0,17.226.12.156] Buildslave
 BookshelfTester-Lion attached to Bookshelf-Lion
 }}}

 I force a build:
 {{{

 2011-07-18 12:26:28-0700 [HTTPChannel,1,17.226.15.231] web forcebuild of
 builder 'Lagos-Setup-Lion', branch='', revision='', repository='',
 project='' by user 'mmorse '
 2011-07-18 12:26:28-0700 [-] added buildset 2 to database
 2011-07-18 12:26:28-0700 [-] starting build <Build Lagos-Setup-Lion> using
 slave <SlaveBuilder builder='Lagos-Setup-Lion' slave='BookshelfTester-
 Lion'>
 2011-07-18 12:26:28-0700 [-] acquireLocks(slave <BuildSlave
 'BookshelfTester-Lion', current builders: Lagos-Setup-Lion,Bookshelf-
 Lion>, locks [])
 2011-07-18 12:26:28-0700 [-] starting build <Build Lagos-Setup-Lion>..
 pinging the slave <SlaveBuilder builder='Lagos-Setup-Lion' slave
 ='BookshelfTester-Lion'>
 2011-07-18 12:26:28-0700 [-] sending ping
 2011-07-18 12:26:28-0700 [Broker,0,17.226.12.156] ping finished: success
 2011-07-18 12:26:28-0700 [-] <Build Lagos-Setup-Lion>.startBuild
 2011-07-18 12:26:28-0700 [-] ShellCommand.startCommand(cmd=<RemoteCommand
 'git' at 4338127432>)
 2011-07-18 12:26:28-0700 [-]   cmd.args = {'ignore_ignores': None,
 'retry': None, 'branch': 'master', 'reference': None, 'submodules': False,
 'shallow': False, 'patch': None, 'repourl': 'ssh://devpubs-
 bot@git.apple.com/git/DevPubs/Lagos/lagos-setup', 'workdir': 'build',
 'mode': 'clobber', 'timeout': 1200, 'progress': False, 'revision': None}
 2011-07-18 12:26:28-0700 [-] Warning: Overwriting old serialized Build at
 /Users/buildmaster/BuildMasters/Bookshelf/Lagos-Setup-Lion/0-log-git-stdio
 2011-07-18 12:26:28-0700 [-] <RemoteCommand 'git' at 4338127432>:
 RemoteCommand.run [0]
 2011-07-18 12:26:28-0700 [-] LoggedRemoteCommand.start
 2011-07-18 12:26:37-0700 [Broker,0,17.226.12.156] <RemoteCommand 'git' at
 4338127432> rc=0
 2011-07-18 12:26:37-0700 [-] closing log <buildbot.status.logfile.LogFile
 instance at 0x10291c710>
 }}}
 -----

 At this point, I notice that:

 - the buildmaster's CPU usage goes to about 70% and stays there (I killed
 the process after 10 minutes; the build should take considerably less
 time than that).
 - I can no longer load the waterfall page for that buildmaster (although
 other buildmasters running on that machine are still responsive).
 - the process that's running out of control is Python itself. I've
 attached the sample (PythonSample.txt) to this report.

 Killing the slave doesn't cause the master to recover. If I 'make stop'
 the buildmaster, the twistd.pid file is not removed, and the CPU usage is
 still pegged. I have to 'kill -9 <pid>' to reset things.
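
 A possible way to narrow down where the master is spinning, beyond the
 attached process sample, is to have the Python process print its own stack
 when it receives a signal. The following is only a sketch and not part of
 Buildbot: the helper name install_stack_dumper is hypothetical, and it
 assumes that SIGUSR1 is otherwise unused by the process and that the spin
 is in Python-level code (a loop stuck inside C code would not reach the
 handler).

```python
import signal
import sys
import traceback


def install_stack_dumper(signum=signal.SIGUSR1, stream=None):
    """Print the main thread's Python stack whenever `signum` arrives.

    Hypothetical diagnostic helper; assumes `signum` is not already used.
    """
    out = stream if stream is not None else sys.stderr

    def dump_stack(received_signum, frame):
        # `frame` is the frame that was executing when the signal arrived,
        # so the traceback shows where the process is spending its time.
        out.write("--- stack dump (signal %d) ---\n" % received_signum)
        traceback.print_stack(frame, file=out)
        out.flush()

    signal.signal(signum, dump_stack)
    return dump_stack
```

 If such a helper were called early in the master's startup (for example
 from master.cfg, which is an assumption about where it would fit), running
 'kill -USR1 <pid>' a few times while the CPU is pegged should show whether
 the same frames keep appearing. Passing an open file as stream avoids
 depending on where the daemon's stderr ends up.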

-- 
Ticket URL: <http://trac.buildbot.net/ticket/2055>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

