[Buildbot-devel] Buildbot cannot use all build slaves

Chen Yingliu chenyingliu at gmail.com
Thu Jan 3 01:31:10 UTC 2013

We have a Buildbot setup with about 50 build slaves. They are used to run
small builds that each take several minutes.

One problem we have hit is that Buildbot does not use all of the build slaves.
While I can see a lot of pending builds on the builder page, a lot of build
slaves (about 1/3 to 1/2) are still shown as idle on the buildslaves page.

I checked some metrics output:
Timer BotMaster.reconfigService: 2.67
Timer BotMaster.reconfigServiceBuilders: 0.0144
Timer BotMaster.reconfigServiceSlaves: 0.00257
Timer BuildMaster.pollDatabaseBuildRequests(): 0
Timer BuildMaster.pollDatabaseChanges(): 0
Timer BuildRequestDistributor._activityLoop(): 168
Timer BuildRequestDistributor._sortBuilders(): 0.000427
Timer ChangeManager.reconfigService: 0.00581
Timer RemoteCommand.overhead: 0.106
Timer SchedulerManager.reconfigService: 27.3
Timer reactorDelay: 0.0271
Counter AbstractBuildSlave.attached(): 85
Counter AbstractBuildSlave.attached_slaves: 74
Counter BotMaster.attached_slaves: -11
Counter BotMaster.getBuildersForSlave(): 9548
Counter BotMaster.slaveLost(): 11
Counter RemoteCommand.remoteUpdate(): 602912
Counter active_builds: 37
Counter added_changes: 964
Counter attached_slaves: 85
Counter gc.garbage: 0
Counter num_builders: 93
Counter num_schedulers: 124
Counter num_slaves: 98
Counter num_sources: 2
Counter resource.pagesize: 4096
Counter resource.ru_idrss: 0
Counter resource.ru_inblock: 122160
Counter resource.ru_isrss: 0
Counter resource.ru_ixrss: 0
Counter resource.ru_majflt: 10
Counter resource.ru_maxrss: 1081652
Counter resource.ru_minflt: 180562551
Counter resource.ru_msgrcv: 0
Counter resource.ru_msgsnd: 0
Counter resource.ru_nivcsw: 2844651
Counter resource.ru_nsignals: 0
Counter resource.ru_nswap: 0
Counter resource.ru_nvcsw: 26965343
Counter resource.ru_oublock: 3033048
Counter resource.ru_stime: 1696
Counter resource.ru_utime: 9497
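
As a rough cross-check of these numbers, a short script can parse the metrics
dump and estimate how many attached slaves have no build running. This is only
a minimal sketch: it assumes each active build occupies exactly one slave, and
the names parse_metrics and idle_estimate are illustrative helpers, not part
of Buildbot.

```python
import re

# A few lines copied from the metrics output above.
SAMPLE = """\
Timer BuildRequestDistributor._activityLoop(): 168
Counter active_builds: 37
Counter attached_slaves: 85
Counter num_slaves: 98
"""

# Matches lines such as "Counter active_builds: 37" or "Timer reactorDelay: 0.0271".
LINE_RE = re.compile(
    r"^(Timer|Counter)\s+(.+?):\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_metrics(text):
    """Return a dict mapping (kind, name) to a float value."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            kind, name, value = match.groups()
            metrics[(kind, name)] = float(value)
    return metrics


def idle_estimate(metrics):
    """Estimate idle slaves, ASSUMING one slave per active build."""
    attached = metrics[("Counter", "attached_slaves")]
    active = metrics[("Counter", "active_builds")]
    idle = attached - active
    return idle, idle / attached


if __name__ == "__main__":
    idle, fraction = idle_estimate(parse_metrics(SAMPLE))
    print("idle slaves: %d (%.0f%% of attached)" % (idle, fraction * 100))
```

With the values posted here, this gives 48 of the 85 attached slaves without a
build, under the one-slave-per-build assumption.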

One odd thing is that BuildRequestDistributor._activityLoop() takes more than
100 seconds; I suppose this should run fast? However, I don't quite understand
the Buildbot scheduling logic, even after reading the code for a while. Can
anyone help figure out why _activityLoop() took more than 100 seconds
and why not all build slaves are used?

Thanks guys!

Chen Yingliu