[Buildbot-devel] Buildbot master CPU usage

Thomas Middeldorp thomas at ranzer.geek.nz
Thu Jul 14 23:48:43 UTC 2011


On 7/9/2011 2:43 AM, Dustin J. Mitchell wrote:
> On Thu, Jul 7, 2011 at 6:38 PM, Thomas Middeldorp<thomas at ranzer.geek.nz>  wrote:
>> Could anyone please offer some advice on how to figure out what is
>> causing the high CPU usage or how to bring it to a more reasonable level?
> The best place to start is to look at the twistd.log and see what the
> master's doing too much of.  You can also use tools like strace or
> dtruss to see what it's doing.
>
> Dustin

Tried investigating a bit more, it seems the cpu usage spikes up even 
when there is nothing happening in twistd.log.

strace is pretty spammy and I'm not really sure what should stand out/be 
suspicious... but it does seem to frequently try to stat a whole lot of 
files. It tries to stat /etc/localtime a few hundred times in a row and 
then tries pages and pages worth of log files (about 2/3 of which are 
not found), python files, and more localtime. e.g.:

stat("/mnt/raid0/buildmaster/linux_asset_tests_incr/955-log-shell-stdio.bz2", 
0x7fffde6550b0) = -1 ENOENT (No such file or directory)
stat("/mnt/raid0/buildmaster/linux_asset_tests_incr/955-log-shell-stdio.gz", 
0x7fffde6550b0) = -1 ENOENT (No such file or directory)
stat("/mnt/raid0/buildmaster/linux_asset_tests_incr/955-log-shell-stdio", {st_mode=S_IFREG|0600, 
st_size=1959, ...}) = 0
stat("/usr/local/lib/python2.6/dist-packages/buildbot/status/web/templates/box_macros.html", 
{st_mode=S_IFREG|0644, st_size=1071, ...}) = 0
stat("/usr/local/lib/python2.6/dist-packages/buildbot/status/web/templates/box_macros.html", 
{st_mode=S_IFREG|0644, st_size=1071, ...}) = 0
stat("/usr/local/lib/python2.6/dist-packages/buildbot/status/web/templates/box_macros.html", 
{st_mode=S_IFREG|0644, st_size=1071, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2434, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2434, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2434, ...}) = 0

running strace with -c provided the following output, also pointing the 
finger at stat:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  76.47    0.000039           0     64391     11380 stat
  23.53    0.000012           0       450       450 access
   0.00    0.000000           0       226           read
   0.00    0.000000           0        11           write
   0.00    0.000000           0         2           open
   0.00    0.000000           0         2           close
   0.00    0.000000           0       678           fstat
   0.00    0.000000           0       226           lseek
   0.00    0.000000           0         1           brk
   0.00    0.000000           0        33           select
   0.00    0.000000           0        31           sendto
   0.00    0.000000           0        11           recvfrom
   0.00    0.000000           0         7           getsockname
   0.00    0.000000           0       900           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000051                 66969     11830 total

I also tried searching for 'access' since it seems to error entirely and 
is the only other thing that takes any time, and it all seems to be 
these two files:

access("/mnt/raid0/buildmaster/state.sqlite-journal", F_OK) = -1 ENOENT 
(No such file or directory)
access("/mnt/raid0/buildmaster/state.sqlite-wal", F_OK) = -1 ENOENT (No 
such file or directory)

Is any of this useful, or are there any other strace parameters I should 
try?

Thanks for the help,
Thomas





More information about the devel mailing list