[Buildbot-commits] [SPAM] [Buildbot] #802: Early Timeout
Buildbot
buildbot-devel at lists.sourceforge.net
Thu Apr 22 08:58:58 UTC 2010
#802: Early Timeout
-------------------+--------------------------------------------------------
Reporter: axel | Owner:
Type: defect | Status: new
Priority: minor | Milestone: undecided
Version: 0.7.12 | Keywords: timeout
-------------------+--------------------------------------------------------
Buildbot decides it's time to kill a step even though its time has not yet
come. In this example, Buildbot kills configure after a successful SVN.
Configure was only running for 14 seconds; the timeout is 1200 seconds.
Slave's log excerpt (complete for this time window, i.e. I did not remove
lines in between):
{{{
2010-04-22 10:00:45+0200 [-] closing stdin
2010-04-22 10:00:45+0200 [-] using PTY: False
2010-04-22 10:03:04+0200 [-] command finished with signal None, exit code
0, elapsedTime: 138.333220
2010-04-22 10:03:04+0200 [-] SlaveBuilder.commandComplete
<buildbot.slave.commands.SVN instance at 0xb78c1c6c>
2010-04-22 10:03:04+0200 [Broker,client] startCommand:shell [id 137]
2010-04-22 10:03:04+0200 [Broker,client] ShellCommand._startCommand
2010-04-22 10:03:04+0200 [Broker,client] bash -c ./configure --enable-
gdml --enable-minuit2 --enable-roofit --enable-unuran
2010-04-22 10:03:04+0200 [Broker,client] in dir
/home/buildbot/BuildSlaves/ROOT/ROOT-GCC41-SLC5-32-hourly/build (timeout
1200 secs)
2010-04-22 10:03:04+0200 [Broker,client] watching logfiles {}
2010-04-22 10:03:04+0200 [Broker,client] argv: bash -c ./configure
--enable-gdml --enable-minuit2 --enable-roofit --enable-unuran
2010-04-22 10:03:04+0200 [Broker,client] environment: {'SSH_ASKPASS':
'/usr/libexec/openssh/gnome-ssh-askpass', 'GROUP': 'buildbot', 'HISTFILE':
'/home/buildbot/.history.vm-slc5-i686', 'HPX_OS_MINOR': '6', 'BINTYPE':
'Linux', 'HSM_HOME': '/hpss/XYZ.AB/user/b/buildbot', 'LESSOPEN':
'|/usr/bin/lesspipe.sh %s', 'ENVIRONMENT': 'LOGIN', 'SSH_CLIENT':
'192.168.122.1 35234 22', 'CVS_RSH': 'ssh', 'LOGNAME': 'buildbot', 'USER':
'buildbot', 'INPUTRC': '/etc/inputrc', 'YP': 'NO', 'USERPATH':
'/home/buildbot/bin:/home/buildbot/scripts:/usr/sue/bin:/usr/local/bin:/usr/local/bin/X11:/usr/bin:/bin:/usr/bin/X11:/XYZ/pro/bin',
'PS1': '[\\h] ${PWD} > ', 'LANG': 'en_US.UTF-8', 'KDEDIR': '/usr', 'TERM':
'xterm', 'SHELL': '/bin/bash', 'INITIALISED': 'YES', 'SHLVL': '1',
'XYZ_LEVEL': 'pro', 'DOMAINNAME': '(none)', 'G_BROKEN_FILENAMES': '1',
'RUBOUT': 'BackSpace', 'HPX_INITIALE': 'b', 'KDE_NO_IPV6': '1', 'EDITOR':
'pico -w', 'MANPATH':
'/home/buildbot/man:/usr/sue/man::/usr/local/man:/XYZ/man', 'HPX_CELL':
'XYZ.AB', 'HOME': '/home/buildbot', 'HPX_HEPGROUP_DIR': '', 'HPSS_HOME':
'/hpss/XYZ.AB/user/b/buildbot', 'HISTSIZE': '1000', 'XYZ_ROOT':
'/XYZ/pro', 'NNTPSERVER': 'news.XYZ.AB', 'ROOTPATH':
'/usr/bin:/bin:/usr/bin/X11', 'PATH':
'/home/buildbot/install/gccxml/bin:/home/buildbot/bin:/home/buildbot/scripts:/usr/sue/bin:/usr/local/bin:/usr/local/bin/X11:/usr/bin:/bin:/usr/bin/X11:/XYZ/pro/bin:/usr/kerberos/bin:/home/buildbot/bin',
'OS': 'Linux', 'HOST': 'vm-slc5-i686', '_HPX_SEEN_SITE_ENV': '1', 'ENV':
'/etc/hepix/sh/HEP/env', 'CLUSTER_DIR': '/etc/hepix/cluster', 'PRINT_CMD':
'xprint', 'HPX_OS_MAJOR': '2', '_': '/usr/bin/buildbot', 'LS_COLORS':
'no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:',
'KDE_IS_PRELINKED': '1', 'TMPDIR': '/tmp/buildbot', 'MAIL':
'/var/spool/mail/buildbot', 'XYZ': '/XYZ', 'SYSTYPE': 'sysV', 'OLDPWD':
'/home/buildbot', 'HOSTNAME': 'vm-slc5-i686', 'VISUAL': 'pico -w',
'SAVEHIST': '500', 'GROUP_DIR': '', 'PWD':
'/home/buildbot/BuildSlaves/ROOT/ROOT-GCC41-SLC5-32-hourly/build',
'CASTOR_HOME': '/castor/XYZ.AB/user/b/buildbot', 'SSH_TTY': '/dev/pts/2',
'XFILESEARCHPATH': '/usr/lib/X11/app-defaults/%N:/usr/local/lib/X11/app-
defaults/%N', 'WWW_HOME': 'http://www.XYZ.AB', 'SSH_CONNECTION':
'192.168.122.1 35234 192.168.122.238 22', 'PAGER': 'less'}
2010-04-22 10:03:04+0200 [Broker,client] closing stdin
2010-04-22 10:03:04+0200 [Broker,client] using PTY: False
2010-04-22 10:03:18+0200 [-] sending app-level keepalive
2010-04-22 10:03:18+0200 [-] command timed out: 1200 seconds without
output, killing pid 11978
2010-04-22 10:03:18+0200 [-] trying os.kill(-pid, 9)
2010-04-22 10:03:18+0200 [-] trying process.signalProcess('KILL')
2010-04-22 10:03:18+0200 [-] signal KILL sent successfully
2010-04-22 10:03:18+0200 [-] command timed out: 1200 seconds without
output, killing pid 27249
2010-04-22 10:03:18+0200 [-] trying os.kill(-pid, 9)
2010-04-22 10:03:18+0200 [-] trying process.signalProcess('KILL')
2010-04-22 10:03:18+0200 [-] signal KILL sent successfully
2010-04-22 10:03:18+0200 [-] command finished with signal 9, exit code
None, elapsedTime: 2109.789688
2010-04-22 10:03:18+0200 [-] SlaveBuilder.commandComplete
<buildbot.slave.commands.SlaveShellCommand instance at 0xb78bae6c>
2010-04-22 10:03:18+0200 [-] command finished with signal 9, exit code
None, elapsedTime: 14.473772
2010-04-22 10:03:18+0200 [-] SlaveBuilder.commandComplete
<buildbot.slave.commands.SlaveShellCommand instance at 0xb78c17ac>
2010-04-22 10:12:49+0200 [-] sending app-level keepalive
}}}
14.5 is much less than 1200. This was the only build running on this build
slave instance at that time. Corresponding log excerpt from the master:
{{{
2010-04-22 10:03:52+0200 [Broker,0,128.A.B.C] <RemoteShellCommand ''bash
-c ./configure --enable-gdml --enable-minuit2 --enable-roofit
--enable-unuran''> rc=-1
2010-04-22 10:03:52+0200 [-] releaseLocks(<Build roottest-
GCC41-SLC5-32-incr>): [(<SlaveLock(GCC41-SLC5-32-bot-
lxb095-slc5-32-gcc41-incr, 1)[bot-lxb095-slc5-32-gcc41] 386
06608>, <buildbot.locks.LockAccess instance at 0x1e66c68>)]
2010-04-22 10:03:52+0200 [-] maybeStartBuild <Builder 'roottest-
GCC41-SLC5-32-incr' at 31937168>: 0 request(s), 1 slave(s)
(and many more)
2010-04-22 10:03:52+0200 [-] closing log <buildbot.status.builder.LogFile
instance at 0x2185b00>
2010-04-22 10:03:52+0200 [-] releaseLocks(<buildbot.steps.shell.Configure
instance at 0x296ff38>): []
2010-04-22 10:03:52+0200 [-] step 'configure' complete: failure
2010-04-22 10:03:52+0200 [-] <Build ROOT-GCC41-SLC5-32-hourly>: build
finished
2010-04-22 10:03:52+0200 [-] releaseLocks(<Build ROOT-
GCC41-SLC5-32-hourly>): [(<SlaveLock(GCC41-SLC5-32-bot-
lxb095-slc5-32-gcc41-hourly, 1)[bot-lxb095-slc5-32-gcc41] 46128592>,
<buildbot.locks.LockAccess instance at 0x1e66d88>)]
2010-04-22 10:03:52+0200 [-] maybeStartBuild <Builder 'roottest-
GCC41-SLC5-32-incr' at 31937168>: 0 request(s), 1 slave(s)
(and many more)
}}}
The builder is running in a KVM virtual machine; the VM's host has two VMs
running. They are both used for buildbot, they use NAT, each has 2 virtual
cores (the host has 4).
These fake timeouts hit us intermittently, mostly on OpenSolaris but here
on a mutation of RHEL 5.
The uneducated questions I have are: where does the elapsedTime of
2109.789688 come from? And why are there two processes running for one
configure invocation? Help with debugging this or finding my configuration
bug are very welcome...
--
Ticket URL: <http://buildbot.net/trac/ticket/802>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation
More information about the Commits
mailing list