[Buildbot-commits] [SPAM] [Buildbot] #802: Early Timeout

Buildbot buildbot-devel at lists.sourceforge.net
Thu Apr 22 08:58:58 UTC 2010


#802: Early Timeout
-------------------+--------------------------------------------------------
Reporter:  axel    |       Owner:           
    Type:  defect  |      Status:  new      
Priority:  minor   |   Milestone:  undecided
 Version:  0.7.12  |    Keywords:  timeout  
-------------------+--------------------------------------------------------
 Buildbot decides it's time to kill a step even though its time has not yet
 come. In this example, Buildbot kills configure after a successful SVN.
 Configure was only running for 14 seconds; the timeout is 1200 seconds.
 Slave's log excerpt (complete for this time window, i.e. I did not remove
 lines in between):


 {{{
 2010-04-22 10:00:45+0200 [-]   closing stdin
 2010-04-22 10:00:45+0200 [-]   using PTY: False
 2010-04-22 10:03:04+0200 [-] command finished with signal None, exit code
 0, elapsedTime: 138.333220
 2010-04-22 10:03:04+0200 [-] SlaveBuilder.commandComplete
 <buildbot.slave.commands.SVN instance at 0xb78c1c6c>
 2010-04-22 10:03:04+0200 [Broker,client]  startCommand:shell [id 137]
 2010-04-22 10:03:04+0200 [Broker,client] ShellCommand._startCommand
 2010-04-22 10:03:04+0200 [Broker,client]  bash -c ./configure  --enable-
 gdml  --enable-minuit2  --enable-roofit  --enable-unuran
 2010-04-22 10:03:04+0200 [Broker,client]   in dir
 /home/buildbot/BuildSlaves/ROOT/ROOT-GCC41-SLC5-32-hourly/build (timeout
 1200 secs)
 2010-04-22 10:03:04+0200 [Broker,client]   watching logfiles {}
 2010-04-22 10:03:04+0200 [Broker,client]   argv: bash -c ./configure
 --enable-gdml  --enable-minuit2  --enable-roofit  --enable-unuran
 2010-04-22 10:03:04+0200 [Broker,client]  environment: {'SSH_ASKPASS':
 '/usr/libexec/openssh/gnome-ssh-askpass', 'GROUP': 'buildbot', 'HISTFILE':
 '/home/buildbot/.history.vm-slc5-i686', 'HPX_OS_MINOR': '6', 'BINTYPE':
 'Linux', 'HSM_HOME': '/hpss/XYZ.AB/user/b/buildbot', 'LESSOPEN':
 '|/usr/bin/lesspipe.sh %s', 'ENVIRONMENT': 'LOGIN', 'SSH_CLIENT':
 '192.168.122.1 35234 22', 'CVS_RSH': 'ssh', 'LOGNAME': 'buildbot', 'USER':
 'buildbot', 'INPUTRC': '/etc/inputrc', 'YP': 'NO', 'USERPATH':
 '/home/buildbot/bin:/home/buildbot/scripts:/usr/sue/bin:/usr/local/bin:/usr/local/bin/X11:/usr/bin:/bin:/usr/bin/X11:/XYZ/pro/bin',
 'PS1': '[\\h] ${PWD} > ', 'LANG': 'en_US.UTF-8', 'KDEDIR': '/usr', 'TERM':
 'xterm', 'SHELL': '/bin/bash', 'INITIALISED': 'YES', 'SHLVL': '1',
 'XYZ_LEVEL': 'pro', 'DOMAINNAME': '(none)', 'G_BROKEN_FILENAMES': '1',
 'RUBOUT': 'BackSpace', 'HPX_INITIALE': 'b', 'KDE_NO_IPV6': '1', 'EDITOR':
 'pico -w', 'MANPATH':
 '/home/buildbot/man:/usr/sue/man::/usr/local/man:/XYZ/man', 'HPX_CELL':
 'XYZ.AB', 'HOME': '/home/buildbot', 'HPX_HEPGROUP_DIR': '', 'HPSS_HOME':
 '/hpss/XYZ.AB/user/b/buildbot', 'HISTSIZE': '1000', 'XYZ_ROOT':
 '/XYZ/pro', 'NNTPSERVER': 'news.XYZ.AB', 'ROOTPATH':
 '/usr/bin:/bin:/usr/bin/X11', 'PATH':
 '/home/buildbot/install/gccxml/bin:/home/buildbot/bin:/home/buildbot/scripts:/usr/sue/bin:/usr/local/bin:/usr/local/bin/X11:/usr/bin:/bin:/usr/bin/X11:/XYZ/pro/bin:/usr/kerberos/bin:/home/buildbot/bin',
 'OS': 'Linux', 'HOST': 'vm-slc5-i686', '_HPX_SEEN_SITE_ENV': '1', 'ENV':
 '/etc/hepix/sh/HEP/env', 'CLUSTER_DIR': '/etc/hepix/cluster', 'PRINT_CMD':
 'xprint', 'HPX_OS_MAJOR': '2', '_': '/usr/bin/buildbot', 'LS_COLORS':
 'no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:',
 'KDE_IS_PRELINKED': '1', 'TMPDIR': '/tmp/buildbot', 'MAIL':
 '/var/spool/mail/buildbot', 'XYZ': '/XYZ', 'SYSTYPE': 'sysV', 'OLDPWD':
 '/home/buildbot', 'HOSTNAME': 'vm-slc5-i686', 'VISUAL': 'pico -w',
 'SAVEHIST': '500', 'GROUP_DIR': '', 'PWD':
 '/home/buildbot/BuildSlaves/ROOT/ROOT-GCC41-SLC5-32-hourly/build',
 'CASTOR_HOME': '/castor/XYZ.AB/user/b/buildbot', 'SSH_TTY': '/dev/pts/2',
 'XFILESEARCHPATH': '/usr/lib/X11/app-defaults/%N:/usr/local/lib/X11/app-
 defaults/%N', 'WWW_HOME': 'http://www.XYZ.AB', 'SSH_CONNECTION':
 '192.168.122.1 35234 192.168.122.238 22', 'PAGER': 'less'}
 2010-04-22 10:03:04+0200 [Broker,client]   closing stdin
 2010-04-22 10:03:04+0200 [Broker,client]   using PTY: False
 2010-04-22 10:03:18+0200 [-] sending app-level keepalive
 2010-04-22 10:03:18+0200 [-] command timed out: 1200 seconds without
 output, killing pid 11978
 2010-04-22 10:03:18+0200 [-] trying os.kill(-pid, 9)
 2010-04-22 10:03:18+0200 [-] trying process.signalProcess('KILL')
 2010-04-22 10:03:18+0200 [-]  signal KILL sent successfully
 2010-04-22 10:03:18+0200 [-] command timed out: 1200 seconds without
 output, killing pid 27249
 2010-04-22 10:03:18+0200 [-] trying os.kill(-pid, 9)
 2010-04-22 10:03:18+0200 [-] trying process.signalProcess('KILL')
 2010-04-22 10:03:18+0200 [-]  signal KILL sent successfully
 2010-04-22 10:03:18+0200 [-] command finished with signal 9, exit code
 None, elapsedTime: 2109.789688
 2010-04-22 10:03:18+0200 [-] SlaveBuilder.commandComplete
 <buildbot.slave.commands.SlaveShellCommand instance at 0xb78bae6c>
 2010-04-22 10:03:18+0200 [-] command finished with signal 9, exit code
 None, elapsedTime: 14.473772
 2010-04-22 10:03:18+0200 [-] SlaveBuilder.commandComplete
 <buildbot.slave.commands.SlaveShellCommand instance at 0xb78c17ac>
 2010-04-22 10:12:49+0200 [-] sending app-level keepalive
 }}}


 14.5 is much less than 1200. This was the only build running on this build
 slave instance at that time. Corresponding log excerpt from the master:


 {{{
 2010-04-22 10:03:52+0200 [Broker,0,128.A.B.C] <RemoteShellCommand ''bash
 -c ./configure  --enable-gdml  --enable-minuit2  --enable-roofit
 --enable-unuran''> rc=-1
 2010-04-22 10:03:52+0200 [-] releaseLocks(<Build roottest-
 GCC41-SLC5-32-incr>): [(<SlaveLock(GCC41-SLC5-32-bot-
 lxb095-slc5-32-gcc41-incr, 1)[bot-lxb095-slc5-32-gcc41] 386
 06608>, <buildbot.locks.LockAccess instance at 0x1e66c68>)]
 2010-04-22 10:03:52+0200 [-] maybeStartBuild <Builder 'roottest-
 GCC41-SLC5-32-incr' at 31937168>: 0 request(s), 1 slave(s)
 (and many more)
 2010-04-22 10:03:52+0200 [-] closing log <buildbot.status.builder.LogFile
 instance at 0x2185b00>
 2010-04-22 10:03:52+0200 [-] releaseLocks(<buildbot.steps.shell.Configure
 instance at 0x296ff38>): []
 2010-04-22 10:03:52+0200 [-]  step 'configure' complete: failure
 2010-04-22 10:03:52+0200 [-]  <Build ROOT-GCC41-SLC5-32-hourly>: build
 finished
 2010-04-22 10:03:52+0200 [-] releaseLocks(<Build ROOT-
 GCC41-SLC5-32-hourly>): [(<SlaveLock(GCC41-SLC5-32-bot-
 lxb095-slc5-32-gcc41-hourly, 1)[bot-lxb095-slc5-32-gcc41] 46128592>,
 <buildbot.locks.LockAccess instance at 0x1e66d88>)]
 2010-04-22 10:03:52+0200 [-] maybeStartBuild <Builder 'roottest-
 GCC41-SLC5-32-incr' at 31937168>: 0 request(s), 1 slave(s)
 (and many more)
 }}}


 The builder is running in a KVM virtual machine; the VM's host has two VMs
 running. They are both used for buildbot, they use NAT, each has 2 virtual
 cores (the host has 4).

 These fake timeouts hit us intermittently, mostly on OpenSolaris but here
 on a mutation of RHEL 5.

 The uneducated questions I have are: where does the elapsedTime of
 2109.789688 come from? And why are there two processes running for one
 configure invocation? Help with debugging this or finding my configuration
 bug are very welcome...

-- 
Ticket URL: <http://buildbot.net/trac/ticket/802>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation


More information about the Commits mailing list