[Buildbot] #3670: cygwin slave hangs burning CPU after running git, doesn't notice git has finished
Buildbot trac
trac at buildbot.net
Thu Mar 9 22:35:31 UTC 2017
#3670: cygwin slave hangs burning CPU after running git, doesn't notice git has
finished
-------------------+-----------------------
Reporter: dank | Owner:
Type: defect | Status: new
Priority: major | Milestone: undecided
Version: 0.8.12 | Keywords:
-------------------+-----------------------
This may be the problem I mentioned way back in
https://lists.buildbot.net/pipermail/devel/2012-December/009258.html
which appeared to happen both on linux and windows.
It involves cygwin, which immediately makes it suspect, but I'm bringing
it up
here anyway since that's what the thread ended with back then.
It happened both on win7 with long-ago 32 bit cygwin, and win10 with
current
64 bit cygwin.
I've been half-heartedly using a buildbot slave on cygwin for the last
four years (!). It occasionally hangs during the git step, but I've
always ignored it (and just rebooted). I'm doing a little more with
windows now, so am motivated to look into the problem.
This is vanilla buildbot-slave-0.8.12 on up to date cygwin64 with
Twisted-13.2.0 from tarball, cygwin python 2.7.12, and cygwin git.
In this case, the worker is started from the desktop. This was a try
build, and hung in the git update patch step.
cygwin ps during the hang shows
{{{
PID PPID PGID WINPID TTY UID STIME COMMAND
10004 1684 10004 13456 pty0 197609 Mar 6
/usr/bin/bash
756 10004 756 4496 pty0 197609 Mar 6
/usr/bin/sh
9276 756 756 8232 pty0 197609 Mar 6
/home/buildbot/slave-state/cygwin/bin/python
8144 9276 8144 15424 pty0 197609 13:14:19
/usr/bin/git <defunct>
}}}
task manager shows python is burning 95% of CPU.
The worker log ends with
{{{
2017-03-09 13:14:19-0800 [Broker,client] RunProcess._startCommand
2017-03-09 13:14:19-0800 [Broker,client] git apply --index -p 1
2017-03-09 13:14:19-0800 [Broker,client] in dir /home/buildbot/slave-
state/cygwin/g-speak-win10-bb-01/yovo-win10-iz-trybuilder/build (timeout
1200 secs)
2017-03-09 13:14:19-0800 [Broker,client] watching logfiles {}
2017-03-09 13:14:19-0800 [Broker,client] argv: ['git', 'apply', '--
index', '-p', '1']
2017-03-09 13:14:19-0800 [Broker,client] environment: {'TMP': '/tmp',
'PSModulePath':
'C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules\\', 'HOME':
'/home/buildbot', 'VS140COMNTOOLS': 'C:\\Program Files (x86)\\Microsoft
Visual Studio 14.0\\Common7\\Tools\\', 'VS110COMNTOOLS': 'C:\\Program
Files (x86)\\Microsoft Visual Studio 11.0\\Common7\\Tools\\',
'USERDOMAIN': 'WIN10-BB-01', 'OneDrive': 'C:\\Users\\buildbot\\OneDrive',
'COMPUTERNAME': 'WIN10-BB-01', 'ORIGINAL_PATH':
'/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Users/buildbot/.dnx/bin:/cygdrive/c/Program
Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft
SQL Server/130/Tools/Binn:/cygdrive/c/Program
Files/Git/cmd:/cygdrive/c/Program Files/CMake/bin:/cygdrive/c/python36',
'PROCESSOR_IDENTIFIER': 'Intel64 Family 6 Model 63 Stepping 2,
GenuineIntel', 'LOGNAME': 'buildbot', 'USER': 'buildbot',
'PROCESSOR_REVISION': '3f02', 'EXECIGNORE': '*.dll', 'PATH':
'/home/buildbot/slave-
state/cygwin/bin:/bin:/usr/local/bin:/usr/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Users/buildbot/.dnx/bin:/cygdrive/c/Program
Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft
SQL Server/130/Tools/Binn:/cygdrive/c/Program
Files/Git/cmd:/cygdrive/c/Program Files/CMake/bin:/cygdrive/c/python36',
'PS1': '(cygwin) ', 'LANG': 'en_US.UTF-8', 'INFOPATH':
'/usr/local/info:/usr/share/info:/usr/info', 'TERM': 'xterm', 'SHELL':
'/bin/bash', 'TZ': 'America/Los_Angeles', 'TEMP': '/tmp', 'ProgramData':
'C:\\ProgramData', 'SHLVL': '2', 'CommonProgramFiles(x86)': 'C:\\Program
Files (x86)\\Common Files', 'PROCESSOR_ARCHITECTURE': 'AMD64',
'ALLUSERSPROFILE': 'C:\\ProgramData', '!::': '::\\', 'LOCALAPPDATA':
'C:\\Users\\buildbot\\AppData\\Local', 'HOMEPATH': '\\Users\\buildbot',
'USERDOMAIN_ROAMINGPROFILE': 'WIN10-BB-01', 'VS120COMNTOOLS': 'C:\\Program
Files (x86)\\Microsoft Visual Studio 12.0\\Common7\\Tools\\',
'PROFILEREAD': 'true', 'ProgramFiles(x86)': 'C:\\Program Files (x86)',
'ProgramW6432': 'C:\\Program Files', 'USERNAME': 'buildbot',
'LOGONSERVER': '\\\\WIN10-BB-01', 'SYSTEMROOT': 'C:\\Windows', 'COMSPEC':
'C:\\Windows\\system32\\cmd.exe', 'WINDIR': 'C:\\Windows', 'VIRTUAL_ENV':
'/home/buildbot/slave-state/cygwin', 'PRINTER': 'Microsoft Print to PDF',
'SESSIONNAME': 'RDP-Tcp#11', 'PATHEXT':
'.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.PY',
'CommonProgramW6432': 'C:\\Program Files\\Common Files', 'CLIENTNAME':
'dank-dell3620', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files',
'APPDATA': 'C:\\Users\\buildbot\\AppData\\Roaming', 'HOMEDRIVE': 'C:',
'OB_DUMPER_HOST': 'git.oblong.com', 'SYSTEMDRIVE': 'C:', 'HOSTNAME':
'win10-bb-01', 'PUBLIC': 'C:\\Users\\Public', 'NUMBER_OF_PROCESSORS': '1',
'PWD': '/home/buildbot/slave-state/cygwin/g-speak-win10-bb-01/yovo-win10
-iz-trybuilder/build', 'PROCESSOR_LEVEL': '6', 'CCACHE_DIR':
'/home/buildbot/slave-state/cygwin/g-speak-win10-bb-01/ccache.dir',
'USERPROFILE': 'C:\\Users\\buildbot', 'OS': 'Windows_NT', 'OLDPWD':
'/home/buildbot/src/ob-repobot', 'PROGRAMFILES': 'C:\\Program Files'}
2017-03-09 13:14:19-0800 [Broker,client] writing 122297 bytes to stdin
2017-03-09 13:14:19-0800 [Broker,client] using PTY: False
}}}
Attaching to the cygwin python process with cygwin strace shows an
infinite quick repeat of
{{{
--- Process 8232, exception c0000005 at 0000000180042C51
55 2120509 [flasio] python 9276 exception::handle: In
cygwin_except_handler exception 0xC0000005 at 0x180042C51 sp 0x275CB60
32 2120541 [flasio] python 9276 exception::handle: In
cygwin_except_handler signal 11 at 0x180042C51
72 2120613 [flasio] python 9276 _cygtls::inside_kernel: pc 0x180042C51,
h 0x180040000, inside_kernel 0
40 2120653 [flasio] python 9276 seterrno_from_nt_status:
/home/corinna/src/cygwin/cygwin-2.6.0/cygwin-2.6.0-1.x86_64/src/newlib-
cygwin/winsup/cygwin/fhandler_mailslot.cc:132 status 0xC0000034 -> windows
error 2
30 2120683 [flasio] python 9276 geterrno_from_win_error: windows error
2 == errno 2
30 2120713 [flasio] python 9276 sig_send: sendsig 0xC4, pid 9276,
signal 11, its_me 1
28 2120741 [flasio] python 9276 sig_send: wakeup 0x42C
32 2120773 [sig] python 9276 sigpacket::process: signal 11 processing
19 2120792 [sig] python 9276 sigpacket::process: signal 11 blocked
18 2120810 [sig] python 9276 sigpacket::process: returning -1
22 2120832 [sig] python 9276 wait_sig: signalling pack.wakeup 0x42C
24 2120856 [flasio] python 9276 sig_send: Waiting for pack.wakeup 0x42C
30 2120886 [flasio] python 9276 sig_send: returning 0x0 from sending
signal 11
}}}
If I wait long enough, the 1200 second timeout kicks in, the slave
terminates,
and the waterfall shows the job failed with an interrupt
{{{
[Failure instance: Traceback (failure with no frames): <class
'twisted.internet.error.ConnectionLost'>: Connection to the other side was
lost in a non-clean fashion.
]
}}}
Starting the slave again retries the build
as expected, and usually succeeds.
Cygwin used to have a problem with child reaping,
https://cygwin.com/ml/cygwin/2006-02/msg00849.html
but I haven't tried sending a CHLD signal yet to python to see if it
then notices that its git child has terminated.
Next time it happens I'll try to get more info.
--
Ticket URL: <http://trac.buildbot.net/ticket/3670>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation
More information about the bugs
mailing list