[Buildbot] #3670: cygwin slave hangs burning CPU after running git, doesn't notice git has finished

Buildbot trac trac at buildbot.net
Thu Mar 9 22:35:31 UTC 2017


#3670: cygwin slave hangs burning CPU after running git, doesn't notice git has
finished
-------------------+-----------------------
Reporter:  dank    |      Owner:
    Type:  defect  |     Status:  new
Priority:  major   |  Milestone:  undecided
 Version:  0.8.12  |   Keywords:
-------------------+-----------------------
 This may be the problem I mentioned way back in
 https://lists.buildbot.net/pipermail/devel/2012-December/009258.html
 which appeared to happen both on linux and windows.
 It involves cygwin, which immediately makes it suspect, but I'm bringing
 it up
 here anyway since that's what the thread ended with back then.
 It happened both on win7 with long-ago 32 bit cygwin, and win10 with
 current
 64 bit cygwin.

 I've been half-heartedly using a buildbot slave on cygwin for the last
 four years (!).  It occasionally hangs during the git step, but I've
 always ignored it (and just rebooted).  I'm doing a little more with
 windows now, so am motivated to look into the problem.

 This is vanilla buildbot-slave-0.8.12 on up to date cygwin64 with
 Twisted-13.2.0 from tarball, cygwin python 2.7.12, and cygwin git.
 In this case, the worker is started from the desktop.  This was a try
 build, and hung in the git update patch step.

 cygwin ps during the hang shows
 {{{
       PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
     10004    1684   10004      13456  pty0      197609   Mar  6
 /usr/bin/bash
       756   10004     756       4496  pty0      197609   Mar  6
 /usr/bin/sh
      9276     756     756       8232  pty0      197609   Mar  6
 /home/buildbot/slave-state/cygwin/bin/python
      8144    9276    8144      15424  pty0      197609 13:14:19
 /usr/bin/git <defunct>
 }}}

 task manager shows python is burning 95% of CPU.

 The worker log ends with
 {{{
 2017-03-09 13:14:19-0800 [Broker,client] RunProcess._startCommand
 2017-03-09 13:14:19-0800 [Broker,client]  git apply --index -p 1
 2017-03-09 13:14:19-0800 [Broker,client]   in dir /home/buildbot/slave-
 state/cygwin/g-speak-win10-bb-01/yovo-win10-iz-trybuilder/build (timeout
 1200 secs)
 2017-03-09 13:14:19-0800 [Broker,client]   watching logfiles {}
 2017-03-09 13:14:19-0800 [Broker,client]   argv: ['git', 'apply', '--
 index', '-p', '1']
 2017-03-09 13:14:19-0800 [Broker,client]  environment: {'TMP': '/tmp',
 'PSModulePath':
 'C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules\\', 'HOME':
 '/home/buildbot', 'VS140COMNTOOLS': 'C:\\Program Files (x86)\\Microsoft
 Visual Studio 14.0\\Common7\\Tools\\', 'VS110COMNTOOLS': 'C:\\Program
 Files (x86)\\Microsoft Visual Studio 11.0\\Common7\\Tools\\',
 'USERDOMAIN': 'WIN10-BB-01', 'OneDrive': 'C:\\Users\\buildbot\\OneDrive',
 'COMPUTERNAME': 'WIN10-BB-01', 'ORIGINAL_PATH':
 '/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Users/buildbot/.dnx/bin:/cygdrive/c/Program
 Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
 Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft
 SQL Server/130/Tools/Binn:/cygdrive/c/Program
 Files/Git/cmd:/cygdrive/c/Program Files/CMake/bin:/cygdrive/c/python36',
 'PROCESSOR_IDENTIFIER': 'Intel64 Family 6 Model 63 Stepping 2,
 GenuineIntel', 'LOGNAME': 'buildbot', 'USER': 'buildbot',
 'PROCESSOR_REVISION': '3f02', 'EXECIGNORE': '*.dll', 'PATH':
 '/home/buildbot/slave-
 state/cygwin/bin:/bin:/usr/local/bin:/usr/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Users/buildbot/.dnx/bin:/cygdrive/c/Program
 Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
 Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft
 SQL Server/130/Tools/Binn:/cygdrive/c/Program
 Files/Git/cmd:/cygdrive/c/Program Files/CMake/bin:/cygdrive/c/python36',
 'PS1': '(cygwin) ', 'LANG': 'en_US.UTF-8', 'INFOPATH':
 '/usr/local/info:/usr/share/info:/usr/info', 'TERM': 'xterm', 'SHELL':
 '/bin/bash', 'TZ': 'America/Los_Angeles', 'TEMP': '/tmp', 'ProgramData':
 'C:\\ProgramData', 'SHLVL': '2', 'CommonProgramFiles(x86)': 'C:\\Program
 Files (x86)\\Common Files', 'PROCESSOR_ARCHITECTURE': 'AMD64',
 'ALLUSERSPROFILE': 'C:\\ProgramData', '!::': '::\\', 'LOCALAPPDATA':
 'C:\\Users\\buildbot\\AppData\\Local', 'HOMEPATH': '\\Users\\buildbot',
 'USERDOMAIN_ROAMINGPROFILE': 'WIN10-BB-01', 'VS120COMNTOOLS': 'C:\\Program
 Files (x86)\\Microsoft Visual Studio 12.0\\Common7\\Tools\\',
 'PROFILEREAD': 'true', 'ProgramFiles(x86)': 'C:\\Program Files (x86)',
 'ProgramW6432': 'C:\\Program Files', 'USERNAME': 'buildbot',
 'LOGONSERVER': '\\\\WIN10-BB-01', 'SYSTEMROOT': 'C:\\Windows', 'COMSPEC':
 'C:\\Windows\\system32\\cmd.exe', 'WINDIR': 'C:\\Windows', 'VIRTUAL_ENV':
 '/home/buildbot/slave-state/cygwin', 'PRINTER': 'Microsoft Print to PDF',
 'SESSIONNAME': 'RDP-Tcp#11', 'PATHEXT':
 '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.PY',
 'CommonProgramW6432': 'C:\\Program Files\\Common Files', 'CLIENTNAME':
 'dank-dell3620', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files',
 'APPDATA': 'C:\\Users\\buildbot\\AppData\\Roaming', 'HOMEDRIVE': 'C:',
 'OB_DUMPER_HOST': 'git.oblong.com', 'SYSTEMDRIVE': 'C:', 'HOSTNAME':
 'win10-bb-01', 'PUBLIC': 'C:\\Users\\Public', 'NUMBER_OF_PROCESSORS': '1',
 'PWD': '/home/buildbot/slave-state/cygwin/g-speak-win10-bb-01/yovo-win10
 -iz-trybuilder/build', 'PROCESSOR_LEVEL': '6', 'CCACHE_DIR':
 '/home/buildbot/slave-state/cygwin/g-speak-win10-bb-01/ccache.dir',
 'USERPROFILE': 'C:\\Users\\buildbot', 'OS': 'Windows_NT', 'OLDPWD':
 '/home/buildbot/src/ob-repobot', 'PROGRAMFILES': 'C:\\Program Files'}
 2017-03-09 13:14:19-0800 [Broker,client]   writing 122297 bytes to stdin
 2017-03-09 13:14:19-0800 [Broker,client]   using PTY: False
 }}}

 Attaching to the cygwin python process with cygwin strace shows an
 infinite quick repeat of

 {{{
 --- Process 8232, exception c0000005 at 0000000180042C51
    55 2120509 [flasio] python 9276 exception::handle: In
 cygwin_except_handler exception 0xC0000005 at 0x180042C51 sp 0x275CB60
    32 2120541 [flasio] python 9276 exception::handle: In
 cygwin_except_handler signal 11 at 0x180042C51
    72 2120613 [flasio] python 9276 _cygtls::inside_kernel: pc 0x180042C51,
 h 0x180040000, inside_kernel 0
    40 2120653 [flasio] python 9276 seterrno_from_nt_status:
 /home/corinna/src/cygwin/cygwin-2.6.0/cygwin-2.6.0-1.x86_64/src/newlib-
 cygwin/winsup/cygwin/fhandler_mailslot.cc:132 status 0xC0000034 -> windows
 error 2
    30 2120683 [flasio] python 9276 geterrno_from_win_error: windows error
 2 == errno 2
    30 2120713 [flasio] python 9276 sig_send: sendsig 0xC4, pid 9276,
 signal 11, its_me 1
    28 2120741 [flasio] python 9276 sig_send: wakeup 0x42C
    32 2120773 [sig] python 9276 sigpacket::process: signal 11 processing
    19 2120792 [sig] python 9276 sigpacket::process: signal 11 blocked
    18 2120810 [sig] python 9276 sigpacket::process: returning -1
    22 2120832 [sig] python 9276 wait_sig: signalling pack.wakeup 0x42C
    24 2120856 [flasio] python 9276 sig_send: Waiting for pack.wakeup 0x42C
    30 2120886 [flasio] python 9276 sig_send: returning 0x0 from sending
 signal 11
 }}}

 If I wait long enough, the 1200 second timeout kicks in, the slave
 terminates,
 and the waterfall shows the job failed with an interrupt
 {{{
 [Failure instance: Traceback (failure with no frames): <class
 'twisted.internet.error.ConnectionLost'>: Connection to the other side was
 lost in a non-clean fashion.
 ]
 }}}
 Starting the slave again retries the build
 as expected, and usually succeeds.

 Cygwin used to have a problem with child reaping,
 https://cygwin.com/ml/cygwin/2006-02/msg00849.html
 but I haven't tried sending a CHLD signal yet to python to see if it
 then notices that its git child has terminated.

 Next time it happens I'll try to get more info.

--
Ticket URL: <http://trac.buildbot.net/ticket/3670>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation


More information about the bugs mailing list