[Buildbot] #3247: Timeout on slave can happen too fast

Buildbot trac trac at buildbot.net
Mon Apr 13 07:29:26 UTC 2015


#3247: Timeout on slave can happen too fast
----------------------+-----------------------
Reporter:  kayhayen   |      Owner:
    Type:  undecided  |     Status:  new
Priority:  major      |  Milestone:  undecided
 Version:  0.8.10     |   Keywords:
----------------------+-----------------------
 Hello,

 lately I had a couple of builds fail due to unjustified timeouts. The
 steps that have it, are created to have a timeout of 3600, and yet time
 out, much faster, even for a system not yet running 3600 seconds. This
 happens on Windows only.

 The are created like this:

     factory.addStep(
         ShellCommand(
             env     = {
                 "PYTHON" : python
             },
             command = command,
             timeout = 3600
         )
     )

 The step then says for the job:

 Timing:

 Start   Mon Apr 13 08:49:09 2015
 End     Mon Apr 13 09:07:16 2015
 Elapsed 18 mins, 7 secs

 The step in question fails with this:

 x86 'C:\Python27_32\python.exe tests/standalone/run_all.py ...' failed ( 1
 mins, 38 secs )

 Notice, it really fails fast. Overall timing is like this:

 git update ( 19 secs )
 stdio
 x86 'C:\Python27_32\python.exe tests/standalone/run_all.py ...' failed ( 1
 mins, 38 secs )
 stdio
 amd64 'C:\Python27_64\python.exe tests/standalone/run_all.py ...' ( 8
 mins, 45 secs )
 stdio
 amd64_debug 'C:\Python27_64\python.exe tests/standalone/run_all.py ...' (
 7 mins, 23 secs )
 stdio

 And in the log file, it says:

 Using concrete python 2.7.8 on x86
 Consider output of recursively compiled program: CtypesUsing.py
 Comparing output of 'CtypesUsing.py' using 'C:\Python27_32\python.exe'
 with flags silent, expect_success, standalone, remove_output, no_site ...

 command timed out: 3600 seconds without output running
 ['C:\\Python27_32\\python.exe', 'tests/standalone/run_all.py', 'search'],
 attempting to kill
 program finished with exit code 1
 elapsedTime=7297.885000


 The elapsed time must be wrong (seems it thinks 2 hours passed), the
 timeout was 1h, and the actual time about 1 minute 38 seconds. I find, it
 might be due to a time jump of the system clock, that however does not
 align with the start date. And the fact that the summer time change was a
 bit ago, and 1h. I live in CEST.

 In another instance, a step failed after 57 minutes, whereas again, the
 elapsed time is way higher, pretty exactly 2 hours more I suppose.

 command timed out: 3600 seconds without output running
 ['C:\\Python27_64\\python.exe', 'tests/CPython27/run_all.py', 'search'],
 attempting to kill
 program finished with exit code 1
 elapsedTime=10646.917000

 Again, the time of the build looks innocous:

 Start   Sat Apr 11 23:56:16 2015
 End     Sun Apr 12 03:01:18 2015
 Elapsed 3 hrs, 5 mins, 2 secs

 That is about right, these steps take that long:

 git update ( 9 secs )
 stdio
 x86 'C:\Python27_32\python.exe tests/CPython27/run_all.py ...' ( 1 hrs, 5
 mins, 38 secs )
 stdio
 amd64 'C:\Python27_64\python.exe tests/CPython27/run_all.py ...' ( 1 hrs,
 1 mins, 46 secs )
 stdio
 amd64_debug 'C:\Python27_64\python.exe tests/CPython27/run_all.py ...'
 failed ( 57 mins, 27 secs )
 stdio

 This is very strange. If there was any time jump, the time tacking of the
 builds and steps is not affected (maybe it happens on the master), but the
 step timeout and elapsed time report, it is done on the slave clock.

 Is there any way for a Windows slave to not become online and execute jobs
 before it is at least nearly time synchronized to the master?

--
Ticket URL: <http://trac.buildbot.net/ticket/3247>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation


More information about the bugs mailing list