[Buildbot] #3247: Timeout on slave can happen too fast
Buildbot trac
trac at buildbot.net
Mon Apr 13 07:29:26 UTC 2015
#3247: Timeout on slave can happen too fast
----------------------+-----------------------
Reporter: kayhayen | Owner:
Type: undecided | Status: new
Priority: major | Milestone: undecided
Version: 0.8.10 | Keywords:
----------------------+-----------------------
Hello,
lately I had a couple of builds fail due to unjustified timeouts. The
steps that have it, are created to have a timeout of 3600, and yet time
out, much faster, even for a system not yet running 3600 seconds. This
happens on Windows only.
The are created like this:
factory.addStep(
ShellCommand(
env = {
"PYTHON" : python
},
command = command,
timeout = 3600
)
)
The step then says for the job:
Timing:
Start Mon Apr 13 08:49:09 2015
End Mon Apr 13 09:07:16 2015
Elapsed 18 mins, 7 secs
The step in question fails with this:
x86 'C:\Python27_32\python.exe tests/standalone/run_all.py ...' failed ( 1
mins, 38 secs )
Notice, it really fails fast. Overall timing is like this:
git update ( 19 secs )
stdio
x86 'C:\Python27_32\python.exe tests/standalone/run_all.py ...' failed ( 1
mins, 38 secs )
stdio
amd64 'C:\Python27_64\python.exe tests/standalone/run_all.py ...' ( 8
mins, 45 secs )
stdio
amd64_debug 'C:\Python27_64\python.exe tests/standalone/run_all.py ...' (
7 mins, 23 secs )
stdio
And in the log file, it says:
Using concrete python 2.7.8 on x86
Consider output of recursively compiled program: CtypesUsing.py
Comparing output of 'CtypesUsing.py' using 'C:\Python27_32\python.exe'
with flags silent, expect_success, standalone, remove_output, no_site ...
command timed out: 3600 seconds without output running
['C:\\Python27_32\\python.exe', 'tests/standalone/run_all.py', 'search'],
attempting to kill
program finished with exit code 1
elapsedTime=7297.885000
The elapsed time must be wrong (seems it thinks 2 hours passed), the
timeout was 1h, and the actual time about 1 minute 38 seconds. I find, it
might be due to a time jump of the system clock, that however does not
align with the start date. And the fact that the summer time change was a
bit ago, and 1h. I live in CEST.
In another instance, a step failed after 57 minutes, whereas again, the
elapsed time is way higher, pretty exactly 2 hours more I suppose.
command timed out: 3600 seconds without output running
['C:\\Python27_64\\python.exe', 'tests/CPython27/run_all.py', 'search'],
attempting to kill
program finished with exit code 1
elapsedTime=10646.917000
Again, the time of the build looks innocous:
Start Sat Apr 11 23:56:16 2015
End Sun Apr 12 03:01:18 2015
Elapsed 3 hrs, 5 mins, 2 secs
That is about right, these steps take that long:
git update ( 9 secs )
stdio
x86 'C:\Python27_32\python.exe tests/CPython27/run_all.py ...' ( 1 hrs, 5
mins, 38 secs )
stdio
amd64 'C:\Python27_64\python.exe tests/CPython27/run_all.py ...' ( 1 hrs,
1 mins, 46 secs )
stdio
amd64_debug 'C:\Python27_64\python.exe tests/CPython27/run_all.py ...'
failed ( 57 mins, 27 secs )
stdio
This is very strange. If there was any time jump, the time tacking of the
builds and steps is not affected (maybe it happens on the master), but the
step timeout and elapsed time report, it is done on the slave clock.
Is there any way for a Windows slave to not become online and execute jobs
before it is at least nearly time synchronized to the master?
--
Ticket URL: <http://trac.buildbot.net/ticket/3247>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation
More information about the bugs
mailing list