[Buildbot-devel] retries being ignored when mode=copy for Git
Scott Garman
sgarman at zenlinux.com
Wed Jun 9 00:17:58 UTC 2010
Hello,
I regularly run into issues with the git servers I clone our sources
from. This results in the "fatal: The remote end hung up unexpectedly"
error that I'm sure you've all seen before.
I'm running Buildbot 0.8.0 and was pleased to discover the retry option,
which is intended to retry SCM checkouts when things like the above
happen. I'm using it as follows in my master.cfg:
factory.addStep(Git(repourl="my_git_repo_url", mode="copy",
                    timeout=10000, retry=(5, 3)))
...in order to retry the git clone up to three times with a 5-second
interval between retries.
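For reference, the intended meaning of retry=(5, 3) can be sketched in
plain Python. This is only an illustration of the (delay, repeats)
semantics described above, not Buildbot's code: run_with_retry,
fake_checkout, and the sleep parameter are hypothetical names, and the
exit code 128 is borrowed from the log further down.

```python
import time


def run_with_retry(operation, retry, sleep=time.sleep):
    """Call operation() until it returns 0 or the retries run out.

    retry is a (delay_seconds, repeats) tuple, as in retry=(5, 3).
    Returns the exit code of the last attempt.
    """
    delay, repeats = retry
    rc = operation()
    while rc != 0 and repeats > 0:
        sleep(delay)       # wait before the next attempt
        repeats -= 1
        rc = operation()
    return rc


# Example: a fake checkout that fails twice with exit code 128, then works.
attempts = []


def fake_checkout():
    attempts.append(len(attempts) + 1)
    return 0 if len(attempts) == 3 else 128


# sleep is stubbed out so the example finishes immediately.
result = run_with_retry(fake_checkout, (5, 3), sleep=lambda seconds: None)
```

With these semantics, a checkout that keeps failing is attempted four
times in total: the initial attempt plus three retries.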
However, in my build configurations that use mode="copy" (as above), the
retries are not attempted when a "remote end hung up unexpectedly" error
occurs.
I've been looking through buildbot/slave/commands/vcs.py to understand
what's going on. I admit I'm a bit uncertain how the control flow works,
given the inheritance between the SourceBase and Git classes, so bear
with me.
It appears that the following is happening:
Git's _doFetch() is invoked and sets up a git fetch command, which gets
passed along to Git's _dovccmd(). When the command fails, I get the
following in the buildslave's log:
2010-06-08 16:48:05-0700 [-] command finished with signal None, exit code 128, elapsedTime: 0.189545
2010-06-08 16:48:05-0700 [-] _checkAbandoned [Failure instance: Traceback: <class 'buildbot.slave.commands.base.AbandonChain'>: 128
/raid0/pokybuild/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/slave/commands/base.py:170:processEnded
/raid0/pokybuild/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/slave/commands/base.py:700:finished
/usr/local/lib/python2.6/dist-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py:280:callback
/usr/local/lib/python2.6/dist-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py:354:_startRunCallbacks
--- <exception caught here> ---
/usr/local/lib/python2.6/dist-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py:371:_runCallbacks
/raid0/pokybuild/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/slave/commands/base.py:948:_abandonOnFailure
]
It looks like the _abandonOnFailure callback gets added within Git's
_dovccmd(). If I comment it out, the retry behavior works, but only
after _didFetch() has run with a missing revision option (so git tries
to do a reset --hard FETCH_HEAD). That behavior seems broken, so I'm
reluctant to call it a workaround.
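One possible reading of this symptom, sketched as a simplified model in
plain Python: if a non-zero exit code is turned into an exception
before the retry check runs, the retry check never gets to see it. This
is not Buildbot's or Twisted's code and not a confirmed diagnosis. The
AbandonChain class here is only a stand-in for the one named in the
traceback, and run_chain, maybe_retry, and retry_was_considered are
hypothetical names. Real Deferreds also route failures to errbacks,
which this model leaves out.

```python
class AbandonChain(Exception):
    """Stand-in for the exception named in the traceback above."""


def abandon_on_failure(rc):
    # Convert a non-zero exit code into an exception.
    if rc != 0:
        raise AbandonChain(rc)
    return rc


def run_chain(rc, callbacks):
    # Call each callback with the previous result. An exception stops the
    # loop, so every later callback is skipped.
    for callback in callbacks:
        rc = callback(rc)
    return rc


def retry_was_considered(callbacks_before_retry):
    """Return True if the retry check got to see the failing exit code."""
    considered = []

    def maybe_retry(rc):
        # Hypothetical retry check that only inspects plain exit codes.
        if rc != 0:
            considered.append(rc)
        return rc

    try:
        run_chain(128, callbacks_before_retry + [maybe_retry])
    except AbandonChain:
        pass  # the chain was abandoned before or without a retry
    return considered == [128]


# Nothing in front of the retry check: it sees exit code 128.
without_abandon = retry_was_considered([])

# abandon_on_failure in front of it: the retry check is skipped.
with_abandon = retry_was_considered([abandon_on_failure])
```

In this model, removing abandon_on_failure from the front of the chain
is what lets the retry check run, which matches what I see when I
comment the callback out.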
I'm hoping one of the more experienced developers can get to the bottom
of this issue more quickly given the above information.
Thanks for your time,
Scott
--
Scott Garman
sgarman at zenlinux dot com