[Buildbot-devel] retries being ignored when mode=copy for Git

Scott Garman sgarman at zenlinux.com
Wed Jun 9 00:17:58 UTC 2010


Hello,

I regularly run into issues with the git servers I clone our sources 
from. This results in the "fatal: The remote end hung up unexpectedly" 
error that I'm sure you've all seen before.

I'm running Buildbot 0.8.0 and was pleased to discover the retry option, 
which is intended to retry SCM checkouts when things like the above 
happen. I'm using it as follows in my master.cfg:

factory.addStep(Git(repourl="my_git_repo_url", mode="copy", 
timeout=10000, retry=(5, 3)))

...in order to retry the git clone up to three times with a 5-second 
interval between retries.
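
To spell out the behavior I expect from retry=(5, 3), here is a minimal 
plain-Python sketch. It is not Buildbot's code: run_with_retry, checkout 
and the sleep parameter are hypothetical names of my own, and I'm assuming 
the tuple means (delay in seconds, number of retries).

```python
import time


def run_with_retry(checkout, retry, sleep=time.sleep):
    """Call checkout() until it returns 0, honoring retry=(delay, repeats).

    checkout is a hypothetical callable returning a process exit code.
    Returns (last_exit_code, number_of_attempts).
    """
    delay, repeats = retry
    attempts = 0
    while True:
        attempts += 1
        rc = checkout()
        # Stop on success, or once the retry budget is used up.
        if rc == 0 or repeats == 0:
            return rc, attempts
        repeats -= 1
        sleep(delay)  # wait before the next attempt
```

With retry=(5, 3), a checkout that keeps failing would be attempted four 
times in total (the initial attempt plus three retries), with a 5-second 
pause before each retry.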

However, in my build configurations that use mode="copy" (as above), the 
retries are not attempted when a "remote end hung up unexpectedly" error 
occurs.

I've been looking through buildbot/slave/commands/vcs.py to understand 
what's going on, and I admit I'm a bit uncertain how the control flow 
works because of the inheritance between the SourceBase and Git classes, 
so bear with me.

It appears that the following is happening:

Git's _doFetch() is invoked, which sets up a git fetch command that 
gets passed along to Git's _dovccmd(). When the command fails, I get the 
following in the buildslave's log:

2010-06-08 16:48:05-0700 [-] command finished with signal None, exit code 128, elapsedTime: 0.189545
2010-06-08 16:48:05-0700 [-] _checkAbandoned [Failure instance: Traceback: <class 'buildbot.slave.commands.base.AbandonChain'>: 128
/raid0/pokybuild/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/slave/commands/base.py:170:processEnded
/raid0/pokybuild/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/slave/commands/base.py:700:finished
/usr/local/lib/python2.6/dist-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py:280:callback
/usr/local/lib/python2.6/dist-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py:354:_startRunCallbacks
--- <exception caught here> ---
/usr/local/lib/python2.6/dist-packages/Twisted-10.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py:371:_runCallbacks
/raid0/pokybuild/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/slave/commands/base.py:948:_abandonOnFailure
]

It looks like the _abandonOnFailure callback gets added within Git's 
_dovccmd(). If I comment it out, the retries do happen, but not until 
after _didFetch() has run with a missing revision option (so git tries 
to do a reset --hard FETCH_HEAD). That behavior seems broken, so I'm 
reluctant to call it a workaround.
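
To make my guess concrete, here is a toy model in plain Python of how an 
early "abandon on failure" step could pre-empt retry logic that only 
inspects exit codes. This is only an assumption about the mechanism, not 
the actual code in vcs.py: fetch_with_retry, abandon_on_failure and the 
abandon_early flag are hypothetical, and the AbandonChain class below is 
just a stand-in for the one named in the traceback.

```python
class AbandonChain(Exception):
    """Stand-in for buildbot.slave.commands.base.AbandonChain."""


def abandon_on_failure(rc):
    # Hypothetical analogue of _abandonOnFailure: a non-zero exit code
    # is turned into an exception instead of being passed along.
    if rc != 0:
        raise AbandonChain(rc)
    return rc


def fetch_with_retry(fetch, repeats, abandon_early):
    """Toy retry loop that decides based on the exit code alone.

    Returns (last_exit_code, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        rc = fetch()
        if abandon_early:
            # Raises on failure, so the retry check below never runs.
            rc = abandon_on_failure(rc)
        if rc == 0 or repeats == 0:
            return rc, attempts
        repeats -= 1
```

In this model, with abandon_early=True a failing fetch raises 
AbandonChain on the first attempt and no retry happens; with 
abandon_early=False the exit code reaches the retry check as expected.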

I'm hoping one of the more experienced developers can get to the bottom 
of this issue more quickly given the above information.

Thanks for your time,

Scott

-- 
Scott Garman
sgarman at zenlinux dot com



