[Buildbot-devel] Discussion on source steps

Dustin J. Mitchell dustin at v.igoro.us
Sat Jun 11 19:46:10 UTC 2011


On Mon, Jun 6, 2011 at 1:38 PM, Dmitry Nezhevenko <dion at dion.org.ua> wrote:
> This one looks very interesting to me. But maybe something even better:
> a script-based VCS command that calls custom helper scripts for every
> particular action:
> - Cleanup
> - Update to specific branch and revision.

This would be super-cool, and give people a great way to implement
their own insane checkout procedures without having to write custom
code on the buildmaster.  I'm envisioning a factory that runs a
FileDownload step to put the latest and greatest version of the script
on the slave, and then runs the ScriptSource step to interface with
it.
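
To make the idea concrete, here is a rough sketch of how such a step
might call the helper script on the slave. ScriptSource does not exist
yet, so everything here is an assumption: the action names come from
the list above, and the --branch/--revision flags and function names
are made up for illustration.

```python
import subprocess
import sys

# Hypothetical action names, taken from the two actions listed in the thread.
ACTIONS = ("cleanup", "update")


def build_command(script, action, branch=None, revision=None):
    """Return the argv list used to invoke the helper script for one action."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %r" % (action,))
    argv = [sys.executable, script, action]
    # Only the update action takes a branch and revision (assumed flags).
    if action == "update":
        if branch is not None:
            argv += ["--branch", branch]
        if revision is not None:
            argv += ["--revision", revision]
    return argv


def run_action(script, action, **kwargs):
    """Run one action and return the helper script's exit code."""
    return subprocess.call(build_command(script, action, **kwargs))
```

The step would then only need to know the script's path and the action
name; everything VCS-specific stays in the script that FileDownload
drops on the slave.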

> Another VCS-related issue I'm experiencing is some kind of timeout
> handling. Usually VCS operations depend on the network or other factors
> that may fail. In case of network-related issues (like a temporary 30
> seconds of downtime) it sounds reasonable to not fail the source step and
> just wait a bit and retry. At the same time, not every VCS error should
> cause a retry. Errors like "unable to authenticate" or "repository not
> found" have no chance to recover.
>
> Currently buildbot doesn't provide any way to handle this in some
> "standard" way and it's up to the slave command to handle everything. I'm
> doing everything on the slave by reading the CLI client's exit code and
> sometimes parsing its stderr.
>
> So since you are trying to refactor the VCS stuff, maybe it's a good idea
> to think about this.

I agree - one of the most common problems with builders at Mozilla is
failure to download.  It doesn't help that we're cloning a 250MB
repository, and since we're using hg that's bringing the entire
history with it.  Let's all honor the sacrifice of hg.mozilla.org (and
build some read-only mirrors for the buildfarm..).

Anyway, most (all?) VCS steps support a retry parameter right now, which
specifies how many times to retry, and how long to wait between those
retries.  I'm using this on the metabuildbot because some of the
donated slaves have a hard time keeping a connection to github
running, but after 5 retries they almost always manage to get it
working.
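
For the "not every error should cause a retry" part, something along
these lines might work. This is only a sketch, not what the existing
steps do: the stderr markers are the two examples from Dmitry's mail,
and is_transient, run_with_retry and the operation callable are
hypothetical names. The delay and repeats arguments are assumed to mean
the same as in the retry parameter described above.

```python
import time

# Hypothetical stderr fragments that mark a failure as permanent.
PERMANENT_MARKERS = ("unable to authenticate", "repository not found")


def is_transient(exit_code, stderr):
    """Guess whether a failed VCS command is worth retrying."""
    if exit_code == 0:
        return False
    text = stderr.lower()
    return not any(marker in text for marker in PERMANENT_MARKERS)


def run_with_retry(operation, delay, repeats, sleep=time.sleep):
    """Call operation() until it succeeds, fails permanently, or retries run out.

    operation() must return an (exit_code, stderr) tuple. The result of the
    last attempt is returned.
    """
    retries_used = 0
    while True:
        exit_code, stderr = operation()
        if exit_code == 0 or not is_transient(exit_code, stderr):
            return exit_code, stderr
        if retries_used >= repeats:
            return exit_code, stderr
        retries_used += 1
        sleep(delay)
```

Passing sleep in as an argument is just there so the loop can be
exercised without actually waiting.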

Dustin



