[Buildbot-devel] multi-repo support: repo URLs

Tue Apr 13 09:05:11 UTC 2010

>>> As said, I think the fully qualified repo URL should be part of a change, 
>>> since together with branch/revision, it uniquely identifies the change. 
>> 
>> Agreed, and this bears repeating :) 
>>> This means e.g. that the sourcestamp stored in the database does not contain the 
>>> fully qualified repo URL. 
>> 
>> I see this as a problem, too. 
>> 
>>> If the hook doesn’t send the fully qualified URL, then I think the right place 
>>> to derive the fully qualified URL is not the build step, but the change source. 
>> 
>> We should never be putting substrings into the repository property. 
>> If a Change or SourceStamp has a nonempty repository attribute, it 
>> should be a complete pointer to the code.  Again, you can do whatever 
>> you want in the privacy of your project, but in terms of Buildbot's 
>> design, let's keep it simple. 
>> 
>I don't know what to think about that. We all agree that this 
>information helps us make the SourceStamp unique. But unique in 
>which context?
>My understanding is that it should be enough to describe a Change within 
>your current namespace. In a corporate environment, for instance, 
>the substring is enough in my opinion. However, in a public project, 
>it might make sense to have the full path there, but then, what happens 
>if you move the repository?

I've been thinking about repository identity a little bit.

In particular: what's the identity of a Git repo?

The fully qualified URL? But then, I can make an exact copy, or just a symbolic
link in the file system, and the repo is still "the same".

Are two repos identical if they contain exactly the same set of changes? That
would still allow them to differ in refs, reflogs and other things.

Or: two repos are "identical" _with respect to a change_ if they both contain
everything needed to check out that change.

For Git:

Given a commit C with SHA1 SC, two repos R1 and R2 are "identical" (better:
"equivalent") to each other w.r.t. C, iff I can do a "git checkout SC" in both
R1 and R2 or in neither.

Two repos R1 and R2 are "equivalent", iff they are "equivalent" w.r.t. every
change C.

Thus, repo equivalence would be a relation on the set of repos that is
parameterized by the change.
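To make the definition concrete, here is a minimal sketch in Python. It models
a repo simply as the set of commit SHA1s it can check out. The names Repo,
equivalent_wrt and equivalent, the example URL and the short fake SHA1s are all
made up for illustration; none of this is Buildbot or Git API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Repo:
    """Hypothetical model: a repo is a URL plus the commits it can check out."""
    url: str
    commits: frozenset = field(default_factory=frozenset)

    def can_checkout(self, sha1):
        return sha1 in self.commits


def equivalent_wrt(r1, r2, sha1):
    """R1 ~ R2 w.r.t. a commit iff the checkout works in both or in neither."""
    return r1.can_checkout(sha1) == r2.can_checkout(sha1)


def equivalent(r1, r2, all_commits):
    """R1 ~ R2 iff they are equivalent w.r.t. every commit considered."""
    return all(equivalent_wrt(r1, r2, c) for c in all_commits)


# Made-up example data: the local clone has one commit the origin lacks.
origin = Repo("git://example.org/amanda.git", frozenset({"a1", "b2"}))
local = Repo("/home/dustin/projects/amanda/", frozenset({"a1", "b2", "c3"}))

print(equivalent_wrt(origin, local, "a1"))  # both repos can check out a1
print(equivalent_wrt(origin, local, "c3"))  # only the local clone has c3
```

Note that under this definition two repos that both lack a commit are also
equivalent w.r.t. that commit, which is why "in both or in neither" matters.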

>>> As said, I think the fully qualified repo URL should be part of a change, 
>>> since together with branch/revision, it uniquely identifies the change.

In light of the above, I would reformulate this from a Git point of view to be
more precise:

A sourcestamp should uniquely identify a source tree.

For Git, _either_ of the following is sufficient:

a) a SHA1 of a commit
b) an object ref (branch or lightweight tag) + fully qualified repo URL

Note that what b) points to can, and usually will, change over time. So it is
"unique" only at a given point in time (when you resolve the ref to a commit
SHA1).
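The two cases, and the step of pinning b) down to a), could be sketched like
this. GitSourceStamp and pin are hypothetical names, not Buildbot classes, and
resolve is an assumed callable (for instance something wrapping
"git ls-remote") that returns the SHA1 a ref currently points to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GitSourceStamp:
    """Hypothetical sourcestamp: a commit SHA1, or a ref plus a repo URL."""
    sha1: Optional[str] = None
    ref: Optional[str] = None
    repository: Optional[str] = None

    def identifies_tree(self):
        # a) a commit SHA1 alone is sufficient
        if self.sha1:
            return True
        # b) otherwise both the ref and the fully qualified repo URL are needed
        return bool(self.ref and self.repository)


def pin(stamp, resolve):
    """Turn case b) into case a) by resolving the ref now.

    `resolve(repository, ref)` is an assumed callable returning a SHA1.
    """
    if stamp.sha1:
        return stamp
    if not stamp.identifies_tree():
        raise ValueError("need a SHA1, or a ref plus a repository URL")
    sha1 = resolve(stamp.repository, stamp.ref)
    return GitSourceStamp(sha1=sha1, ref=stamp.ref, repository=stamp.repository)
```

Once pinned, the stamp stays valid even if the branch moves on afterwards.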

So if I have, on the server side, a complete (cross-repo) directory of commit
SHA1s that records which repos contain each commit, the commit SHA1 is all I
need to request a build. Buildbot could fire a source step with _any_ repo
that can check out that commit, and thus get a working tree.
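Such a directory could be as simple as the following sketch. CommitDirectory
and its methods are hypothetical; how the index gets filled (hooks, polling)
is left open here.

```python
from collections import defaultdict


class CommitDirectory:
    """Hypothetical server-side index: commit SHA1 -> repos containing it."""

    def __init__(self):
        self._repos_by_commit = defaultdict(set)

    def record(self, sha1, repo_url):
        """Remember that repo_url contains the commit sha1."""
        self._repos_by_commit[sha1].add(repo_url)

    def repos_for(self, sha1):
        """Return all known repos containing sha1, in sorted order."""
        return sorted(self._repos_by_commit.get(sha1, ()))

    def pick_repo(self, sha1, preferred=()):
        """Return any repo that can check out sha1, trying `preferred` first."""
        candidates = self.repos_for(sha1)
        if not candidates:
            raise LookupError("no known repo contains commit %s" % sha1)
        for url in preferred:
            if url in candidates:
                return url
        return candidates[0]
```

A build request would then carry only the SHA1, and the master would call
pick_repo() to choose where the source step fetches from.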

If one uses "mode=clobber", this should be straightforward. If one wants
to optimize (mode=update, no full checkout), things are trickier.

>> The problem that you and Ben are addressing is that the origin of the 
>> Change may not be the best place to get the code, especially in a DVCS 
>> situation.  Concretely, if I commit to a local git repo, the 
>> repository string 
>> 
>>   /home/dustin/projects/amanda/ 
>> 
>> is not particularly helpful on some other system.  So the issue is one 
>> of recognizing "equivalent" repositories, and perhaps substituting a 
>> canonical repository for its equivalents. 

See above: two repos are equivalent w.r.t. a commit, iff both can check out
the tree specified by that commit.

>In your case, in your environment (defined in our case via your buildbot 
>config file), I guess "amanda" uniquely defines your repository. My 
>advice would be for the ChangeSource step to return only "amanda" in your 
>case, as this is the only value that makes sense for your master. Let the 
>master then add its knowledge to this value, making it absolute within 
>your current config. 
>> *This* can be done at the 
>> ChangeSource level, but I think that we will quickly find 
>> %-substitution too limiting. 
>The doc you linked to in a previous mail has some more advanced options: 
>you can give it a callable, for instance, and in that case do whatever magic 
>you want. It is still limited to the repository path, though. 
>> Rather, let's add a "canonicalize" 
>> function to the changesources that can perform whatever munging is 
>> appropriate (and not just on the repository). 
>> 

"Name equivalence" is meaningless in a strict sense, since it relies on "magic"
knowledge of which repos "usually" contain a given change.
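For comparison, the "canonicalize" idea quoted above could look roughly like
the sketch below, which at least makes the "magic" knowledge an explicit table.
This is only an illustration: the change is a plain dict rather than a Buildbot
Change object, and CANONICAL_REPOS, canonicalize and the example URL are
invented names, not existing Buildbot API.

```python
# Hypothetical table: repository strings assumed equivalent to a canonical repo.
CANONICAL_REPOS = {
    "/home/dustin/projects/amanda/": "git://example.org/amanda.git",
    "amanda": "git://example.org/amanda.git",
}


def canonicalize(change):
    """Return a copy of the change dict with its repository made canonical.

    Repository strings that are not in the table are passed through unchanged.
    """
    fixed = dict(change)
    repo = fixed.get("repository", "")
    fixed["repository"] = CANONICAL_REPOS.get(repo, repo)
    return fixed
```

The table encodes name equivalence only; nothing here checks that the canonical
repo can actually check out the commit in question.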

>If I understand correctly, you want to move the modifications we (Marcus and 
>I) made recently to an earlier stage: from the webstatus and 
>the source step to the ChangeSource step. No objections on my part as 
>long as it stays as flexible as we made it.