[Buildbot-devel] copying information between steps or schedulers

Tue Mar 14 12:08:17 UTC 2006

Hello,

Brian Warner wrote:
> For this purpose, and some others which have been raised on the list before,
> I'm thinking that some sort of general-purpose "build attributes" might be
> appropriate. These would be a set of key-value pairs made available to all
> Steps, and persisted in the long-term BuildStatus object to be made available
> to status plugins. I like the "%(attrname)s" substitution syntax suggested

The syntax is not my invention, it is borrowed from string formatting in 
Python. Note that it is not necessary to restrict yourself to such syntax; you 
could eg have a key-value pair

( "%(revision)", "12345" )

and do a literal search/replace of the keys by their values.
In this way, a user can choose arbitrary syntax for his keys.
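
A minimal sketch of such a literal search/replace, assuming the attributes
live in a plain dict of strings. The function name substitute_literal is made
up for illustration and is not part of buildbot:

```python
def substitute_literal(text, attributes):
    """Replace every occurrence of each key in text by its value.

    Keys are matched literally, so any key syntax works.
    (Hypothetical helper, not a buildbot API.)
    """
    # Handle longer keys first, so that a key which is a prefix of
    # another key does not clobber the longer one.
    for key in sorted(attributes, key=len, reverse=True):
        text = text.replace(key, attributes[key])
    return text


attributes = {"%(revision)": "12345"}
result = substitute_literal("build-r%(revision).tar.gz", attributes)
print(result)
```

Since the keys are just strings, "@REVISION@" or "$rev" would work equally
well as "%(revision)".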

> 
>  s(ShellCommand, args=["mv", "source.tar.gz",
>                        SubstituteAttrs("build-r%(revision)s.tar.gz")])
> 
> or maybe
> 
>  s(ShellCommand, args=["mv", "source.tar.gz",
>                        SubstituteAttrs("build-r%ss.tar.gz", "revision")])
> 
> 
> I think that you'd still have to write some code to set these attributes, but

I don't understand this line. Do you mean writing some Python derived class?
If so, how is the connection made from my make/sh/perl/whatever script to this 
Python code?
In a previous mail I suggested parsing stdout for lines of a certain form 
(defined as an RE), and extracting key/value pairs from them.
The difficulty here is that parsing stdout can result in unexpected matches if 
one is not careful.

On the other hand, having an explicit definition of what to match does make 
the program more transparent (instead of hiding such magic in a derived class).
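
A rough sketch of the stdout-parsing idea, assuming the build script prints
attributes on specially marked lines. The "@@attr key=value" marker and the
function name are my own invention for the example; a distinctive marker is
one way to reduce the risk of unexpected matches:

```python
import re

# Hypothetical marker format: a line such as "@@attr revision=12345".
# Ordinary output lines that merely contain "key=value" do not match.
ATTR_LINE = re.compile(r"^@@attr (\w+)=(.*)$")


def extract_attributes(stdout_text, pattern=ATTR_LINE):
    """Collect key/value pairs from lines that match the pattern."""
    attributes = {}
    for line in stdout_text.splitlines():
        match = pattern.match(line)
        if match:
            key, value = match.groups()
            attributes[key] = value
    return attributes


output = (
    "compiling foo.c\n"
    "@@attr revision=12345\n"
    "revision=not-an-attribute\n"
    "@@attr tarball=source.tar.gz\n"
)
found = extract_attributes(output)
print(found)
```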

A problem with both approaches above (%s substitution and stdout parsing) 
is that, as far as I can see, information is limited to strings only, ie I 
don't know how to pass on a list of files as an argument, for example, 
without adding complicated pattern expansions.

The example below is about a list of library files that I want to add to a 
command, but it applies equally to a list of changed files of a commit.

Example:

attributes = { 'libfiles': [ 'file1', 'file2', 'file3' ] }

ShellCommand(command=["./mycommand", "--lib=%(libfiles)s"])

wanted output:

./mycommand --lib=file1 --lib=file2 --lib=file3

I believe there is not enough information in the input to get the wanted 
output. I also don't see how to extend this %s approach for such cases without 
introducing ad hoc special solutions (ie one can extend %s replacement to 
behave as wanted in the example, but then how does one state to want 
"./mycommand --lib=file1,file2,file3" for example?).

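To illustrate the ambiguity with a small sketch: plain Python %-formatting
just embeds the text form of the list, and getting either of the two wanted
outputs means the caller has to state explicitly which expansion is meant.
The two helper names below are hypothetical, not existing buildbot functions:

```python
attributes = {"libfiles": ["file1", "file2", "file3"]}

# Plain %-formatting inserts str() of the list, which is not useful here.
naive = "--lib=%(libfiles)s" % attributes


def expand_per_item(template, key, values):
    """Produce one argument per list element."""
    return [template % {key: value} for value in values]


def expand_joined(template, key, values, sep=","):
    """Produce a single argument with all elements joined by sep."""
    return [template % {key: sep.join(values)}]


template = "--lib=%(libfiles)s"
per_item = ["./mycommand"] + expand_per_item(
    template, "libfiles", attributes["libfiles"])
joined = ["./mycommand"] + expand_joined(
    template, "libfiles", attributes["libfiles"])

print(naive)
print(" ".join(per_item))
print(" ".join(joined))
```

The template string alone is identical in both cases, so the choice between
the two expansions has to come from somewhere outside the template.
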
Another approach may be to use a file with key/value pairs (eg one key/value 
at each line), or have a directory of files, each for a different key.
Since reading/writing files is a quite primitive operation supported by all 
build-related programs, this approach may be more generally applicable.
Also, there are few inherent restrictions on the contents of key/value pairs.
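
A minimal sketch of the one-pair-per-line variant, assuming a "key=value"
line format. The format and the function names are assumptions for the
example only. This simple format cannot hold keys containing "=" or values
containing newlines; the directory-of-files variant would not have that
limitation:

```python
import os
import tempfile


def write_attributes(path, attributes):
    """Write one "key=value" line per attribute (hypothetical format)."""
    with open(path, "w") as f:
        for key, value in attributes.items():
            if "=" in key or "\n" in key or "\n" in value:
                raise ValueError("cannot store %r in a line-based file" % key)
            f.write("%s=%s\n" % (key, value))


def read_attributes(path):
    """Read the attributes back into a dict of strings."""
    attributes = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip empty lines
            # Split at the first "=" only, so values may contain "=".
            key, _, value = line.partition("=")
            attributes[key] = value
    return attributes


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "attributes.txt")
    write_attributes(path, {"revision": "12345", "flags": "-O2 -DX=1"})
    restored = read_attributes(path)

print(restored)
```

A make/sh/perl script could produce or consume the same file with nothing
more than echo and grep.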

If this (set of) attributes is kept in sync between slaves in some way, you 
get file transfer for free (or, you implement file-transfer and you get 
attribute transfer for free :-) ).

Last but not least, if you give the webserver access to this set of files, one 
can serve files and inspect state (although I do not know whether that is 
something one'd want to do).

> what exactly should be checked out. (and for forced builds of HEAD, the
> buildbot never actually learns what revision is used, unless we add some code
> to parse the output of the checkout/update step somehow).

pySVN is your friend :-)
(SVN is designed to have an API available so it is very easy to build scripts 
that do things with a repository, pySVN is the Python version of that API).

That reduces the problem to convincing the entire world that SVN is the 
solution to SCM problems....

Just kidding, ;-)

> In general, I'm of two minds on this sort of thing. When I'm wearing my
> "Continuous Integration" hat, I want to see the buildbot know as little as
> possible about the build process. In my mind, developers should only need to
> know how to do "make all; make test", or something equivalent, and the rest
> of the intelligence should be built into the tree somewhere, in the top-level
> Makefile, or scons script, or Ant recipe, or whatever. This is important
> because if it isn't checked in, it will be hard to keep in sync, hard to
> maintain, and eventually forgotten. From this point of view, the buildbot
> should be as dumb as possible.

I don't think I agree with the conclusion of your final sentence.
If 'make all; make test' were enough, we wouldn't need buildbot.

What is missing here imho is the project and site configuration.
'make test' is designed to run tests in the general sense, but it assumes 
things about its environment, eg where are the tests, how do I run them, how 
is it indicated when a test fails, etc.
All such things are (I think) part of the project configuration. Other stuff 
is the SCM used, organization of the repository, release policies, issue 
tracking, etc. I do agree with you that such information should be managed as 
part of the project code, eg by adding it to the repository.

Site configuration is about the structure of the site, like which machines 
exist, what is running where, what user accounts are used, how the file 
system is organized, etc.
Such site configuration is set up only once, so where the information is 
kept is not terribly important, unless the site is very big or there are a lot 
of sites that need to be maintained, but in the latter case we are way off 
topic on this list...

To make 'make test' work, a lot of these configuration settings need to be 
decided. To make it work automagically using buildbot, buildbot needs to be 
informed about these decisions such that it fits in the project and site 
configuration.

Imho buildbot should not be dumb, it should be smart to allow easy entry of 
site configuration, and flexible to obtain project configuration from the 
project data, although I am worried that the latter is not always feasible.

> On the other hand, when I'm wearing my "Release Engineer" hat, I want to see
> the buildbot help with release automation, which frequently involves more
> complicated steps than those needed by mere developers. If the buildbot is a
> useful place to build installers/.deb-packages, upload them to a distribution
> directory, rebuild docs, update a website.. then sure, go for it.
> 
> So these two views are sometimes in conflict.

I don't think so. A release engineer is almost by definition much more 
involved in site configuration and much less in project configuration imho, 
hence the shift in needed information.

> complicated features to the buildbot, to enable the automation features, but
> I'm also afraid that if they are available, people will be tempted to put the
> build intelligence in the buildbot's config, where it is generally invisible
> to developers who may need to do the same thing too.

I am not sure that this is a buildbot development problem. Preventing people 
from doing stupid things should not be a project goal imho. I much more 
believe that people should have access to powerful primitives and a very good 
guide on how to use them effectively (and in a lot of cases, the latter is the 
most difficult).
Sure, there will be people who ignore any advice and who try, and succeed, to 
shoot themselves in the foot, but hey, if you want to spend your life that way....

> Anyway, the build attributes are an example of a feature that I'm inclined to
> put in, but am still concerned that it might be a bad idea in the long run.

I don't understand the core problem that we try to solve here. I experience a 
need to communicate information between steps in a build and between different 
builds (from an upstream scheduler to a downstream scheduler), but is that 
really the problem or just a symptom of something larger?

I tend to think the latter, but I cannot put my finger on it.

Maybe an implicit assumption in buildbot is that it is designed for doing "svn 
export; make; make test" at all slaves?
[ Sorry if I offended you with this question, that is not my intention.

   I am just trying to figure out what buildbot rule I am breaking that causes
   my need for communication and file transfer. Am I trying to do something
   that I should not do? Is my project setup broken? (I would not be surprised)
]

> The other feature I'm considering that falls into this category is a pair of
> BuildSteps that move files between the master and the slave. The syntax would
> probably be something like:

If you use a (set of) files to contain the attributes, and you keep these in 
sync between slaves, you get this implicitly.

The trouble with files is that you immediately also get the file management 
problem, ie I need to implement procedures that remove files every now and 
then, otherwise I run out of disk space. At this moment that is covered 
elsewhere; if you include file handling in buildbot, there will be a need for 
cleaning up (in the form of another 'build'?).

> where all filenames are relative to the master/slave's respective basedirs.
> It would probably be useful to add a third command:
> 
>  s(MasterShellCommand, args=["scp", "foo.tar.gz", "website:latest.tar.gz"])
> 
> so you could actually do something with the file once you'd copied it to the
> master.

Not sure. Why bother to copy it to the master in the first place?
(Wouldn't it be equally easy to copy the file from the last slave?)

I'd probably consider publishing files to be done using another slave-build. 
This slave may run on the same machine as the buildmaster (if you consider 
copying a file twice as something that should be avoided).

> thoughts?

Plenty, just not very organized, I am afraid.

Albert