[Buildbot-devel] Confused about adding a LogObserver

Thu Aug 24 22:37:43 UTC 2006

I think some of the confusion stems from the fact that LogObservers and the
new logfiles= argument are pretty different beasts but with similar names.
Worse yet, there are also LogFiles (i.e. instances of
buildbot.status.builder.LogFile). logfiles= can put text into LogFiles, and
LogObservers can get text from logfiles=, and LogObservers can put text into
LogFiles. (at least one of these could use a different name, but they're all
pretty accurate descriptions of what they do, so I was unable to come up with
a better one when I wrote them).

logfiles= is there to watch files (actual, physical files on disk, like
/var/log/syslog) on the buildslave-side that grow during a build. Each
RemoteCommand sends a number of status messages as it runs, most of these
status messages are "log" messages for some channel. By default the only log
messages it sends are for "stdio", but by adding logfiles= you can ask it to
send log messages for other things. logfiles= maps these channel names to
slave-side files. These slave-side files are checked once per second, and any
new contents are sent to the buildmaster in a log message that is tagged by
the channel name.

When you set logfiles=, it also tells the master-side BuildStep to create
buildbot.status.builder.LogFile instances where these log messages will be
deposited. These LogFile instances are a status thing: each BuildStep has a
collection of them, they are available to the different status plugins, they
get displayed on the waterfall page, etc. As the log messages arrive, their
channel tag indicates which LogFile they should be deposited into.

LogObservers, on the other hand, are entities on the master side that can
subscribe to one of these LogFile instances and receive log messages as they
arrive. The LogObserver can do whatever it wants in response to these
messages: perhaps counting them, perhaps parsing them somehow. It might add
some text to an output LogFile, perhaps a processed version of the text that
arrived from the input LogFile. Creating a LogObserver and subscribing it to
an input LogFile only helps if something is putting the messages you want
into that LogFile.

As an example (which I believe is equivalent to your goals.. let me know if
I've misinterpreted them), let's start with a simple compile step and build
up from there:

=== 1

f.addStep(Compile, command=["make", "all"])

that gives us a build which has a single LogFile named "stdio", the same
LogFile that all ShellCommands get.

=== 2

class BetterCompile(Compile):
    command = ["make", "all"]

    def createSummary(self, log):
        # 'log' is the stdio LogFile, there might be others
        text = StringIO(log.getText())
        warnings = [line for line in text.readlines()
                    if line.startswith("warning:")]
        self.addCompleteLog('warnings', "".join(warnings))

f.addStep(BetterCompile)

In this build, when the build has finished, our createSummary method will get
called, it will extract all the stdout/stderr text, turn it into lines, look
for the ones with "warning:", assemble those into a list, and join the list
into a big chunk of text. Then it will create a new LogFile, fully formed,
with that text, and add it to this step's status. The LogFile will be named
"warnings", and a link to it (called "warnings") will appear on the Waterfall
display in this buildstep's box. In addition, the MailNotifier will include a
copy of this warnings log as an attachment when it sends out mail about the
build. Other status plugins can get at the "warnings" log just like they get
to the "stdio" log.

We wind up with two LogFiles, one named "stdio", the other named "warnings".

(note that the docstring for createSummary() suggests some code which
wouldn't actually run.. python has no top-level 'grep' function. I think I
was experiencing a flashback to my perl days when I wrote that..)

This is a bit unsatisfying, because we have to wait until the very end of the
compile before we get to see any warnings at all.

=== 3   (untested but should work with buildbot-0.7.4)

class WarningsGrepper(step.LogLineObserver):
    def __init__(self, output):
        step.LogLineObserver.__init__(self)
        self.output = output

    def outLineReceived(self, line):
        if line.startswith("warning:"):
            self.output.addStdout(line)

    def logFinished(self):
        self.output.finish()

class EvenBetterCompile(Compile)
    command = ["make", "all"]

    def setupEnvironment(self, cmd):
        # we could really use a better hook to put this in, but
        # this method happens to be run at a good time
        Compile.setupEnvironment(self, cmd) # always upcall!

        self.warningslog = self.addLog("warnings")
        grepper = WarningsGrepper(self.warningslog)
        self.addLogObserver('stdio', grepper)

    def commandComplete(self, cmd):
        # again, we could use a better place for this
        self.warningslog.logFinished()

f.addStep(EvenBetterCompile)

This example is a bit ugly, because I hadn't really thought everything
through when I implemented LogObservers. It looks like we need some better
hooks for doing this sort of thing.

The idea is that early in the life of the BuildStep, we want to create the
LogObserver. We do this in setupEnvironment, which is normally just a way to
get the environment variables passed to the buildslave, but it happens to be
run at the right time and thus is convenient for our purposes. __init__ would
work too, except that the arguments to ShellCommand are complicated and it is
a nuisance to get the upcall right. You could probably do something like

    def __init__(self, *args, **kwargs):
        Compile.__init_(self, *args, **kwargs)
        self.warningslog = self.addLog("warnings")
        ...

Anyways, in this setup stage, we create a new LogFile by calling
self.addLog(). This creates a target where our just-the-warnings messages can
go. By using addLog() instead of addCompleteLog(), we're saying that we will
be adding text to it over time, instead of all at once, and that we will take
responsibility for marking the LogFile as "finished" at some point in the
future.

We then create our LogObserver by instantiating WarningsGrepper. We give it a
reference to the output LogFile so it has a place to put the warnings that it
greps. We inherit from LogLineObserver because we want to hear about complete
lines of stdout rather than getting arbitrarily-broken-up chunks of text.

Then we arrange to feed our grepper by connecting it to the "stdio" log
channel, by calling addLogObserver. This tells the buildstep that whenever
some stdio arrives from the buildslave, it should be handed to our grepper
(in addition to anywhere else it might already have been going, such as the
usual "stdio" LogFile).

At this point, the chain looks like:

"stdio" log channel
  |
  +--->  grepper  ->  "warnings" LogFile
  |
  +---> "stdio" LogFile

Now, as the step runs, and stdout/stderr arrive from the slave, they will be
split up into lines, and delivered to the grepper's outLineReceived() method.
This method will submit a subset of those lines to the "warnings" LogFile by
calling the LogFile's .addStdout() method.

When the step finishes, it runs the commandComplete method. This is a
general-purpose hook where you can put any sort of post-command work you
want. In our case, we use it to tell the grepper that its input channel has
closed. It responds by closing the "warnings" LogFile that it was feeding, by
calling output.finish() . This is important, otherwise the LogFile will stay
"unfinished": one consequence would be that the HTML page that displays this
LogFile will show the contents and then keep the connection open (since it
doesn't know whether more text will arrive or not).

=== 4  (sketch of how this might be easier in a future version)

class WarningsGrepper(step.LogLineObserver):
    def __init__(self, output):
        step.LogLineObserver.__init__(self)
        self.output = output

    def outLineReceived(self, line):
        if line.startswith("warning:"):
            self.output.addStdout(line)

    def logFinished(self):
        self.output.finish()

class FutureBetterCompile(Compile)
    command = ["make", "all"]

    def startHook(self, cmd):
        warningslog = self.addLog("warnings")
        grepper = WarningsGrepper(warningslog)
        self.addLogObserver('stdio', grepper)

f.addStep(FutureBetterCompile)

Clearly we could use some better support for this. Some changes I'm thinking
we should make for the next release:

 LogObserver.logFinished() should be called automatically when the LogFile
 that's feeding them is finished. This removes the need for our
 commandComplete() to perform this notification.

 A new command-is-starting hook should be added, so that we don't have to
 override (and upcall) something unrelated like setupEnvironment.

For this kind of straightforward filtering process, it might be even better
to collapse this framework down to a single method call:

=== 5  (even further in the future)

class FutureBetterCompile2(Compile)
    command = ["make", "all"]

    def startHook(self, cmd):
        self.addLogLineFilter("stdio', "warnings",
                              lambda line: line.startswith("warning:"))

Where the idea is that addLogLineFilter() takes three parameters: the name of
the LogFile that you're using as an input, the name of the LogFile that you
want to create as output, and a callable that gets to look at each line and
decide whether or not it should go into the output LogFile.

===

So as far as I understand your goals, you don't need logfiles= at all. You
only need the logfiles= argument if there is a log file on the buildslave
side that you want to follow, something like /var/log/syslog or
_trial_temp/test.log or the like. If there is such a file, you could send its
contents into a basic LogFile, and/or you could subscribe a LogObserver to it
that could do more complicated processing. But if the only useful text coming
out of your remote command is coming from stdout/stderr, then you don't need
to use ShellCommand's logfiles= argument (or the corresponding .logfiles
attribute).

But if you want to do something with log contents as they arrive (including
stdio), then that's what LogObservers are for. When I wrote them I was only
thinking about using them to count events, but creating filtered LogFiles in
real-time is an obvious use (and much more useful application, to be honest),
so I'd like to add whatever hooks are necessary to make this process easier.

hope that helps.. let me know what I might put in the docs to make this more
clear.

cheers,
 -Brian