[Buildbot-devel] nagios buildbot plugins

Sun Jan 20 01:41:04 UTC 2008

Hey,

We'd like to monitor our buildbot masters from nagios.

Nagios plugins are very simple: Nagios can call any arbitrary binary or script
(e.g. "check_load_avg", optionally with arguments such as an acceptable
threshold), and the plugin signals WARN, CRITICAL, or OK through its exit code
(optionally printing a message as well). How the script gathers its
information is probably the part that's more interesting to this list, though.
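
To make the plugin convention concrete, here is a minimal sketch of the
exit-code side only. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN)
are the standard Nagios ones; the function names, the label, and the
threshold semantics (higher is worse) are assumptions made up for
illustration, and nothing here talks to Buildbot yet.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING",
                CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measured value to a status, assuming higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def report(label, value, warn, crit):
    """Print the one-line message Nagios displays and return the exit code."""
    status = classify(value, warn, crit)
    sys.stdout.write("%s %s - value=%s (warn=%s, crit=%s)\n"
                     % (label, STATUS_NAMES[status], value, warn, crit))
    return status
```

A real plugin would finish with something like
sys.exit(report("BUILDBOT QUEUE", queue_length, 10, 20)), where queue_length
is whatever the check measured.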

We already monitor whether Buildbot is running on the master and slaves,
available disk space, etc. What we'd like to be able to monitor is internal
master state, like:

* is the queue unacceptably long for any builders
* are any slaves disconnected
* general health of the master (responsive, various queues within normal
range, etc.)
* number of concurrent connections to the master

I'm thinking that the debug interface or manhole would be good for this.
Before I go and write some plugins, does anyone else do this, or have any
suggestions? I fully expect that this will have to track Buildbot and will
likely break between releases, since it's going to depend on internal APIs.

If we end up writing something new, I'll likely just have one python script
that can be symlinked to different names ("check_buildbot_queue",
"check_buildbot_slaves", etc.). It could be useful for people using things
other than Nagios, because it can be run standalone too. Maybe something
good for contrib?
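
A rough sketch of the symlink idea: one script that looks at the name it was
invoked under and dispatches to the matching check. The two check names come
from the examples above; the function names and the placeholder bodies are
hypothetical, since the actual queries against the master don't exist yet.

```python
import os

UNKNOWN = 3  # standard Nagios exit code for "could not determine status"


def check_queue(args):
    # Placeholder: a real check would ask the master about builder queues.
    return UNKNOWN, "UNKNOWN - queue check not implemented"


def check_slaves(args):
    # Placeholder: a real check would ask the master about connected slaves.
    return UNKNOWN, "UNKNOWN - slave check not implemented"


# Map each symlink name to the check it should run.
CHECKS = {
    "check_buildbot_queue": check_queue,
    "check_buildbot_slaves": check_slaves,
}


def select_check(argv0):
    """Pick a check from the name the script was invoked as, or None."""
    return CHECKS.get(os.path.basename(argv0))


def main(argv):
    """Run the selected check, print its message, return the exit code."""
    check = select_check(argv[0])
    if check is None:
        print("UNKNOWN - no check named %r" % os.path.basename(argv[0]))
        return UNKNOWN
    code, message = check(argv[1:])
    print(message)
    return code
```

The script itself would end with sys.exit(main(sys.argv)), and each symlink
(e.g. "ln -s check_buildbot check_buildbot_queue", with check_buildbot being
a made-up name for the real script) selects a different check.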

Thanks,
Rob