[Buildbot-devel] Summer Of Code projects

Sun May 7 01:21:18 UTC 2006

(I know I should have posted this about a week ago, sorry. There are still
two days left to get student proposals in, so better late than never.)

So the Python Software Foundation as a group is mentoring a number of Google
SoC projects, and they've expressed an interest in seeing some Buildbot
improvements as a part of that. The wiki page with project ideas is here:

 http://wiki.python.org/moin/SummerOfCode

I seem to have been accepted as a mentor for the SoC, so I'm encouraging all
interested students to submit Buildbot-related project proposals via the
links on that page.

The kinds of things I see as good summer-sized projects include:

* SQLifying the backend build-status database

 This would replace the current collection of directories and pickled Status
 instances with a proper database, specifically one which could be
 interrogated by external tools. I'd start with SQLite but it would be best
 if other databases (MySQL and Postgres come to mind) could be swapped in
 without too much effort. It has been suggested to me that divmod.axiom
 should be used for the backend. Some considerable thought needs to be put
 into getting the schema right, to make it useful to tool developers.

 The student who works on this should have some Twisted, Buildbot, and SQL
 skills, as well as familiarity with real-world buildbot deployment. The
 first week of the project will probably be to survey buildmaster admins to
 find out what kinds of questions they'd like to ask of their new buildbot
 database.

 I've attached my initial notes on this project below. I really don't know
 SQL, so take them with a grain of salt.

* Problem Tracking

 For a long time, I've wanted to be able to have the buildbot be aware of
 specific test failures, so it could answer questions like "which tests are
 failing intermittently?", and "when did test #3 start failing?". To
 accomplish this, we need fine-grained parsing of test results for a variety
 of test frameworks (starting with trial and expanding outwards) as well as
 generic compile results (grepping for the filename/linenumber information
 that gcc likes to emit and which emacs knows how to look for). Then we need
 a place in the build status data structure to save it. Then we need some
 tools to scan through these looking for "Problems", which are sequential
 failures of a single test, possibly across multiple builders. Each Problem
 is associated with some people and some Changes (the one which started it,
 the one which fixed it).

 This one might be easier to implement once the SQL stuff is in place, but I
 think a lot of it could be done beforehand even though it might not be
 particularly efficient. The test-result parsing stuff is independent.
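
 As a rough illustration of the "scan for Problems" step, here is a minimal
 sketch. It assumes the parsed test results are already available as
 (build_number, test_name, passed) tuples; that record shape, the Problem
 class, and find_problems() are all hypothetical names, not existing
 buildbot code. It handles a single builder and ignores Changes and people.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical record shape: (build_number, test_name, passed)
Result = Tuple[int, str, bool]


@dataclass
class Problem:
    test_name: str
    first_failed: int                # build where the run of failures began
    last_failed: int                 # most recent failing build in the run
    fixed_in: Optional[int] = None   # build where the test passed again


def find_problems(results: List[Result]) -> List[Problem]:
    """Group consecutive failures of each test into Problem records."""
    problems: List[Problem] = []
    open_runs = {}  # test_name -> Problem that is still failing
    for build, test, passed in sorted(results):
        run = open_runs.get(test)
        if not passed:
            if run is None:
                # First failure after a pass (or ever): a new Problem starts.
                run = Problem(test, first_failed=build, last_failed=build)
                open_runs[test] = run
                problems.append(run)
            else:
                run.last_failed = build
        elif run is not None:
            # The test passed again: close the Problem.
            run.fixed_in = build
            del open_runs[test]
    return problems


results = [
    (1, "test_a", True),
    (2, "test_a", False),
    (3, "test_a", False),
    (4, "test_a", True),
    (3, "test_b", False),
]
problems = find_problems(results)
```

 A test that flips between pass and fail shows up as several short Problems,
 which is one crude way to spot intermittent failures.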

In addition, there are a number of smaller projects which might be added
together to make a reasonable summer's worth of SoC work:

* Displaying Build Metrics

 One of the stated goals of the buildbot is to help you improve things like
 memory footprint, compile time, code size, test coverage, etc. However, so
 far I've managed none of these. I'd like to see a place in each build for
 various numerical properties to be stored (N.B. the 'build properties' that
 will be present in 0.7.3 are the right place for this), and then some web
 status pages that present graphs of these quantities over time.

 The front-end of this will be easier once the "Web Parts" framework is in
 place, which will make it possible to mix-and-match different web status
 pages instead of having just the big chronological Waterfall page.
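
 To make the data side of this concrete, here is a small sketch of pulling
 one metric out of per-build numerical properties so it could be graphed. The
 dict-of-dicts layout, the property names, and metric_series() are
 assumptions for illustration only; they are not the 0.7.3 build-properties
 API.

```python
from typing import Dict, List, Tuple


def metric_series(builds: Dict[int, Dict[str, float]],
                  name: str) -> List[Tuple[int, float]]:
    """Return (build_number, value) pairs for builds that recorded `name`."""
    return [(number, props[name])
            for number, props in sorted(builds.items())
            if name in props]


# Hypothetical per-build properties, keyed by build number.
builds = {
    101: {"compile_seconds": 42.0, "code_size_bytes": 18200},
    102: {"compile_seconds": 40.5},
    103: {"compile_seconds": 47.1, "code_size_bytes": 18950},
}
size_over_time = metric_series(builds, "code_size_bytes")
```

 Builds that did not record a given metric are simply skipped, so a graph
 would show gaps rather than zeros.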

* Detailed Test Coverage Display

 A number of languages offer tools to analyze which lines of code are
 exercised by the test suite and which are not. It would be very nice if the
 buildbot could interpret the results of these tools and present the
 information in a useful way. In addition to just the overall coverage
 percentage, the build status could link to a page (perhaps an external
 viewcvs-like page) where you could see each file and each line and whether
 it was covered or not.

* Better IM status clients

 At the moment we have an IRC bot which is almost purely reactive (it stays
 quiet until you ask it a question). In addition to that, I'd like to have
 active bots (which announce build results into a channel), and bots which do
 the same thing over other IM protocols (starting with AIM and probably
 including Jabber too). All these aspects should use the same code, of
 course. Part of the job would be to map the buildmaster's concept of "user"
 into an IM handle. The long-term goal is to tie this into Problem Tracking,
 using IM or email as necessary to inform the responsible user about the
 status of the Problems they are on the hook to fix.

Anyways, if you're a student and are looking to do some buildbot work this
summer, please consider applying. The web page above has a number of links to
get you started. Drop me a note if you have any questions about the projects
I've described or any other buildbot-related ones you might have an interest
in.

thanks,
 -Brian

-------------- next part --------------

The idea is to use SQLite to store build status.

c['storage'] = b.storage.SQLite(prune=30)
# enable SQLite instead of old-style, delete builds after 30 days
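
What prune=30 might do under the hood could look something like the sketch
below, using Python's sqlite3 module. The "builds" table, its "stop" column
(seconds since the epoch), and prune_old_builds() are assumptions made up for
this example. Unfinished builds (stop is NULL) are left alone.

```python
import sqlite3
import time


def prune_old_builds(conn, days, now=None):
    """Delete finished builds that stopped more than `days` days ago.

    Returns the number of rows removed.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    cur = conn.execute(
        "DELETE FROM builds WHERE stop IS NOT NULL AND stop < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (build_id INTEGER PRIMARY KEY, stop REAL)")
now = 1_000_000_000.0  # fixed timestamp so the example is repeatable
conn.executemany(
    "INSERT INTO builds (build_id, stop) VALUES (?, ?)",
    [
        (1, now - 40 * 86400),  # finished 40 days ago: pruned
        (2, now - 5 * 86400),   # finished 5 days ago: kept
        (3, None),              # still running: kept
    ],
)
removed = prune_old_builds(conn, days=30, now=now)
remaining = [row[0] for row in
             conn.execute("SELECT build_id FROM builds ORDER BY build_id")]
```

A real implementation would also have to remove the rows (and log files)
that hang off each deleted build.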

Each BuildStatus object needs to live in RAM. When the BuildStatus is loaded,
a bunch of queries are performed to pull all the small things into memory at
once, so that things like IBuildStatus.getReason can run synchronously.

Certain methods are changed to return a Deferred:
 IBuilderStatus.getBuild, .getEvent
 IBuildStatus.getChanges
 TBD

Conversion: when the SQLite backend is created for the first time, a SlowJob
runs through all old builds and adds them to the database.

Schemata:

IBuildStatus:
 build_id = UNIQUE
 number: INT
 builder_id
 isFinished = BOOL
 reason = STRING
 sourcestamp:
  branch_id
  revision = STRING
  patched = BOOL
  patch_level = INT
  patch_diff = STRING
 #changes_id: mapped via BuildTimesChanges
 #responsibleUsers: mapped via ResponsibleUsers
 #interestedUsers: mapped via InterestedUsers
 #steps: mapped via Steps
 start = TIMESTAMP
 stop = TIMESTAMP (or None)
 #ETA: not archived
 slave_id
 text: list of STRINGs?
 color: INT (or enum? or string?)
 results: INT (enum: SUCCESS, WARNINGS, FAILURE)
 #logs: mapped via BuildLogs
 #test results???

BuildTimesChanges:
 (there are lots of these, up to len(builds)*len(changes))
 build_id
 change_id

IBuilderStatus:
 builder_id = UNIQUE
 name: STRING
 ...

ResponsibleUsers:
 (there are lots of these)
 build_id
 user_id

InterestedUsers:
 (there are lots of these)
 build_id
 user_id

Steps:
 build_id
 number = INT (within the Build)
 start = TIMESTAMP (or None)
 stop = TIMESTAMP (or None)
 #ETA: not archived
 #expectations??
 #logs: mapped via StepLogs
 finished = BOOL
 text: list of STRINGs
 color: ??
 results: INT (enum)

Slaves:
 slave_id = UNIQUE
 slavename

BuildLogs:
 build_id
 log_id
 name: STRING ?

StepLogs:
 step_id = UNIQUE
 name: STRING
 log_id

Logs:
 log_id = UNIQUE
 step_id ?
 isFinished = BOOL
 filename

TestResult:
 build_id
 name = STRING (index)
 results = INT (enum)
 logs: ??

BuilderEvents:
 builder_id
 event_number = INT
 start = TIMESTAMP
 end = TIMESTAMP (or None)
 text = list of strings?
 color: ??
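
To check that the many-to-many mapping above is workable, here is a minimal
sketch of three of the tables in SQLite, plus the kind of query an external
tool might ask ("which builds included change N?"). The lower-case table and
column names, the constraints, and the integer values chosen for the results
enum are my assumptions for this example, not a settled schema. Most columns
are left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE builders (
    builder_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE builds (
    build_id    INTEGER PRIMARY KEY,
    builder_id  INTEGER NOT NULL REFERENCES builders(builder_id),
    number      INTEGER NOT NULL,
    is_finished INTEGER NOT NULL DEFAULT 0,
    reason      TEXT,
    results     INTEGER,
    UNIQUE (builder_id, number)
);
-- the BuildTimesChanges mapping: one row per (build, change) pair
CREATE TABLE build_changes (
    build_id  INTEGER NOT NULL REFERENCES builds(build_id),
    change_id INTEGER NOT NULL,
    PRIMARY KEY (build_id, change_id)
);
"""

# Assumed encoding of the results enum.
SUCCESS, WARNINGS, FAILURE = 0, 1, 2


def builds_for_change(conn, change_id):
    """Return (builder name, build number, results) for one change."""
    return conn.execute(
        """SELECT b.name, bu.number, bu.results
           FROM build_changes bc
           JOIN builds bu ON bu.build_id = bc.build_id
           JOIN builders b ON b.builder_id = bu.builder_id
           WHERE bc.change_id = ?
           ORDER BY b.name, bu.number""",
        (change_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO builders VALUES (1, 'full-linux')")
conn.executemany(
    "INSERT INTO builds (build_id, builder_id, number, is_finished,"
    " reason, results) VALUES (?, ?, ?, ?, ?, ?)",
    [(10, 1, 1, 1, "commit", SUCCESS), (11, 1, 2, 1, "commit", FAILURE)],
)
conn.executemany(
    "INSERT INTO build_changes VALUES (?, ?)",
    [(10, 100), (11, 100), (11, 101)],
)
rows = builds_for_change(conn, 100)
```

ResponsibleUsers and InterestedUsers would follow the same join-table pattern
as build_changes.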