[Buildbot-devel] GSoC: Initial thoughts on the Graphs and Data Charts Project

Mon Mar 9 14:40:57 UTC 2015

Hi

I'm a student developer and I will be applying for a student project under
Buildbot as a potential GSoC student. I have done a fair bit of work with
Python and have also contributed to a couple of FOSS projects, so I
think I'll be able to work well with the Buildbot community.

Anyway, after setting up a development environment, I went through the
Buildbot documentation (along with skimming the relevant code). Once I was
comfortable with the architecture, I took a look at the projects and I felt
that the project "Adding support for graphing data charts of build
statistics over time <http://trac.buildbot.net/ticket/2461>" is something
I'd really like to do.

So, I read the bug report and, after thinking it over for a while, I have a
few ideas that I'd like to share. For the benefit of others, I'll quickly
mention what the project wants us to achieve and then I'll write out my
ideas.

*The project:* The task is to provide context-based graphs of the build
statistics. The actual statistic itself can be either something standard
(like build times for a builder over time) or something user-defined (such
as the generated binary size).

Okay so, here's what I'm thinking:

Firstly, we can divide all build statistics into the following two
categories:

a. time taken to accomplish a certain task (e.g. time taken for a build)

b. any other generalized measurement that is not a time measurement (e.g.
the size of a build)

My reason for this kind of division will become clear in a bit. First, let
me elaborate further on these types:

*a. Measurements of time taken to finish a task:*

This can be anything: the time for running the test suite, the time taken
during compilation, or anything else that the user can think of, as long as
it is a time measurement. For this type of statistic, the user can just
add a *Step* to the build. Buildbot can then measure the time taken for
this step (for each build performed under any builder). In fact, this
measurement is already done by Buildbot and can be accessed using either
the data API or the JSON API. Displaying the data is then straightforward:
a bit of JavaScript parses the data and passes it to a graphing library
such as d3.js or chart.js.
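
To illustrate the kind of processing meant here, below is a minimal sketch
(in Python rather than JavaScript, just to show the logic) that turns step
records into graph points. The field names "build_number", "started_at" and
"complete_at" and the helper name step_durations are assumptions made for
this example; the actual API output may be shaped differently.

```python
def step_durations(steps):
    """Return sorted (build_number, seconds) pairs for finished steps.

    `steps` is a list of dicts. The keys used below are assumed for
    illustration and are not taken from the actual Buildbot API.
    """
    points = []
    for step in steps:
        started = step.get("started_at")
        finished = step.get("complete_at")
        # Skip steps that never started or have not finished yet.
        if started is None or finished is None:
            continue
        points.append((step["build_number"], finished - started))
    return sorted(points)
```

The resulting list of pairs is what would be handed to the graphing
library, with the build number on the x axis and the duration on the y axis.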

*b. All other types of measurements:*

This is the more difficult of the two types since we do not have a standard
way of collecting the actual data. It is made more difficult still by the
fact that we would like the user to be able to measure any arbitrary
statistic that a build can produce. A good way of doing this (while allowing
for maximum flexibility) is to use steps. We could add a new *Step* type
which would run a user-defined script and then read the result of that
script from stdout. This would allow the user to do absolutely anything they
want within the script. If the script runs successfully, it returns an exit
code of 0 and prints its numerical result to stdout. If it fails, it returns
a non-zero exit code and we can show that the step failed in the build. As
for the graph, such a failed step will be a missing data point in the graph
(shown in red to indicate failure).

Further, the user will need to declare such additional statistics (per
builder) in the configuration file. Using this data, we'll be able to
produce a graph for each statistic. As far as storage is concerned, we can
store the results in the existing database, creating a new table for each
builder. The statistics can be stored as a JSON string, mapping a build
identifier (for that builder, of course) to a build statistic. For
displaying this in a graph, we'll need to add methods to the Python server
that collect this data from the db and expose it through a JSON API so that
the browser can graph it.
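
As a rough sketch of the protocol described above (exit code 0 plus a number
on stdout, anything else counts as a failure), here is a standalone Python
example. It is not a real Buildbot step: the function names
collect_statistic and record_statistic are made up for illustration, it uses
subprocess.run from Python 3.7+, and a plain dict stands in for the
per-builder table.

```python
import json
import subprocess
import sys


def collect_statistic(command):
    """Run `command` and return its numeric result, or None on failure."""
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode != 0:
        # Non-zero exit code: the step failed, so there is no data point.
        return None
    try:
        return float(proc.stdout.strip())
    except ValueError:
        # The script exited cleanly but did not print a number.
        return None


def record_statistic(stats, build_number, command):
    """Store the result under the build number; None marks a failed step."""
    stats[str(build_number)] = collect_statistic(command)
    return stats


# Example: build 1 reports a binary size, build 2 fails.
stats = {}
record_statistic(stats, 1, [sys.executable, "-c", "print(1024)"])
record_statistic(stats, 2, [sys.executable, "-c", "raise SystemExit(1)"])
serialized = json.dumps(stats, sort_keys=True)
```

Here a failed build is stored as null in the JSON string, which the
graphing code could render as the red missing data point mentioned above.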

It should be clear now why I chose to divide the build stats into the
categories above: it saves a lot of work, since Buildbot can already gather
and report time-based data. I believe that my proposal covers all the types
of statistics a user might want to measure. It should also need only minimal
changes to the existing codebase, and very little work on the user's part.

So, that's the basic overview of my idea. Please let me know your
opinions/comments on this. Also, any questions are very welcome.

@Dustin: Tagging you since you modified the bug last.

PS: In the meantime, I've tried to get my feet wet with the codebase by
submitting a small PR: https://github.com/buildbot/buildbot/pull/1578. I
will try to make a few more patches before the final deadline. Or, if the
mentors want, I could even write a small proof of concept for this project.

Prasoon Shukla