[Buildbot-devel] GSoC Asynchronous Master/Slave Protocol

Thu Apr 5 22:23:50 UTC 2012

I apologize for not getting back to you sooner; this week has been very busy for me. I'm back from my vacation, but I've had to catch up at school.

I've given some more thought to the design of the project.

Each master or slave has a queue manager that is responsible for the message queue system.
The queue manager consists of three pieces:

1. Channel Manager
2. Message Queues
3. Message Protocol

Message Protocol:
Based on the feedback from my original project proposal, I think the best choice is to use AMP.

Benefits:
It's integrated with Twisted
Data is serialized to binary form
Easy to map messages to function calls

A Buildbot AMP protocol eventually needs to be made that maps all remote messages to Buildbot functions. This is mostly 'manual' work, and we only need a small subset of this at first, in order to test the queue manager.
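
As a rough illustration of the "map messages to function calls" idea, here is a standard-library-only sketch. It is not Twisted's AMP implementation (in Twisted this mapping is expressed with amp.Command subclasses and responder methods). It only mimics the AMP wire layout of length-prefixed key/value pairs and routes on the "_command" key. The handler name start_command and its fields are hypothetical.

```python
import struct

MAX_KEY_LEN = 255       # AMP limits keys to 255 bytes
MAX_VALUE_LEN = 65535   # AMP limits values to 65535 bytes


def encode_box(pairs):
    """Serialize a dict of bytes -> bytes into one AMP-style box."""
    out = []
    for key, value in pairs.items():
        if not 0 < len(key) <= MAX_KEY_LEN:
            raise ValueError("key length must be 1..255 bytes")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError("value longer than 65535 bytes")
        # Each key and value is prefixed by a 2-byte big-endian length.
        out.append(struct.pack("!H", len(key)) + key)
        out.append(struct.pack("!H", len(value)) + value)
    out.append(b"\x00\x00")  # an empty key terminates the box
    return b"".join(out)


def decode_box(data):
    """Parse one complete box from the start of data.

    Returns (pairs, remaining_bytes). Assumes the box is not truncated.
    """
    pairs = {}
    pos = 0
    while True:
        (key_len,) = struct.unpack_from("!H", data, pos)
        pos += 2
        if key_len == 0:
            return pairs, data[pos:]
        key = data[pos:pos + key_len]
        pos += key_len
        (value_len,) = struct.unpack_from("!H", data, pos)
        pos += 2
        pairs[key] = data[pos:pos + value_len]
        pos += value_len


# Hypothetical handler; a real Buildbot function would go here.
def start_command(box):
    return {b"status": b"started " + box[b"name"]}


HANDLERS = {b"start_command": start_command}


def dispatch(box):
    """Look up the function registered for the box's command name."""
    return HANDLERS[box[b"_command"]](box)
```

Only the handful of commands needed to exercise the queue manager would have to be registered in HANDLERS at first.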

Channel Manager

... is responsible for:
1. handling and abstracting the connection details.
2. receiving messages and putting them into the receive queue
3. taking messages out of the send queue and sending them out
*messages would be the collection of AMP key/value pairs
*communication between the queues and the channel manager will be using callbacks, and the changing states of messages will be the events that raise them
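
A minimal sketch of how the callback wiring between the queues and the channel manager might look. It uses plain Python instead of Twisted so it can run on its own; the class and method names (MessageQueue, on_message, transport_write) are assumptions for illustration, and the only "state change" modelled here is a message being added to a queue.

```python
from collections import deque


class MessageQueue:
    """FIFO queue that fires callbacks whenever a message is added."""

    def __init__(self):
        self._items = deque()
        self._listeners = []

    def on_message(self, callback):
        """Register callback(queue) to run after every put()."""
        self._listeners.append(callback)

    def put(self, message):
        self._items.append(message)
        for callback in list(self._listeners):
            callback(self)

    def get(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


class ChannelManager:
    """Moves messages between the network and the two queues."""

    def __init__(self, send_queue, receive_queue, transport_write):
        self.send_queue = send_queue
        self.receive_queue = receive_queue
        # transport_write stands in for whatever actually sends bytes.
        self._write = transport_write
        send_queue.on_message(self._drain_send_queue)

    def _drain_send_queue(self, queue):
        # Called by the send queue when a new message is queued.
        while len(queue):
            self._write(queue.get())

    def message_received(self, message):
        # Called by the network layer when a message arrives.
        self.receive_queue.put(message)
```

For example, with transport_write set to a list's append method, putting a message on the send queue makes it show up in that list immediately, and anything passed to message_received lands in the receive queue for its listeners.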

I plan on using Twisted for networking.

I was thinking about using UDP, and I think Twisted might have some features that help support it. But it will probably be a lot easier to use TCP at first. If we use TCP, then we need to worry about the issue of connections. The channel manager should shut down connections after a period of no-message activity, maybe 5 or 10 minutes.
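
The idle-shutdown rule could be sketched roughly as below. This is plain Python with an injectable clock so it can be tested without waiting; IdleTracker and its methods are hypothetical names, and the 300-second default is just the 5-minute figure from above. Under Twisted, close_idle would presumably be run periodically, for example from a task.LoopingCall.

```python
import time


class IdleTracker:
    """Tracks the last message time per peer and finds idle connections."""

    def __init__(self, idle_timeout=300.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self._clock = clock
        self._last_activity = {}

    def touch(self, peer):
        """Record that a message was just sent to or received from peer."""
        self._last_activity[peer] = self._clock()

    def expired(self):
        """Return peers with no activity for at least idle_timeout seconds."""
        now = self._clock()
        return [peer for peer, last in self._last_activity.items()
                if now - last >= self.idle_timeout]

    def close_idle(self, close):
        """Call close(peer) for every idle peer and stop tracking it."""
        closed = self.expired()
        for peer in closed:
            close(peer)
            del self._last_activity[peer]
        return closed
```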

Items that need to be addressed:
What should the channel manager do when it can't establish a connection with a machine?
What should the channel manager do when it can't communicate with the local message queues? e.g. a full receive queue.
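
One possible answer to the first question, sketched under assumptions: retry with exponential backoff and give up after a fixed number of attempts, leaving the messages queued. The function names and default numbers are made up for illustration. The sleep argument is a stand-in; inside a Twisted reactor the wait would be scheduled rather than blocking (Twisted's ReconnectingClientFactory applies a similar backoff idea).

```python
def backoff_delays(initial=1.0, factor=2.0, max_delay=60.0, max_attempts=6):
    """Yield one delay per attempt, growing by factor up to max_delay."""
    delay = initial
    for _ in range(max_attempts):
        yield min(delay, max_delay)
        delay *= factor


def connect_with_retry(connect, sleep, delays):
    """Try connect() once per delay; wait via sleep(delay) after a failure.

    Returns whatever connect() returns, or raises ConnectionError once
    every attempt has failed so the caller can mark the peer unreachable.
    """
    for delay in delays:
        try:
            return connect()
        except ConnectionError:
            sleep(delay)
    raise ConnectionError("peer unreachable after retries")
```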

Message Queues
Global send and receive queues that temporarily hold AMP messages.

Items that need to be addressed:
How big should the queues be?
Is there a maximum message size?
Should there be different message priorities?
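
To make these questions concrete, here is a sketch of a queue with a size limit, a per-message size limit, and priorities. The limits are placeholders, not recommendations, and the names are hypothetical. The 65535-byte default is borrowed from AMP, which caps a single value at that length. Lower priority numbers are delivered first, and messages of equal priority keep their arrival order.

```python
import heapq
import itertools


class QueueFull(Exception):
    """Raised when the queue already holds max_messages entries."""


class BoundedPriorityQueue:
    def __init__(self, max_messages=1000, max_message_bytes=65535):
        self.max_messages = max_messages
        self.max_message_bytes = max_message_bytes
        self._heap = []
        # The counter breaks ties so equal priorities stay first-in, first-out.
        self._counter = itertools.count()

    def put(self, payload, priority=1):
        if len(payload) > self.max_message_bytes:
            raise ValueError("message exceeds max_message_bytes")
        if len(self._heap) >= self.max_messages:
            raise QueueFull("queue is full")
        heapq.heappush(self._heap, (priority, next(self._counter), payload))

    def get(self):
        """Remove and return the payload with the lowest priority number."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Raising QueueFull is only one policy; the channel manager could instead stop reading from the connection until space frees up.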

I'm not sure if there would be a benefit from having queues for each connection, i.e. taking messages from the global queues and dispatching them to another queue that only holds messages for that sender/receiver.
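
For comparison, the per-connection variant might look something like this sketch. It assumes each entry in the global queue is a (peer, message) tuple, which is an invented representation. One possible benefit is that a slow or disconnected peer would only back up its own queue instead of delaying messages for everyone else.

```python
from collections import defaultdict, deque


def dispatch_by_peer(global_queue, per_peer=None):
    """Move (peer, message) entries from global_queue into per-peer queues.

    Returns a mapping of peer -> deque of messages, preserving the order
    in which each peer's messages were queued.
    """
    if per_peer is None:
        per_peer = defaultdict(deque)
    while global_queue:
        peer, message = global_queue.popleft()
        per_peer[peer].append(message)
    return per_peer
```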

Looking forward to hearing your thoughts.

Tibi

----- Original Message -----
From: "Dustin J. Mitchell" <dustin at v.igoro.us>
To: "Tiberiu Paunescu" <tpa12 at sfu.ca> 
Cc: buildbot-devel at lists.sourceforge.net
Sent: Monday, 26 March, 2012 10:27:09 PM
Subject: Re: [Buildbot-devel] GSoC Asynchronous Master/Slave Protocol

On Mon, Mar 26, 2012 at 6:11 PM, Tiberiu Paunescu <tpa12 at sfu.ca> wrote:
> I'm interested in creating an asynchronous master-slave protocol. I have a rough idea for a schedule, but I think this is something that needs more discussion. I would really appreciate input on this.
>
> 1. Familiarization ~2-3 Weeks
>
> Initially, I need to become familiar with the Buildbot architecture, with an emphasis on master-slave communication. I think some useful deliverables might be sequence diagrams of the communications and other UML diagrams, and improved documentation on the website. This would help me and future developers understand how things work.

There is some detail on this here:
  http://buildbot.net/buildbot/docs/current/developer/master-slave.html
I do like hearing plans for deliverables from this stage - it's
frustrating for a mentor to only hear "I did some reading..", since
it's hard to quantify.

Also, if you're not familiar with Twisted Python, then you may want to review
  http://buildbot.net/buildbot/docs/current/developer/style.html
which talks a lot about how we use Twisted.  It would be great to have
plans for concrete evidence of proficiency with Twisted.  For this
particular project, that may mean implementing some simple network
protocol as a Twisted daemon.

> 2. Design ~4-6 Weeks
>
> I imagine that during this phase, I would work very closely with my mentor. When designing the protocol, it's important to consider the needs of Buildbot, and any work already done on this. I like zeromq, and I have a lot of experience programming in C. Deliverables would be the design documents created.
>
> 3. Implementation ~4-6 Weeks
>
> Once my mentor and I have agreed upon a design, and I've documented my intention for the protocol, I can begin implementation. I think creating a design that satisfies the needs of Buildbot and is well documented is more important than actually implementing it. If I don't complete the implementation, I can always work on it after the GSoC based on the design.
>
> 4. Testing ~2 Weeks
>
> Any remaining time would be spent on testing.

These three steps are probably best done simultaneously, actually.  We
generally like to see new functionality implemented as a sequence of
small, easily reviewed patches, each complete with tests and
documentation.  If necessary, that sequence of patches can be kept on
a non-master branch, and only merged when it's "ready", but this
avoids the problem of you working for weeks to churn out thousands of
lines of code, then sitting on your hands while one of the developers
tries to read through it -- with the right granularity of patches,
review and work on the next part can proceed in parallel.  Hitting
this balance is one of the harder parts of doing significant work in
OSS, as you probably know from last year.

> I'm currently on vacation, and I don't have access to a computer that I can configure for building. However, I would like to set up a build machine for Worldforge's Ember client. Looking at the documentation for Buildbot it seems that build scripts can only be configured in python. Ideally, I think that the build process should allow for configuration in any language.
>
> I'm really interested in distributed systems, and I'd like to work with Buildbot this summer and in the future.

That sounds great -- a reason to continue using the application
indicates to us that you're likely to keep *contributing* to it, as
well, which is one of the program goals.

Dustin