[Buildbot-devel] Buildslave disconnects

Mark MacVicar mark.macvicar at gmail.com
Fri Dec 12 19:47:31 UTC 2008


In case anyone is interested: the problem I was having with buildslave
disconnects wasn't caused by disk space after all. I happen to be
running a MySQL server on the same server as the buildmaster and
buildslave, and for whatever reason (i.e., my bad configs) MySQL and
the buildslave together overload the server. Moving the buildslave to
another system helped a lot (i.e., the server doesn't crash daily
anymore).
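
For anyone chasing similar symptoms, here is a minimal sketch of a
check for the two conditions that came up in this thread (an overloaded
server and a filling root partition). It is not part of buildbot; the
function names and the thresholds are made up for illustration, and
os.getloadavg() only works on Unix-like systems.

```python
import os
import shutil


def resource_snapshot(path="/"):
    """Return (1-minute load average per CPU, fraction of disk used at path)."""
    load1, _, _ = os.getloadavg()        # Unix-only call
    cpus = os.cpu_count() or 1           # cpu_count() may return None
    usage = shutil.disk_usage(path)
    return load1 / cpus, usage.used / usage.total


def check_thresholds(load_per_cpu, disk_fraction, max_load=1.0, max_disk=0.9):
    """Return warning strings; the default thresholds are arbitrary examples."""
    problems = []
    if load_per_cpu > max_load:
        problems.append("load per CPU is %.2f" % load_per_cpu)
    if disk_fraction > max_disk:
        problems.append("disk is %.0f%% full" % (disk_fraction * 100))
    return problems


if __name__ == "__main__":
    for line in check_thresholds(*resource_snapshot("/")):
        print("WARNING:", line)
```

Run from cron on the master, something like this would probably have
flagged the problem before the disconnects started.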

I still need to figure out why MySQL is so slow, but that is another story.

This server is a virtual machine, so don't panic. I'm clearly pushing
it beyond its limits.
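
To see how often a slave hits the timeout, one option is to scan its
twistd.log for the checkActivity message quoted further down in this
thread. The sketch below assumes every such line looks like the one
I posted (date, time to the minute, UTC offset, "[-]", then the
message); the regex and the function name are my own, not anything
buildbot provides.

```python
import re
import sys

# Assumed line format, taken from the single log line quoted in this thread.
TIMEOUT_RE = re.compile(
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}) \S+ \[-\] "
    r"BotFactory\.checkActivity: nothing from master for (\d+) secs"
)


def find_timeouts(lines):
    """Return (timestamp, idle_seconds) for each checkActivity timeout line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_timeouts.py path/to/twistd.log
    with open(sys.argv[1]) as log:
        for stamp, secs in find_timeouts(log):
            print(stamp, secs)
```

Comparing the timestamps against load or disk graphs for the master
should show whether the disconnects line up with the overload.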

Mark MacVicar

On Wed, Nov 5, 2008 at 1:56 PM, Mark MacVicar <mark.macvicar at gmail.com> wrote:
> I think I may have tracked down the problem. Disk space!
>
> Our root linux partition was filling up. After moving some files off
> and cleaning up /tmp, buildbot seems more stable.
>
> Thanks for everyone's advice.
>
> On Mon, Nov 3, 2008 at 2:22 PM, Mark MacVicar <mark.macvicar at gmail.com> wrote:
>> Our buildmaster and one slave are running on an instance of VMware; the
>> other slave is a dedicated PC.
>>
>> I'll see if I can track it down to the TCP issue you mentioned or try
>> to get some better hardware.
>>
>> I tried increasing the timeout for each slave to 1200 seconds but they
>> are still reporting "BotFactory.checkActivity: nothing from master for
>> 1307 secs" then they reconnect almost immediately.
>>
>> I think there might be an update to Twisted. I'll try updating that too.
>>
>> Thanks for your insight,
>>
>> Mark MacVicar
>>
>> On Thu, Oct 30, 2008 at 10:26 AM, Bill Baker <bbb at uiuc.edu> wrote:
>>> I've had periodic disconnects, but I didn't notice that log message.  It
>>> seemed hardware-dependent, actually.  My build slaves are all virtual
>>> machines running in VMware Server 1 and 2, and on one physical server I got
>>> disconnections every hour or so, causing long-running builds to fail.  I
>>> tried moving them to newer hardware, and all the disconnections stopped.
>>>
>>> So my first guess is that the cause is different.  With mine, I think it was
>>> due to an interaction between the ethernet hardware and VMware, which is
>>> constantly juggling MAC addresses, and seemed to be breaking TCP connections
>>> on one platform but not on the other.
>>>
>>> On Wed, Oct 29, 2008 at 4:24 PM, Mark MacVicar <mark.macvicar at gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm having a problem where my buildslaves are disconnecting every few
>>>> days (sometimes multiple times per day) with the following log:
>>>>
>>>> buildslave_linux32/twistd.log:1321:2008/10/28 09:47 -0700 [-]
>>>> BotFactory.checkActivity: nothing from master for 629 secs
>>>>
>>>> This is even happening for the slave I have running on the same server
>>>> as the master.
>>>>
>>>> I'm running my buildmaster and one of my slaves on OpenSuse 10.3. I'm
>>>> currently running a modified buildbot 0.7.8.
>>>>
>>>> I wasn't having this problem for a while, so I'm suspicious of my
>>>> modifications, but I haven't modified the slave scripts.
>>>>
>>>> Has anyone encountered periodic, seemingly random buildslave
>>>> disconnects and found a solution?
>>>>
>>>> Thank you,
>>>>
>>>> Mark MacVicar
>>>>



