Windows slaves freezing on downloadFile step

Elliot Saba staticfloat at gmail.com
Tue Oct 6 01:26:16 UTC 2015


The slave works fine for many other tasks (including file uploads); it's
only the `downloadFile` step that it freezes on.  I have other
(non-Windows) slaves on the same network doing similar things (file
downloads and uploads) that aren't running into these problems.

Is there a place where I should instrument my slave code with prints or
something similar to try to figure out where it's crashing?
-E
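One generic way to act on the "where do I put prints" question is to wrap each suspect function in a logging decorator that records entry, exit and exceptions. When the process dies abruptly there is no traceback, but the last `enter` line without a matching `exit` narrows down where it happened. This is a minimal sketch, not Buildbot code: `traced`, the `trace` logger name and `fake_transfer_step` are hypothetical, and which buildslave function to wrap is an assumption, since the slave's module layout isn't shown in this thread.

```python
import functools
import logging
import time

# Hypothetical logger name; route it wherever your slave already logs.
log = logging.getLogger("trace")


def traced(func):
    """Log entry, normal exit (with elapsed time) and exceptions for func.

    Illustrative helper, not part of buildbot or buildslave.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        log.debug("enter %s", name)
        start = time.time()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Records the traceback, then lets the exception propagate unchanged.
            log.exception("raised in %s", name)
            raise
        log.debug("exit %s after %.3fs", name, time.time() - start)
        return result
    return wrapper


if __name__ == "__main__":
    # logging.StreamHandler (and FileHandler) flush after every record, so the
    # last "enter" line is written out even if the process dies right after.
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s %(name)s %(message)s")

    @traced
    def fake_transfer_step(size_kb):  # hypothetical stand-in for a slave function
        return b"x" * (size_kb * 1024)

    fake_transfer_step(4)
```

For Twisted code that returns a Deferred, the `exit` line only marks when the Deferred was handed back, not when it fired; timing the asynchronous part would need callbacks attached to the Deferred itself.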

On Mon, Oct 5, 2015 at 6:09 PM, Dustin J. Mitchell <dustin at v.igoro.us>
wrote:

> It sounds like it's not so much freezing as crashing.  Or something's
> going wrong with the network between the hosts.
>
> Dustin
>
> On Mon, Oct 5, 2015 at 8:05 PM, Elliot Saba <staticfloat at gmail.com> wrote:
>
>> All I see is the following:
>>
>> 2015-10-06 00:02:26+0000 [Broker,28,www.xxx.yyy.zzz] Unhandled Error
>>         Traceback (most recent call last):
>>         Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
>>         ]
>>
>> On Sun, Oct 4, 2015 at 6:54 AM, Dustin J. Mitchell <dustin at v.igoro.us>
>> wrote:
>>
>>> Are you seeing any tracebacks in the master's twistd.log?
>>>
>>> Dustin
>>>
>>> On Wed, Sep 30, 2015 at 1:35 PM, Elliot Saba <staticfloat at gmail.com>
>>> wrote:
>>>
>>>> Hello all, I have some Windows Server buildslaves that are freezing on
>>>> a master -> slave file transfer step.  The file itself is only a few KB, so
>>>> I know it's not a network speed issue, and other steps (including slave ->
>>>> master uploads) have been successful.
>>>>
>>>> You can see the buildmaster webpage here
>>>> <http://buildbot.e.ip.saba.us:8010/waterfall?tag=Juno>, but it
>>>> doesn't show much information, except that (1) the slaves continually
>>>> time out and restart themselves, and (2) the webpage "interrupt" button
>>>> doesn't seem to work, so the buildslaves keep restarting
>>>> themselves with no apparent way to force them to stop.  twistd.log on one
>>>> of the slaves shows this log
>>>> <https://gist.github.com/staticfloat/bed35422a12772439c2c>, which I
>>>> think shows the slave timing out because it believes the buildmaster is
>>>> frozen.  My questions are:
>>>>
>>>> (1) What's the proper way to clear the queue when the "interrupt"
>>>> button doesn't work?  So far, I'm getting around this by stopping the
>>>> buildmaster, starting it back up, and canceling the jobs in the "pending"
>>>> queue before the buildslaves can reconnect.
>>>>
>>>> (2) Why would this download step be freezing?  I've tried turning off
>>>> the Windows firewall, and that doesn't seem to be the problem.
>>>>
>>>> If there's more information I need to give, please let me know!
>>>> -E
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users at buildbot.net
>>>> https://lists.buildbot.net/mailman/listinfo/users
>>>>
>>>
>>>
>>
>
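Dustin asked whether the master's twistd.log showed tracebacks, and the one Elliot found is a `PBConnectionLost` on a Broker connection. To check whether those drops line up with the timeouts in the slave's own log, a small script can pull the connection-loss entries out of the master log together with their timestamps. This is a sketch under stated assumptions: every log entry starts with a timestamp in the `YYYY-MM-DD HH:MM:SS+ZZZZ [context]` form seen in the excerpt above, the traceback lines of an entry are indented, and `connection_losses` is a hypothetical helper, not part of Buildbot or Twisted.

```python
import re
from datetime import datetime

# Assumed twistd.log entry header, based on the master excerpt in this thread:
#   2015-10-06 00:02:26+0000 [Broker,28,www.xxx.yyy.zzz] Unhandled Error
# Continuation lines of the same entry (the traceback) are indented.
HEADER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) "
    r"\[(?P<ctx>[^\]]*)\] (?P<msg>.*)$"
)


def connection_losses(lines):
    """Yield (timestamp, broker_id, peer) for every Broker "Unhandled Error"
    entry whose traceback mentions ConnectionLost (hypothetical helper)."""
    pending = None
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            # A new timestamped entry ends whatever entry came before it.
            pending = None
            parts = m.group("ctx").split(",", 2)
            if (len(parts) == 3 and parts[0] == "Broker" and parts[1].isdigit()
                    and m.group("msg").startswith("Unhandled Error")):
                # %z keeps the UTC offset, so master and slave logs written in
                # different time zones can still be compared directly.
                ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S%z")
                pending = (ts, int(parts[1]), parts[2])
        elif pending is not None and "ConnectionLost" in line:
            yield pending
            pending = None
```

An open file can be passed in directly, since iterating over it yields lines: `for ts, broker, peer in connection_losses(open("twistd.log")): print(ts, broker, peer)`. Setting those timestamps next to the slave-side timeout messages could help show which side gives up on the connection first.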