[Buildbot] #3183: fix network "spottiness" on new infra

Buildbot trac trac at buildbot.net
Sun Feb 8 21:33:08 UTC 2015


#3183: fix network "spottiness" on new infra
---------------------+--------------------------
Reporter:  dustin    |       Owner:
    Type:  defect    |      Status:  new
Priority:  critical  |   Milestone:  sys - other
 Version:            |  Resolution:
Keywords:            |
---------------------+--------------------------

Comment (by dustin):

 On that post, I once again got "Warning: the change has been saved ..",
 and the connection hung for quite a while.  I put the tcpdumps from both
 trac and my laptop in
 https://gist.github.com/djmitche/4dad36e0fdbfcc5cc215, with some editing
 to make them more useful.  Overall, the connection shows a 78-second TCP
 connection.  The "trac" host is using TSO, which means that tcpdump on
 that host captures the oversized segments that the kernel hands to the
 NIC, while tcpdump on the laptop captures the smaller segments into which
 the NIC breaks them.  This all seems to work OK, although it makes it
 difficult to match up packets.

 Things start off well enough:

 {{{
 4:03 IP laptop.60061 > trac.http: Flags [S], seq 833594151, win 29200,
 options [mss 1460,sackOK,TS val 17271196 ecr 0,nop,wscale 7], length 0
 4:03 IP trac.http > laptop.60061: Flags [S.], seq 1018352583, ack
 833594152, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val
 4013412119 ecr 17271196], length 0
 4:03 IP laptop.60061 > trac.http: Flags [.], ack 1, win 229, options
 [nop,nop,TS val 17271280 ecr 4013412119], length 0
 4:03 IP laptop.60061 > trac.http: Flags [.], seq 1:1449, ack 1, win 229,
 options [nop,nop,TS val 17271280 ecr 4013412119], length 1448
 4:03 IP laptop.60061 > trac.http: Flags [.], seq 1449:2897, ack 1, win
 229, options [nop,nop,TS val 17271280 ecr 4013412119], length 1448
 4:03 IP laptop.60061 > trac.http: Flags [P.], seq 2897:3681, ack 1, win
 229, options [nop,nop,TS val 17271282 ecr 4013412119], length 784
 4:04 IP trac.http > laptop.60061: Flags [.], ack 2897, win 1018, options
 [nop,nop,TS val 4013412205 ecr 17271280], length 0
 4:04 IP trac.http > laptop.60061: Flags [.], ack 3681, win 1040, options
 [nop,nop,TS val 4013412312 ecr 17271282], length 0
 }}}
 laptop sends a 3681-byte HTTP request, and track acks receipt.

 10 seconds later, and every 10 seconds thereafter, laptop pings and track
 pongs:
 {{{
 4:14 IP laptop.60061 > trac.http: Flags [.], ack 1, win 229, options
 [nop,nop,TS val 17281473 ecr 4013412312], length 0
 4:14 IP trac.http > laptop.60061: Flags [.], ack 3681, win 1040, options
 [nop,nop,TS val 4013422396 ecr 17271282], length 0
 4:24 IP laptop.60061 > trac.http: Flags [.], ack 1, win 229, options
 [nop,nop,TS val 17291584 ecr 4013422396], length 0
 4:24 IP trac.http > laptop.60061: Flags [.], ack 3681, win 1040, options
 [nop,nop,TS val 4013432507 ecr 17271282], length 0
 4:34 IP laptop.60061 > trac.http: Flags [.], ack 1, win 229, options
 [nop,nop,TS val 17301696 ecr 4013432507], length 0
 4:34 IP trac.http > laptop.60061: Flags [.], ack 3681, win 1040, options
 [nop,nop,TS val 4013442619 ecr 17271282], length 0
 4:44 IP laptop.60061 > trac.http: Flags [.], ack 1, win 229, options
 [nop,nop,TS val 17311808 ecr 4013442619], length 0
 4:44 IP trac.http > laptop.60061: Flags [.], ack 3681, win 1040, options
 [nop,nop,TS val 4013452730 ecr 17271282], length 0
 4:54 IP laptop.60061 > trac.http: Flags [.], ack 1, win 229, options
 [nop,nop,TS val 17321920 ecr 4013452730], length 0
 4:54 IP trac.http > laptop.60061: Flags [.], ack 3681, win 1040, options
 [nop,nop,TS val 4013462842 ecr 17271282], length 0
 5:04 IP laptop.60061 > trac.http: Flags [.], ack 1, win 229, options
 [nop,nop,TS val 17332032 ecr 4013462842], length 0
 5:04 IP trac.http > laptop.60061: Flags [.], ack 3681, win 1040, options
 [nop,nop,TS val 4013472954 ecr 17271282], length 0
 }}}
 I don't recognize this pattern, but it looks like the hosts do, so OK.

 A full 76 seconds after the connection was initiated, trac begins
 replying:
 {{{
 5:19 IP trac.http > laptop.60061: Flags [P.], seq 1:399, ack 3681, win
 1040, options [nop,nop,TS val 4013487830 ecr 17271282], length 398
 }}}
 this goes on for a while, seemingly without any issue.

 Finally, after 47419 bytes of response, the laptop sends a FIN:
 {{{
 5:21 IP trac.http > laptop.60061: Flags [.], seq 45971:47419, ack 4762,
 win 1040, options [nop,nop,TS val 4013490058 ecr 17349135], length 1448
 5:21 IP laptop.60061 > trac.http: Flags [.], ack 47419, win 984, options
 [nop,nop,TS val 17349220 ecr 4013490058], length 0
 5:21 IP laptop.60061 > trac.http: Flags [F.], seq 4762, ack 47419, win
 984, options [nop,nop,TS val 17349220 ecr 4013490058], length 0
 }}}

 But trac isn't done yet, and keeps sending data, to which the laptop sends
 RST packets:
 {{{
 5:21 IP trac.http > laptop.60061: Flags [.], seq 47419:48867, ack 4762,
 win 1040, options [nop,nop,TS val 4013490059 ecr 17349135], length 1448
 5:21 IP laptop.60061 > trac.http: Flags [R], seq 833598913, win 0, length
 0
 5:21 IP trac.http > laptop.60061: Flags [.], seq 48867:50315, ack 4762,
 win 1040, options [nop,nop,TS val 4013490060 ecr 17349136], length 1448
 5:21 IP laptop.60061 > trac.http: Flags [R], seq 833598913, win 0, length
 0
 5:21 IP trac.http > laptop.60061: Flags [.], seq 50315:50351, ack 4762,
 win 1040, options [nop,nop,TS val 4013490060 ecr 17349136], length 36
 5:21 IP laptop.60061 > trac.http: Flags [R], seq 833598913, win 0, length
 0
 5:21 IP trac.http > laptop.60061: Flags [.], seq 50351:51799, ack 4762,
 win 1040, options [nop,nop,TS val 4013490143 ecr 17349219], length 1448
 5:21 IP laptop.60061 > trac.http: Flags [R], seq 833598913, win 0, length
 0
 5:21 IP trac.http > laptop.60061: Flags [.], seq 51799:53247, ack 4762,
 win 1040, options [nop,nop,TS val 4013490144 ecr 17349220], length 1448
 5:21 IP laptop.60061 > trac.http: Flags [R], seq 833598913, win 0, length
 0
 5:21 IP trac.http > laptop.60061: Flags [.], seq 53247:54695, ack 4762,
 win 1040, options [nop,nop,TS val 4013490144 ecr 17349220], length 1448
 5:21 IP laptop.60061 > trac.http: Flags [R], seq 833598913, win 0, length
 0
 5:21 IP trac.http > laptop.60061: Flags [.], seq 54695:56143, ack 4763,
 win 1040, options [nop,nop,TS val 4013490144 ecr 17349220], length 1448
 5:21 IP laptop.60061 > trac.http: Flags [R], seq 833598914, win 0, length
 0
 }}}

 So, two funny things here.  First, it took well over a minute for Trac to
 respond -- which is probably associated with the failed attempt to send an
 email.  That's probably the bug we're after here.

 Second, trac's sending more data than the browser expects.  This might be
 due to a missing or incorrect Content-Length header.  I'm not going to
 worry about that one right now.

 On a whim, I disabled TSO on service2's em0 and em1 interfaces.  Let's see
 if that helps!

--
Ticket URL: <http://trac.buildbot.net/ticket/3183#comment:2>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation


More information about the bugs mailing list