comp.protocols.tcp-ip: TCP and IP network protocols

#1

We are having FTP performance problems when traffic takes a certain IP route through our network. A packet capture shows an Active FTP session riddled with tons of TCP Duplicate ACKs from the receiver and subsequent retransmissions by the sender (CCC)... no wonder the user experience was very poor. However, although we have been able to identify the symptoms, the root cause has eluded us. Our analysis shows that the sender appears to be transmitting its FTP data packets properly, but the receiver sends a normal ACK followed immediately by a DUP ACK. This ongoing "paranoid" behavior by the receiver results in retransmissions by the sender, clogging up the flow with a snowballing effect. The strange thing is that, according to the timestamps, there is no need for the receiver to send a DUP ACK, because there appears to be no delay whatsoever. So what else would cause the receiver to send a DUP ACK if there is no obvious delay?

#2

tmcgov05 wrote:
> A packet capture shows an Active FTP session riddled with tons of TCP
> Duplicate Acks by the receiver and subsequent Retransmissions by the
> sender (CCC).... no wonder the user experience was very poor. However,
> although we have been able to identify the symptoms, the root cause has
> eluded us

It sounds like the root cause is packet loss. Where is your sniffer located?

> Our analysis shows that the sender appears to be properly
> transmitting his FTP Data packets, but the receiver is sending a normal
> ACK followed immediately by a DUP ACK.

A dupACK is a normal ACK. It's just *another* normal ACK.

The fact that you're seeing the data sent correctly indicates that nothing is wrong with the sender, nor with the network equipment up to the point of the sniffer, but that packets (segments) aren't correctly arriving at the receiver. Either the network isn't delivering them, or the receiver can't hear them.

> This ongoing "paranoid" behavior
> by the receiver results in Retransmissions by the sender, thus clogging
> up the flow with a snowballing effect.

The receiver isn't paranoid. He's telling you that data isn't showing up in order.

Congestion control on the sender should prevent a snowball effect. The overall packet rate should be very low when dupACKs are rolling in.

> The strange thing here is that according to the timestamps, there is no
> need for the receiver to be sending a DUP ACK because there appears to
> be no delay whatsoever. So what else would cause the receiver to feel
> the need to send a DUP ACK if there was no obvious delay ?

dupACKs are not a product of delay. They're a product of missing/misordered segments.

Try deploying two sniffers: one very close to the server, the other very close to the client. See if all client-bound packets observed by the server sniffer are observed by the client sniffer.

/chris
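
To illustrate Chris's point that dupACKs come from missing or misordered segments rather than from delay, here is a minimal Python sketch of a receiver's cumulative acknowledgment. It is a simplification, assuming segments are numbered 1, 2, 3, ... instead of carrying byte sequence numbers, and that the receiver ACKs every arriving segment (real stacks usually delay ACKs for in-order data but ACK out-of-order data immediately). The function names are hypothetical. No timing appears anywhere in the model: a gap in the sequence is enough to produce duplicates.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK sent for each arriving segment.

    Segments are numbered 1, 2, 3, ...; each ACK names the next segment
    the receiver expects, i.e. the first gap in what it holds.
    """
    expected = 1        # next in-order segment wanted
    buffered = set()    # out-of-order segments held for reassembly
    acks = []
    for seg in arrivals:
        if seg >= expected:          # old duplicates are simply re-ACKed
            buffered.add(seg)
        while expected in buffered:  # slide past every contiguous segment
            buffered.remove(expected)
            expected += 1
        acks.append(expected)
    return acks


def count_dup_acks(acks):
    """Count ACKs that repeat the ACK immediately before them."""
    return sum(1 for prev, cur in zip(acks, acks[1:]) if cur == prev)


if __name__ == "__main__":
    in_order = receiver_acks([1, 2, 3, 4, 5])
    lost_3 = receiver_acks([1, 2, 4, 5, 6])    # segment 3 never arrives
    print(in_order, count_dup_acks(in_order))  # [2, 3, 4, 5, 6] 0
    print(lost_3, count_dup_acks(lost_3))      # [2, 3, 3, 3, 3] 3
```

With segment 3 missing, the receiver keeps asking for 3, so the trace shows one normal ACK followed by duplicates, the same normal-ACK-then-dupACK pattern described in the original question.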

#3

Thanks Chris,

The Sniffer is monitoring the router interface closest to the FTP server (sender). I totally agree about getting a capture at the other end, but unfortunately the receiver is another company and they are not cooperating very well. The strange thing is this: the problem goes away, and I get a perfectly ordered packet capture, when I change a static route on the aforementioned router from the HSRP virtual address to the physical address of WAN router 2. Note that WAN routers 1 and 2 are connected with a /30 ethernet segment running OSPF, and all traffic is designed to get routed over WAN router 2. So when the virtual next hop is used, there is that extra hop... which is obviously causing the problem... I just can't figure out why. It doesn't help that we don't have access to the WAN routers (provider managed).

                       .3 (Primary)--------------WAN router 1 ---------------->
    My Router------.5 (virtual HSRP)
                       .4 (Secondary)------------WAN router 2 ---------------->

#4

tmcgov05 wrote:
> Note that WAN router 1&2 are
> connected with a /30 ethernet segment running OSPF and all traffic is
> designed to get routed over WAN router2. So when the virtual next-hop
> is used, there is that extra hop... which is obviously causing the
> problem... just can't figure out why.

Interesting. I presume you mean that the routers are connected with an *additional* ethernet segment, separate from the one where the HSRP router lives? The setup you've described doesn't fit in a /30. Sounds like there is a crossover cable between the WAN routers.

My guess: you have a duplex mismatch on the crossover interfaces of the WAN routers.

What would I do? I'd get an ethernet tap on the crossover and listen to the CDP messages being advertised by each router. Those messages should tell you the operating modes of each router's transceiver. You can build a tap for just a few dollars:

http://www.snort.org/docs/tap/

/chris

#5

If part of the "bad" path is a trunked/bonded/aggregated link and doing something foolish like round-robin packet scheduling across the links, the segments could be getting reordered, which would cause the receiver to send an immediate ACK upon receipt of each out-of-order segment. That ACK would be for the last in-order segment received and would therefore look like a duplicate.

If you can get cooperation from the receiver, have them take a snapshot of TCP statistics on the FTP server at two points during your FTP transfer and run them (if possible) through the likes of beforeafter:

ftp://ftp.cup.hp.com/dist/networking/tools/

and see if the receiver is reporting lots of out-of-order segments. Also see if they are receiving lots (for some definition of "lots", which would be as a percentage of total traffic) of completely duplicate packets.

Lots of out-of-order and dups suggest either a fubar sender, or packet reordering in the network triggering false rtx (retransmissions).

Lots of out-of-order without lots of dups suggest honest-to-goodness packet loss in the network.

It might be a real pain, but one other way to see if it is reordering rather than packet loss is to "cripple" the "send window" on your FTP client - say by shrinking his send socket buffer to something no more than about 3 or 4x the TCP MSS between your client and the server. The idea is to keep so few TCP segments outstanding on the connection that fast retransmit cannot be triggered. Run that on both the "good" and the "bad" config and be prepared for it to be rather slower. If your dup ACKs and retrans go away, it means either the link was saturated, or it was doing that round-robin thing. Not conclusive, but it would be another data point that didn't require cooperation from either the network provider or the FTP server admin.

rick jones

--
oxymoron n, Hummer H2 with California Save Our Coasts and Oceans plates
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
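
Rick's round-robin explanation, and his window-crippling test, can be tried out in a small simulation. This is a minimal sketch under several assumptions: segments leave the sender at a fixed spacing and are striped round-robin across links with different one-way delays; nothing is lost; and the window is modelled crudely, with the sender transmitting one window's worth of segments and waiting until they are all delivered before sending the next. The fast-retransmit threshold of three duplicate ACKs follows RFC 5681. The link delays and all names are hypothetical. Because nothing is ever dropped, every fast retransmit the sketch counts is a spurious one.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit (RFC 5681)


class Receiver:
    """Cumulative-ACK receiver; segments are numbered 1, 2, 3, ..."""

    def __init__(self):
        self.expected = 1      # next in-order segment wanted
        self.buffered = set()  # out-of-order segments held for reassembly

    def receive(self, seg):
        if seg >= self.expected:
            self.buffered.add(seg)
        while self.expected in self.buffered:
            self.buffered.remove(self.expected)
            self.expected += 1
        return self.expected   # the ACK sent for this arrival


def striped_arrival_order(first_seg, count, link_delays, gap=1.0):
    """Send `count` segments from `first_seg`, one every `gap` time units,
    round-robin across links with the given one-way delays; return the
    segment numbers in the order they reach the receiver."""
    arrivals = []
    for i in range(count):
        seg = first_seg + i
        link = (seg - 1) % len(link_delays)
        arrivals.append((i * gap + link_delays[link], seg))
    return [seg for _, seg in sorted(arrivals)]


def spurious_fast_retransmits(total_segments, window, link_delays):
    """Count fast retransmits the sender would fire although nothing was lost."""
    rx = Receiver()
    last_ack, dups, fast_rtx = None, 0, 0
    for start in range(1, total_segments + 1, window):
        count = min(window, total_segments - start + 1)
        for seg in striped_arrival_order(start, count, link_delays):
            ack = rx.receive(seg)
            if ack == last_ack:
                dups += 1
                if dups == DUPACK_THRESHOLD:
                    fast_rtx += 1  # third dupACK: sender retransmits
            else:
                last_ack, dups = ack, 0
    return fast_rtx


if __name__ == "__main__":
    delays = [0.0, 10.0]  # hypothetical: second link 10 time units slower
    print(striped_arrival_order(1, 8, delays))      # [1, 3, 5, 7, 2, 4, 6, 8]
    print(spurious_fast_retransmits(8, 8, delays))  # 1
    print(spurious_fast_retransmits(8, 3, delays))  # 0
```

With a window of eight segments, the slow link holds back segment 2 while 3, 5 and 7 arrive, producing three dupACKs and a fast retransmit even though no packet was lost. With the window capped at three segments, at most two segments can overtake a late one, so the threshold is never reached; this is the effect the smaller send buffer is meant to produce.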

#6

Right, I am referring to an additional segment between the routers.
However, I have eliminated that as a cause, because additional captures verify that destinations that use the primary HSRP router's WAN link (WAN router 1 in the diagram), and therefore never cross that segment, also suffer from exactly the same problem of tons of dupACKs and retransmissions. I really suspect that the provider's HSRP implementation is bad (even though the configs look good). I am going to test this further by changing the static route for one of the aforementioned destinations to point at the physical IP address of the primary HSRP router instead of the virtual one, doing another packet capture, and seeing whether TCP stabilizes. Thanks for all of your insight.

#7

Thanks Rick - this is interesting stuff. I just may have to give that a try.
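
For anyone trying Rick's suggestion, shrinking a socket's send buffer is done with the standard SO_SNDBUF socket option. The Python sketch below assumes you control the sending application's code (an off-the-shelf FTP client or server would need its own setting, if it offers one) and assumes an MSS of 1460 bytes, typical for Ethernet with 20-byte IP and TCP headers. The helper name is hypothetical. Kernels may adjust the requested size (Linux, for example, doubles it to allow for bookkeeping overhead), so the sketch reports the value read back rather than assuming it, and the number of segments actually outstanding is worth confirming in a capture.

```python
import socket


def cripple_send_buffer(sock, mss=1460, segments=3):
    """Request a send buffer of roughly `segments` x `mss` bytes.

    Call this before connect() so the limit applies for the whole
    connection. Returns the size the kernel reports back, which may
    differ from the request (Linux doubles it).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, mss * segments)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("send buffer now", cripple_send_buffer(s), "bytes")
        # s.connect((host, port)) would follow here; host and port are
        # placeholders for the real transfer endpoint.
```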

#8

Rick Jones wrote:
> If part of the "bad" path is a trunked/bonded/aggregated link and
> doing something foolish like round-robin packet scheduling across the
> links the segments could be getting reordered, which would cause the
> receiver to send an immediate ACK upon receipt of each out-of-order
> segment. That ACK would be for the last in-order segment received and
> would therefore look like a duplicate.

Very insightful, Rick.

The OP mentioned seeing TCP retransmissions, but it's possible that the connection is experiencing minor packet loss, with consistent reordering. That is to say, maybe packets are showing up like this:

2,1,4,3,6,5,8,7

That would result in a lot of dupACKs, but not many retransmissions, which I think is the scenario you're describing.

> If you can get cooperation from the receiver, have them take a
> snapshot of TCP statistics on the FTP server from two points during
> your FTP transfer and run them (if possible) through the likes of
> beforeafter:
>
> ftp://ftp.cup.hp.com/dist/networking/tools/
>
> and see if the receiver is reporting lots of out-of-order segments.
> Also see if they are receiving lots (for some definition of "lots"
> which would be as a percentage of total traffic) of completely
> duplicate packets.

The OP might be able to learn something by looking only at the sender's TCP stats: if the number of TCPinDupAcks (or somesuch) is increasing rapidly, but the number of TCPRetransmissions (or somesuch) is not, then it sounds like a misordering problem.

> Lots of out-of-order and dups suggest either a fubar sender, or packet
> reordering in the network triggering false rtx.
>
> Lots of out-of-order without lots of dups suggest honest-to-goodness
> packet loss in the network.

I think Rick means dup *segments* in those statements, not dupACKs.

I've thought of another scenario: How sure are you that the traffic is only going across one of your WAN links?
Perhaps your HSRP primary is sending some of the traffic to the WAN, and some of the traffic to the other WAN router? Can you pull the WAN cable from WAN router 1 (.3)?

/chris
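
Both rules of thumb, Rick's receiver-side one (reading "dups" as duplicate segments, as Chris suggests) and Chris's sender-side one, come down to comparing counter deltas between two snapshots, which is the job beforeafter does. The Python sketch below is a hypothetical illustration: the sender-side key names are Chris's own "or somesuch" placeholders, the receiver-side keys are invented, real `netstat -s` labels differ between operating systems, and the `ratio` and `lots` thresholds are arbitrary values you would tune.

```python
def counter_delta(before, after):
    """Subtract two counter snapshots, in the spirit of beforeafter."""
    return {name: value - before.get(name, 0) for name, value in after.items()}


def sender_side_hint(delta, dupack_key="TCPinDupAcks",
                     retrans_key="TCPRetransmissions", ratio=10.0):
    """Chris's rule: dupACKs climbing fast while retransmissions barely
    move suggests misordering rather than loss. Key names and the
    `ratio` threshold are placeholders."""
    dupacks = delta.get(dupack_key, 0)
    retrans = delta.get(retrans_key, 0)
    if dupacks == 0:
        return "no dupACKs"
    if retrans == 0 or dupacks / retrans >= ratio:
        return "reordering suspected"
    return "loss suspected"


def receiver_side_hint(delta, total_key="segments_received",
                       ooo_key="out_of_order_segments",
                       dup_key="duplicate_segments", lots=0.01):
    """Rick's rule, judging "lots" as a fraction of segments received.
    Key names and the `lots` threshold are placeholders."""
    total = delta.get(total_key, 0)
    if total == 0:
        return "no data"
    many_ooo = delta.get(ooo_key, 0) / total >= lots
    many_dups = delta.get(dup_key, 0) / total >= lots
    if many_ooo and many_dups:
        return "reordering (or a broken sender) causing false retransmits"
    if many_ooo:
        return "genuine packet loss"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical sender-side snapshots taken during one transfer.
    before = {"TCPinDupAcks": 100, "TCPRetransmissions": 10}
    after = {"TCPinDupAcks": 5100, "TCPRetransmissions": 12}
    print(sender_side_hint(counter_delta(before, after)))  # reordering suspected
```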