comp.protocols.tcp-ip TCP and IP network protocols.
#1
A dump from the inetstatShow command lists multiple connections in the FIN_WAIT_1 state, some of which have data in their Send-Q. These connections were previously ESTABLISHED connections between a CORBA middleware client and server. They remain in the FIN_WAIT_1 state for hours.

The dump also shows some ESTABLISHED connections with data in the Recv-Q. The amount of data in the Recv-Q does not change over a period of hours; again, these connections are related to the CORBA middleware.

Finally, there is a large amount of data in the Recv-Q of a UDP echo server. The amount of data fluctuates over time, but the queue is never emptied.

All these partially terminated connections and non-empty data queues exhaust the available mbuf pool. I am unable to explain this bizarre behavior. Has anyone seen something similar?

Thanks in advance.

#2
fell.anthony@gmail.com wrote:
> A dump from the inetstatShow command lists multiple connections in
> the FIN_WAIT_1 state, some of which have data in their Send-Q.
> These connections were previously ESTABLISHED connections between a
> CORBA middle-ware client and server. They remain in the FIN_WAIT_1
> state for hours.

Interesting. One would indeed expect those connections to "retransmit timeout" - are you sure it is the same data in the Send-Q you see each time, and that it is indeed the same connection in FIN_WAIT_1? You might try to get a packet trace of traffic to/from those connections to see if there are indeed ACKs coming back that may be keeping the connection alive. Otherwise, it may be a bug in the TCP stack.

> The dump also shows some ESTABLISHED connections with data in the
> Recv-Q. The amount of data in the Recv-Q does not change over a
> period of hours, again, connections related to CORBA middle-ware.

That one sounds like a middleware/app issue - either not checking for data available, or ignoring data available indications.

> Finally, there is a large amount of data in the Recv-Q of a UDP echo
> server. The amount of data fluctuates over time, but the Queue is
> never emptied.

That one sounds a trifle sinister. Unless you know you have applications that really need to reach out and touch the echo service, you should probably disable it, or at least restrict it to your organization's IPs. Particularly a UDP echo service, since that one would be more vulnerable to IP source address spoofing, allowing an attacker to get you to send data to some other poor slob out on the network.

> All these partially terminated connections and non-empty data queues
> exhaust the available mbuf pool.

Just how many of these things do you have and how small is your mbuf pool?

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

#3
Rick Jones wrote:
> fell.anthony@gmail.com wrote:
> > A dump from the inetstatShow command lists multiple connections in
> > the FIN_WAIT_1 state, some of which have data in their Send-Q.
> > These connections were previously ESTABLISHED connections between a
> > CORBA middle-ware client and server. They remain in the FIN_WAIT_1
> > state for hours.
>
> Interesting. One would indeed expect those connections to "retransmit
> timeout" - are you sure it is the same data in the Send-Q you see each
> time, and is indeed the same connection in FIN_WAIT_1? You might try
> to get a packet trace of traffic to/from those connections to see if
> there are indeed ACKs coming back that may be keeping the connection
> alive. Otherwise, it may be a bug in the TCP stack.
>

First, the OS is VxWorks and its network stack is based on BSD4.3. I don't know if the data is the same, but the size of the data in the Send-Q is the same. Yes, it's the same connection that is stuck in the FIN_WAIT_1 state. See the partial output from the inetstatShow command at times 17:06, 17:50 and 20:43, respectively.

fe2e73c TCP 0 0 172.23.92.232.1106 172.20.37.21.43091 FIN_WAIT_1
fe2e6b8 TCP 0 4096 172.23.92.232.1105 172.20.60.44.33456 FIN_WAIT_1
fe2e3a0 TCP 0 0 172.23.92.232.1103 172.20.60.44.33456 FIN_WAIT_1
fe2e4a8 TCP 0 0 172.23.92.232.1099 172.20.60.44.33456 FIN_WAIT_1
fe2e190 TCP 0 2048 172.23.92.232.1097 172.20.37.21.43091 FIN_WAIT_1

fe2e73c TCP 0 0 172.23.92.232.1106 172.20.37.21.43091 FIN_WAIT_1
fe2e6b8 TCP 0 4096 172.23.92.232.1105 172.20.60.44.33456 FIN_WAIT_1
fe2e3a0 TCP 0 0 172.23.92.232.1103 172.20.60.44.33456 FIN_WAIT_1
fe2e4a8 TCP 0 0 172.23.92.232.1099 172.20.60.44.33456 FIN_WAIT_1
fe2e190 TCP 0 2048 172.23.92.232.1097 172.20.37.21.43091 FIN_WAIT_1

fe2e73c TCP 0 0 172.23.92.232.1106 172.20.37.21.43091 FIN_WAIT_1
fe2e6b8 TCP 0 4096 172.23.92.232.1105 172.20.60.44.33456 FIN_WAIT_1
fe2e3a0 TCP 0 0 172.23.92.232.1103 172.20.60.44.33456 FIN_WAIT_1
fe2e4a8 TCP 0 0 172.23.92.232.1099 172.20.60.44.33456 FIN_WAIT_1
fe2e190 TCP 0 2048 172.23.92.232.1097 172.20.37.21.43091 FIN_WAIT_1

Wouldn't an ACK send the connections to the FIN_WAIT_2 state? If the receiving end's Recv-Q was full, and the FIN packet was stuck behind other data on the Send-Q, would the protocol still try to retransmit, or would it recognize that the initial FIN was still not sent and not retransmit?

> > The dump also shows some ESTABLISHED connections with data in the
> > Recv-Q. The amount of data in the Recv-Q does not change over a
> > period of hours, again, connections related to CORBA middle-ware.
>
> That one sounds like a middleware/app issue - either not checking for
> data available, or ignoring data available indications.
>
> > Finally, there is a large amount of data in the Recv-Q of a UDP echo
> > server. The amount of data fluctuates over time, but the Queue is
> > never emptied.
>
> That one sounds a trifle sinister. Unless you know you have
> applications that really need to reach-out and touch the echo service
> you should probably disable it, or at least restrict it to your
> organization's IPs. Particularly a UDP echo service since that one
> would be more vulnerable to IP source address spoofing, allowing an
> attacker to get you to send data to some other poor slob out on the
> network.
>
> > All these partially terminated connections and non-empty data queues
> > exhaust the available mbuf pool.
>
> Just how many of these things do you have and how small is your mbuf
> pool?
>

Could the lack of mbufs prevent the echo service (used on a restricted network only) and the CORBA app from taking data off the Recv-Q and processing it (or, in the case of the echo server, only processing a little at a time as mbufs were freed slowly)?

Looks like the 64 byte clusters (400 allocated) are being exhausted first. In normal or even heavy operations, the data clusters are very conservative - only in this bizarre case does it become a problem.

Note - the mbufShow dump lists:
number of times failed to find space: 215141
number of times waited for space: 0
number of times drained protocols for space: 187039

What happens when the protocols are drained for space? Thanks for any info.
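
The check described above (same connection, same Send-Q size in every dump) can be automated when there are many rows to compare. A minimal sketch in Python, assuming each row has the seven whitespace-separated columns seen in the output above (PCB, protocol, Recv-Q, Send-Q, local address, foreign address, state); `parse_snapshot` and `stuck_connections` are hypothetical helper names, not part of VxWorks:

```python
from typing import Dict, List, NamedTuple, Tuple


class Conn(NamedTuple):
    pcb: str
    proto: str
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse_snapshot(text: str) -> Dict[Tuple[str, str], Conn]:
    """Parse inetstatShow-style rows, keyed by (local, foreign) address pair."""
    conns: Dict[Tuple[str, str], Conn] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and rows without a TCP state column.
        if len(fields) != 7 or fields[1] != "TCP":
            continue
        pcb, proto, recv_q, send_q, local, foreign, state = fields
        conns[(local, foreign)] = Conn(
            pcb, proto, int(recv_q), int(send_q), local, foreign, state
        )
    return conns


def stuck_connections(snapshots: List[str], state: str = "FIN_WAIT_1") -> List[Conn]:
    """Return connections that appear unchanged (PCB, queues, state) in every snapshot."""
    parsed = [parse_snapshot(s) for s in snapshots]
    first, rest = parsed[0], parsed[1:]
    return [
        conn
        for key, conn in first.items()
        if conn.state == state and all(other.get(key) == conn for other in rest)
    ]
```

Feeding the three dumps taken at 17:06, 17:50 and 20:43 to `stuck_connections` would list the rows that did not move at all between captures. It only compares queue sizes, so it cannot tell whether the queued bytes themselves are the same; a packet trace is still needed for that.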

#4
> First, the OS is VxWorks and its network stack is based on BSD4.3.

What is the retransmission limit in the VxWorks stack?

> I don't know if the data is the same, but the size of the data in
> the Send-Q is the same. Yes, it's the same connection that is stuck
> in the FIN_WAIT_1 state. See partial output from inetstatShow
> command at times 17:06, 17:50 and 20:43, respectively.

> fe2e73c TCP 0 0 172.23.92.232.1106 172.20.37.21.43091 FIN_WAIT_1
> fe2e6b8 TCP 0 4096 172.23.92.232.1105 172.20.60.44.33456 FIN_WAIT_1
> fe2e3a0 TCP 0 0 172.23.92.232.1103 172.20.60.44.33456 FIN_WAIT_1
> fe2e4a8 TCP 0 0 172.23.92.232.1099 172.20.60.44.33456 FIN_WAIT_1
> fe2e190 TCP 0 2048 172.23.92.232.1097 172.20.37.21.43091 FIN_WAIT_1
>
> fe2e73c TCP 0 0 172.23.92.232.1106 172.20.37.21.43091 FIN_WAIT_1
> fe2e6b8 TCP 0 4096 172.23.92.232.1105 172.20.60.44.33456 FIN_WAIT_1
> fe2e3a0 TCP 0 0 172.23.92.232.1103 172.20.60.44.33456 FIN_WAIT_1
> fe2e4a8 TCP 0 0 172.23.92.232.1099 172.20.60.44.33456 FIN_WAIT_1
> fe2e190 TCP 0 2048 172.23.92.232.1097 172.20.37.21.43091 FIN_WAIT_1
>
> fe2e73c TCP 0 0 172.23.92.232.1106 172.20.37.21.43091 FIN_WAIT_1
> fe2e6b8 TCP 0 4096 172.23.92.232.1105 172.20.60.44.33456 FIN_WAIT_1
> fe2e3a0 TCP 0 0 172.23.92.232.1103 172.20.60.44.33456 FIN_WAIT_1
> fe2e4a8 TCP 0 0 172.23.92.232.1099 172.20.60.44.33456 FIN_WAIT_1
> fe2e190 TCP 0 2048 172.23.92.232.1097 172.20.37.21.43091 FIN_WAIT_1

> Wouldn't an ACK send the connections to FIN_WAIT_2 state?

An ACK of the FIN's sequence number would, but not of just the data. Are the remotes also VxWorks or some other stack?

> If the receiving end Recv-Q was full, and the FIN packet was stuck
> behind other data on the Send-Q, would the protocol still try to
> retransmit, or would it recognize the initial FIN was still not sent
> and not retransmit?

TCP _should_ be retransmitting, starting from send unacknowledged. Depending on the MTU and thus the MSS of those stuck connections, the FIN bit may or may not be set on the retransmitted segment. The ones with 0's in the Send-Q I would expect to be retransmitting a bare FIN. The others, if the MSS is 1460, I would expect to retransmit a 1460 byte segment with no FIN set.

> Could the lack of mbufs prevent the echo service (used on restricted
> network only) and the CORBA app from taking data off the Recv-Q and
> processing it (or in the case of the echo server, only processing a
> little at a time as mbufs were freed slowly)?

Data waiting to be taken out by either echo or CORBA is already in an mbuf and wouldn't (at least I wouldn't expect it to) require additional mbufs, only that echo/CORBA have target buffers for the copy.

> Looks like the 64 byte clusters (400 allocated) are being exhausted
> first. In normal or even heavy operations, the data clusters are
> very conservative - only in this bizarre case does it become a
> problem.
> Note - mbufShow dump lists
> number of times failed to find space: 215141
> number of times waited for space: 0
> number of times drained protocols for space: 187039.
> What happens when the protocols are drained for space?

That depends on the stack. For a receiver, he could discard anything he received but had not yet ACKed. For a sender I don't think there is much he could do.

rick jones is very glad that HP-UX 10 and later networking doesn't have a separate mbuf pool; it got rid of a _lot_ of tuning questions/problems.
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
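
The two points above (which segment gets retransmitted, and which ACK moves a connection out of FIN_WAIT_1) can be illustrated with a little arithmetic. This is a rough sketch in Python, not code from any real stack: `retransmit_segment` and `state_after_ack` are hypothetical names, the peer's window is assumed large enough for a full segment, and sequence-number wraparound is ignored.

```python
from typing import Tuple


def retransmit_segment(send_q_bytes: int, mss: int) -> Tuple[int, bool]:
    """Return (payload_len, fin_set) for the first segment resent from snd_una.

    Models a connection in FIN_WAIT_1 whose FIN is queued behind
    send_q_bytes of unacknowledged data.
    """
    payload = min(send_q_bytes, mss)
    # The FIN can only ride on the segment that carries the last queued byte.
    fin_set = payload == send_q_bytes
    return payload, fin_set


def state_after_ack(snd_una: int, send_q_bytes: int, ack: int) -> str:
    """Return the state after receiving `ack` while in FIN_WAIT_1.

    The FIN consumes one sequence number right after the queued data,
    so only an ACK beyond that number acknowledges the FIN itself.
    """
    fin_seq = snd_una + send_q_bytes
    return "FIN_WAIT_2" if ack > fin_seq else "FIN_WAIT_1"
```

With an MSS of 1460, `retransmit_segment(0, 1460)` gives `(0, True)`, a bare FIN, while `retransmit_segment(4096, 1460)` gives `(1460, False)`, matching the expectation for the rows with 4096 and 2048 bytes in the Send-Q. Likewise, an ACK that covers all 4096 data bytes but not the FIN leaves the connection in FIN_WAIT_1.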

#5
FYI -
I was able to reproduce the FIN_WAIT_1 state problem. I flooded the EchoServer with echo requests (the server sleeps 0.5 seconds between replies), thereby exhausting the mbuf pool (a lot of SONAME and DATA). The CORBA middleware subsequently closes its connection because of ping failures. The network stack places the TCP connection into the FIN_WAIT_1 state, but never issues the FIN packet (no mbufs to do the job!). The stack either doesn't know the FIN packet was not sent or does not handle the error properly (maybe it should terminate the connection immediately?). Since there was never an initial transmission, the network task responsible for retransmission attempts no retransmissions, and thus the connection never times out on retransmission.
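
The failure mode described above can be captured in a toy model. The following Python sketch only illustrates the hypothesis in this post; it is not VxWorks or BSD code. The `Connection` class, the `arm_timer_on_failure` switch and the `MAX_REXMT` limit of 12 are all assumptions made for the example.

```python
MAX_REXMT = 12  # assumed retry limit for this toy model


class Connection:
    """Toy model of a TCP close when the mbuf pool may be empty."""

    def __init__(self, arm_timer_on_failure: bool) -> None:
        self.arm_timer_on_failure = arm_timer_on_failure
        self.state = "ESTABLISHED"
        self.fin_sent = False
        self.timer_armed = False
        self.attempts = 0

    def close(self, mbufs_free: int) -> None:
        # The state changes first; the FIN is output afterwards.
        self.state = "FIN_WAIT_1"
        self._output_fin(mbufs_free)

    def _output_fin(self, mbufs_free: int) -> None:
        if mbufs_free > 0:
            self.fin_sent = True
            self.timer_armed = True
        elif self.arm_timer_on_failure:
            # Variant that still schedules a retry after an allocation failure.
            self.timer_armed = True
        # Otherwise: no FIN sent and no timer armed, so nothing ever retries.

    def timer_tick(self, mbufs_free: int) -> None:
        """Retransmit timer: does nothing unless it was armed."""
        if self.state != "FIN_WAIT_1" or not self.timer_armed:
            return
        self.attempts += 1
        if self.attempts > MAX_REXMT:
            self.state = "CLOSED"  # give up and drop the connection
            return
        self._output_fin(mbufs_free)
```

With `arm_timer_on_failure=False` and an empty pool at `close()`, the model stays in FIN_WAIT_1 no matter how many ticks pass, which is the behavior seen in the dumps. With `arm_timer_on_failure=True`, the connection either gets its FIN out once mbufs are free again or is dropped after the retry limit.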