Thread: 4 messages, 3 authors, 2009-02-28

Re: Connection reset by peer - need a patch

From: Ilpo Järvinen <hidden>
Date: 2009-02-28 10:03:14

On Fri, 27 Feb 2009, Pascal GREGIS wrote:
> I have a very annoying bug that seems to be well known today.
> It happens on a backup server that issues a "Connection reset by peer"
> while the other side does not reset or stop the connection.

If I understand you correctly, you lost synchronization between hosts...
If so I'd suggest you start tracking what's getting dropped/discarded and
where (it might affect only a single direction). Any middlebox is outright
a suspect :-). Tcpdump (on both hosts, and possibly on intermediate nodes'
ifaces if losses in between are found from the end host tcpdumps), mibs
(/proc/net/netstat, for in-host discards) and strace are there to help
you onward. Even if it's not lost synchronization you basically use
the same tools.
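
For the in-host discards part, here is a minimal sketch of how one might snapshot /proc/net/netstat twice and print only the counters that moved in between. It assumes the usual layout of that file (a line of counter names followed by a line of values, both starting with the same prefix such as "TcpExt:"). The function names and the 10 second default interval are made up for illustration, and which counters exist depends on the kernel.

```python
import sys
import time


def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"Prefix.Name": int}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumed layout: lines come in pairs, names first and values second,
    # and both lines of a pair start with the same "Prefix:" token.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            continue  # layout is not what we assumed, skip this pair
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


def diff_counters(before, after):
    """Return only the counters whose value changed, as after - before."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }


def read_netstat(path="/proc/net/netstat"):
    with open(path) as f:
        return parse_netstat(f.read())


if __name__ == "__main__":
    # Hypothetical usage: optional first argument is the interval in seconds.
    interval = float(sys.argv[1]) if len(sys.argv) > 1 else 10.0
    before = read_netstat()
    time.sleep(interval)
    after = read_netstat()
    for name, delta in sorted(diff_counters(before, after).items()):
        print(f"{name}: {delta:+d}")
```

Running it on both hosts while the reset is being reproduced shows which discard counters, if any, increase during the failure, and that can then be matched against the tcpdump traces.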

> I have found a report that seems very similar on this mailing list:
> http://kerneltrap.org/index.php?q=mailarchive/linux-netdev/2008/4/28/1628834
>
> It points to the commit 7951f0b03a63d657c72c7d54d306ef3357e7e604
> Author: Daniel Lezcano <...
> Date: Thu Apr 10 20:53:10 2008 -0700
>     [NETNS][IPV6] tcp - assign the netns for timewait sockets
>
> and gives a simple patch that adds the line
>     tw->tw_net = sk->sk_net;
> somewhere in the function inet_twsk_alloc (in the file net/ipv4/inet_timewait_sock.c).

I don't think you're on the right track with that lead...

> Right, but the problem is that I use kernel 2.6.21.1 and cannot
> upgrade my whole kernel easily. And in kernel 2.6.21.1, network
> namespaces don't seem to exist, so I cannot apply this simple patch.

...as that bug was introduced along with network namespaces, so for sure
you won't need that fix for anything that doesn't have them. <update>Ah,
DaveM already said that</update>.

> I am in a very uncomfortable situation because this bug is causing
> harmful problems on all the backup servers of my company and, as I said
> above, upgrading the kernel is not really possible at this time.
>
> Does anyone know what I could do to solve this?

Unfortunately it's a bit the same for us, as we don't act as support for
random, ancient kernels (if it's a distro kernel you can probably ask
them, but basically you'll need more information than what's available in
this mail to actually solve the problem)... I gave you some directions on
how these problems are located in general, regardless of kernel version.

On the other hand, please don't hesitate to report to us or ask us if you
have a recent enough kernel and encounter some problems.


-- 
 i.