Re: ax25 rose Re: kernel panic linux-2.6.27-rc7
From: Jarek Poplawski <hidden>
Date: 2008-10-05 13:02:15
On Sat, Oct 04, 2008 at 08:30:26PM +0200, Bernard, f6bvp wrote:
Jarek,

Following your indications I did it both ways!

Without the commit 30902dc3cb0ea1cfc7ac2b17bcf478ff98420d74 patch, kernel 2.6.27-rc7 no longer panics when running ROSE applications.

Conversely, when this patch is applied to the rose-patched 2.6.25.10 kernel, it reboots a few seconds after ROSE applications are started. Otherwise it is very stable.

I checked this behaviour about three times for both kernels, with and without the incriminated patch. This confirms without doubt that it is responsible for the observed kernel panic.

Is there nevertheless a way to solve the problem this patch was meant to fix?

Bernard
I've looked at this a bit and here are some conclusions:

I think David's patch should be reverted: it's probably colliding with ax25_disconnect() in the current code, and there could be a double destroy or something similar. Since I don't know this code well enough, I'm not going to look now for the cleanest possible solution. I'd only like to mention that this "/* Magic here: If we listen()..." is still left in a few other places (ax25, rose, netrom, x25), so removing only this one isn't very consistent.

Anyway, it looks like the original hack:

http://marc.info/?l=linux-netdev&m=121370472223572&w=2

could be just the missing part of this magic (or I'm missing something). Bernard, since it worked for the author, I propose you test whether it works for you. If so, why bother with more? (Unless somebody cares...)

BTW, as I wrote before, it would be nice to check this with the first debugging patch I sent, to see the difference.

Thanks,
Jarek P.
On Friday 03 October 2008 at 07:43 +0000, Jarek Poplawski wrote:
On Fri, Oct 03, 2008 at 07:34:18AM +0000, Jarek Poplawski wrote:
On 02-10-2008 21:48, Jarek Poplawski wrote:
On Thu, Oct 02, 2008 at 08:20:18PM +0200, Bernard, f6bvp wrote:
...
Although I did not change anything, and contrary to my previous observation, the system instability shown above occurs systematically. There was no problem with kernel 2.6.25-10, which I was using before (with patches for AX25 and ROSE that are now included in 2.6.27-rc7).

Then it could be useful to try our luck with reverting some other "suspicious" changes added in the meantime. My first candidate is attached below. (So you could test this with vanilla 2.6.27-rc7 or later, with or without any of the patches in this thread, and the patch below reverted.)

Hmm... Of course, you could do this the other way as well: 2.6.25-10 etc. with this patch applied.

Jarek P.
I did not try 2.6.26 on this machine, thus I cannot tell if the bug was already present. Would it be worth to test 2.6.26?

Yes, but only if you think you can do it safely.

This is still valid (it can wait).

Jarek P.

-------->

commit 30902dc3cb0ea1cfc7ac2b17bcf478ff98420d74
Author: David S. Miller [off-list ref]
Date:   Tue Jun 17 21:26:37 2008 -0700

    ax25: Fix std timer socket destroy handling.

    Tihomir Heidelberg - 9a4gl, reports:

    --------------------
    I would like to direct you attention to one problem existing in ax.25
    kernel since 2.4. If listening socket is closed and its SKB queue is
    released but those sockets get weird. Those "unAccepted()" sockets
    should be destroyed in ax25_std_heartbeat_expiry, but it will not
    happen. And there is also a note about that in ax25_std_timer.c:

    /* Magic here: If we listen() and a new link dies before it
       is accepted() it isn't 'dead' so doesn't get removed. */

    This issue cause ax25d to stop accepting new connections and I had to
    restarted ax25d approximately each day and my services were
    unavailable. Also netstat -n -l shows invalid source and device for
    those listening sockets. It is strange why ax25d's listening socket
    get weird because of this issue, but definitely when I solved this bug
    I do not have problems with ax25d anymore and my ax25d can run for
    months without problems.
    --------------------

    Actually as far as I can see, this problem is even in releases
    as far back as 2.2.x as well.

    It seems senseless to special case this test on TCP_LISTEN state.
    Anything still stuck in state 0 has no external references and
    we can just simply kill it off directly.

    Signed-off-by: David S. Miller [off-list ref]

diff --git a/net/ax25/ax25_std_timer.c b/net/ax25/ax25_std_timer.c
index 96e4b92..cdc7e75 100644
--- a/net/ax25/ax25_std_timer.c
+++ b/net/ax25/ax25_std_timer.c
@@ -39,11 +39,9 @@ void ax25_std_heartbeat_expiry(ax25_cb *ax25)
 
 	switch (ax25->state) {
 	case AX25_STATE_0:
-		/* Magic here: If we listen() and a new link dies before it
-		   is accepted() it isn't 'dead' so doesn't get removed. */
-		if (!sk || sock_flag(sk, SOCK_DESTROY) ||
-		    (sk->sk_state == TCP_LISTEN &&
-		     sock_flag(sk, SOCK_DEAD))) {
+		if (!sk ||
+		    sock_flag(sk, SOCK_DESTROY) ||
+		    sock_flag(sk, SOCK_DEAD)) {
 			if (sk) {
 				sock_hold(sk);
 				ax25_destroy_socket(ax25);