Re: [PANIC] lro + iscsi or lro + skb text search causes panic

From: Mike Christie <hidden>
Date: 2009-01-28 18:23:46

Shyam_Iyer@Dell.com wrote:

Mike Christie wrote:

quoted

Herbert Xu wrote:

quoted

On Sun, Jan 25, 2009 at 09:32:22PM -0800, David Miller wrote:

quoted

From: Mike Christie <redacted>
Date: Thu, 22 Jan 2009 18:04:11 -0600

quoted

With the patch running against linus's git tree, my box locks up. 
You cannot ping it. I do not get a oops or anything in the logs,

and

quoted

the keyboard does not respond. I will try to get some oops output 
and more info.

Herbert, any idea offhand?

Yeah, I missed an offset update in there :) Here's a better version.

net: Fix frag_list handling in skb_seq_read

The frag_list handling was broken in skb_seq_read:

1) We didn't add the stepped offset when looking at the head are of 
fragments other than the first.

2) We didn't take the stepped offset away when setting the data 
pointer in the head area.

3) The frag index wasn't reset.

This patch fixes both issues.

quoted

It oopsd for me in skb_seq_read. addr2line said it was

linux-2.6/net/core/skbuff.c:2228, which is this line:

quoted

while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {

quoted

I added some printks in there and it looks like we hit this:

quoted

        } else if (st->root_skb == st->cur_skb &&
                  skb_shinfo(st->root_skb)->frag_list) {
                st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                st->frag_idx = 0;
                goto next_skb;
       }



Actually I did some testing and added a few printks and found that the
st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
This caused the kernel panic.




Yeah, that is what Jesse saw too. I never got a null ptr though. That is 
probably why he oopsed, but I could login but would get what looked like 
corrupted packets instead of oopsing.

 	if (abs_offset < block_limit) {
-		*data = st->cur_skb->data + abs_offset;
+		*data = st->cur_skb->data + (abs_offset -
st->stepped_offset);

 I enabled the debug_tcp and with a few printks found that the code in
my scenario did not go to the next_skb label as described by Mike and
could find that the sequence being followed was this -

It hit this if condition -
 if (st->cur_skb->next) {
                st->cur_skb = st->cur_skb->next;
                st->frag_idx = 0;
                goto next_skb;



Yeah, I must have been cross eyed that night. I was hitting this too, 
and that caused me to hit the loop again.

quoted hunk ↗ jump to hunk

And so, now the st pointer is shifted to the next skb whereas actually
it should have hit the second else if first since the data is in the
frag_list.

else if (st->root_skb == st->cur_skb &&
                   skb_shinfo(st->root_skb)->frag_list) {
                st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                goto next_skb;
        }


Reversing the two conditions the attached patch fixes the issue for me
on top of Herbert's patches.  I have done the testing on the ixgbe
adapter itself and verified the fix using some amount of data transfer
as well.

Signed-off-by: Shyam Iyer <redacted>

--- skbuff.c.orig	2009-01-29 01:12:03.000000000 +0530
+++ skbuff.c	2009-01-29 01:34:57.000000000 +0530

@@ -2039,15 +2039,15 @@
 		st->frag_data = NULL;
 	
 
-	if (st->cur_skb->next) {
-		st->cur_skb = st->cur_skb->next;
-		st->frag_idx = 0;
-		goto next_skb;
-	} else if (st->root_skb == st->cur_skb &&
+	if (st->root_skb == st->cur_skb &&
 		   skb_shinfo(st->root_skb)->frag_list) {
 		st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
 		st->frag_idx=0;


I think you diffed this against the wrong source. In the upstream code 
this is
st->frag_idx = 0;

so the patch does not apply.

 		goto next_skb;
+	} else if (st->cur_skb->next) {
+		st->cur_skb = st->cur_skb->next;
+		st->frag_idx = 0;
+		goto next_skb;
 	}
 
 	return 0;

Nice catch. Patch works for me. Thanks!

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help