Thread (6 messages, 4 authors, 2009-06-27)

Re: [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash

From: David Miller <davem@davemloft.net>
Date: 2009-06-26 17:24:54

From: Dhananjay Phadke <redacted>
Date: Fri, 26 Jun 2009 10:13:59 -0700
Mea culpa; the driver could likely wait longer for rx to drain
so that we don't race with napi disable.

That said, I have a question for Dave. If the napi code is
forcing napi completion anyway, shouldn't it also flush
the gro flows?

This code predates GRO.
I think there are some reasons, but Herbert Xu is more likely
to remember than I am, so I've CC:'d him :-)
Andrew Morton wrote:
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).


netdev core crashed.  The netxen driver may be implicated.


Why did amit@netxen.com create this bug report?  Isn't Dhananjay
sitting in the next cube?  Perhaps you believe that the driver is OK
and that the bug lies in the netdev core?



On Thu, 25 Jun 2009 06:55:14 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:
http://bugzilla.kernel.org/show_bug.cgi?id=13617

           Summary: GRO:__napi_complete from net_rx_action crash
           Product: Drivers
           Version: 2.5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Network
        AssignedTo: drivers_network@kernel-bugs.osdl.org
        ReportedBy: amit@netxen.com
        Regression: No


In net_rx_action, there is a check: if napi_disable_pending() is true, it
calls __napi_complete().
__napi_complete() contains BUG_ON(n->gro_list), which is what was hit in
the bug dump below.
Why is __napi_complete() called from net_rx_action instead of
napi_complete()? napi_complete() flushes the gro list.
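To make the difference concrete, here is a minimal user-space sketch. It is
not kernel code: struct napi_model and the model_* functions are
hypothetical stand-ins that keep only the GRO-held packet count and the
scheduled flag, and they omit the poll-list handling, IRQ masking and
atomic bit operations of the real functions. The point it shows is that
__napi_complete() only checks that the GRO list is already empty, while
napi_complete() flushes it first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct napi_struct, reduced to
 * the fields this bug involves. Not the kernel's definition. */
struct napi_model {
    bool sched;     /* stands in for the NAPI_STATE_SCHED bit */
    int  gro_held;  /* packets currently held on gro_list */
    int  delivered; /* packets pushed up the stack by a GRO flush */
    int  bug_hits;  /* times the kernel would have hit BUG_ON() */
};

/* Models napi_gro_flush(): hand every held packet up the stack. */
static void model_gro_flush(struct napi_model *n)
{
    n->delivered += n->gro_held;
    n->gro_held = 0;
}

/* Models __napi_complete(): no flush, only a BUG_ON(n->gro_list) check
 * before clearing SCHED. The BUG is counted here instead of oopsing.
 * Returns false where the kernel would have crashed. */
static bool model_bare_napi_complete(struct napi_model *n)
{
    if (n->gro_held != 0) {
        n->bug_hits++;
        return false;
    }
    n->sched = false;
    return true;
}

/* Models napi_complete(): flush GRO first, then do the bare completion. */
static bool model_napi_complete(struct napi_model *n)
{
    model_gro_flush(n);
    assert(n->gro_held == 0); /* the BUG_ON condition can no longer hold */
    return model_bare_napi_complete(n);
}
```

With gro_held = 3, model_bare_napi_complete() returns false and bumps
bug_hits, mirroring the oops below; model_napi_complete() on the same
state delivers the three packets and clears sched.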

Below is a code excerpt from net_rx_action:
http://lxr.linux.no/linux+v2.6.30/net/core/dev.c#L2736

   if (unlikely(work == weight)) {
           if (unlikely(napi_disable_pending(n)))
                   __napi_complete(n);
           else
                   list_move_tail(&n->poll_list, list);
   }
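As far as I can tell, napi_disable() marks the instance disable-pending
and then waits for it to stop being scheduled. If that happens while the
driver's poll is still consuming its full budget, net_rx_action takes the
__napi_complete() branch above with packets possibly still held by GRO.
The sketch below models only that branch, single-threaded. Everything in
it is hypothetical: model_poll() stands in for a busy GRO-enabled driver,
weight 64 is just an example budget, and flush_first = true illustrates
the "flush gro flows before forcing completion" idea raised in this
thread, not a confirmed upstream fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one NAPI instance; not kernel code. */
struct napi_model {
    bool sched;           /* NAPI_STATE_SCHED stand-in */
    bool disable_pending; /* NAPI_STATE_DISABLE stand-in, set by napi_disable() */
    bool on_poll_list;    /* queued for another round of polling */
    int  gro_held;        /* packets held on gro_list */
    int  bug_hits;        /* times BUG_ON(n->gro_list) would have fired */
};

/* Hypothetical driver poll: uses the whole budget and leaves `held`
 * packets merged on the GRO list, as a busy GRO-enabled driver might. */
static int model_poll(struct napi_model *n, int weight, int held)
{
    assert(weight > 0 && held >= 0);
    n->gro_held = held;
    return weight;
}

static void model_gro_flush(struct napi_model *n)
{
    n->gro_held = 0; /* pass held packets up the stack */
}

/* __napi_complete() stand-in: require an empty GRO list, then complete. */
static void model_bare_napi_complete(struct napi_model *n)
{
    if (n->gro_held != 0) {
        n->bug_hits++; /* the kernel oopses here instead */
        return;
    }
    n->on_poll_list = false;
    n->sched = false;
}

/* The budget-exhausted branch quoted above. flush_first == false follows
 * the 2.6.30 excerpt; flush_first == true sketches flushing GRO flows
 * before forcing completion. */
static void model_rx_branch(struct napi_model *n, int weight, int held,
                            bool flush_first)
{
    int work = model_poll(n, weight, held);

    if (work == weight) {
        if (n->disable_pending) {
            if (flush_first)
                model_gro_flush(n);
            model_bare_napi_complete(n);
        } else {
            n->on_poll_list = true; /* list_move_tail(): poll again later */
        }
    }
}
```

With disable_pending set and two packets held, the unflushed path records
a BUG hit and leaves the instance scheduled; the flushing path completes
cleanly. Without a pending disable, the instance simply stays queued and
GRO keeps its packets for the next round.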

------------[ cut here ]------------
kernel BUG at net/core/dev.c:2672!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 2 
Modules linked in: netxen_nic nfs lockd nfs_acl auth_rpcgss ipv6 deflate
zlib_deflate ctr twofish twofish_common serpent blowfish des_generic cbc
aes_x86_64 aes_generic xcbc sha256_generic md5 crypto_null af_key autofs4
sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror
dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc pci_slot
battery acpi_memhotplug ac parport ipmi_devintf ide_cd_mod rtc_cmos bnx2 cdrom
serio_raw ipmi_si rtc_core button ipmi_msghandler iTCO_wdt rtc_lib shpchp hpilo
hpwdt i5000_edac pcspkr edac_core ata_piix libata sd_mod scsi_mod cciss ext3
jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 0, comm: swapper Tainted: G        W  2.6.30 #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff8043b128>]  [<ffffffff8043b128>] __napi_complete+0x15/0x25
RSP: 0018:ffff880028139eb0  EFLAGS: 00010086
RAX: ffff88023d4056b8 RBX: ffff88023d4056a8 RCX: 0000000002202318
RDX: 00000000001b0000 RSI: ffff880028139d98 RDI: ffff88023d4056a8
RBP: 0000000000000080 R08: 0000000002200000 R09: 000006de15931680
R10: ffffc20011a32318 R11: 0000000000000005 R12: 0000000000000000
R13: ffff8800281440e0 R14: 0000000000000080 R15: 000000000000012c
FS:  0000000000000000(0000) GS:ffff880028136000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000008cb530 CR3: 000000023d9ab000 CR4: 00000000000006e0
Jun 23 23:41:27 dut4146 last message repeated 6 times
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88023ed28000, task ffff88023ed27570)
Stack:
 ffff88023d4056a8 ffffffff8043ec9f 0000000000000001 000000010004f429
 ffff88023d4056b8 0000000000000046 0000000000000001 0000000000000100
Jun 23 23:41:32 dut4146 kernel: BUG: scheduling while atomic: swapper/0/0x10000100
 ffffffff8069a098 0000000000000018 000000000000000a ffffffff8023eba6
Call Trace:
 <IRQ>  [<ffffffff8043ec9f>] ? net_rx_action+0xf0/0x162
 [<ffffffff8023eba6>] ? __do_softirq+0xa3/0x163
 [<ffffffff8020ca7c>] ? call_softirq+0x1c/0x28
 [<ffffffff8020dc1a>] ? do_softirq+0x2c/0x68
 [<ffffffff8023eac6>] ? irq_exit+0x3f/0x7c
 [<ffffffff8020d46b>] ? do_IRQ+0xa9/0xbf
 [<ffffffff8020c353>] ? ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80220e41>] ? hpet_legacy_next_event+0x0/0x7
 [<ffffffff80386e2c>] ? acpi_hw_register_read+0x52/0xe5
 [<ffffffff80394b2a>] ? acpi_idle_enter_simple+0x120/0x14e
 [<ffffffff80394b20>] ? acpi_idle_enter_simple+0x116/0x14e
 [<ffffffff8039486b>] ? acpi_idle_enter_bm+0xd5/0x274
 [<ffffffff8041c020>] ? cpuidle_idle_call+0x7f/0xbb
 [<ffffffff8020aaa5>] ? cpu_idle+0x4a/0x6d
Code: 48 8d 43 70 48 39 c2 0f 18 0e 75 df 31 c9 41 58 5b 5d 48 89 c8 c3 53 f6 47 10 01 48 89 fb 75 04 0f 0b eb fe 48 83 7f 50 00 74 04 <0f> 0b eb fe e8 1f 10 f1 ff f0 80 63 10 fe 5b c3 53 48 89 fb e8
RIP  [<ffffffff8043b128>] __napi_complete+0x15/0x25
 RSP <ffff880028139eb0>
---[ end trace 9c6b22b26aefd1b1 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G      D W  2.6.30 #1
Call Trace:
 <IRQ>  [<ffffffff8023a3b5>] ? panic+0x86/0x134
 [<ffffffff8020e348>] ? show_registers+0x211/0x21d
 [<ffffffff8024f5ea>] ? up+0xe/0x36
 [<ffffffff8023a9db>] ? release_console_sem+0x174/0x18e
 [<ffffffff804bdd54>] ? oops_end+0xa0/0xad
 [<ffffffff8020cf2c>] ? do_invalid_op+0x85/0x8f
 [<ffffffff8043b128>] ? __napi_complete+0x15/0x25
 [<ffffffffa03ebfe2>] ? netxen_nic_hw_write_wx_2M+0x24/0xa8 [netxen_nic]
 [<ffffffffa03ef866>] ? netxen_process_rcv_ring+0x4eb/0x501 [netxen_nic]
 [<ffffffff8020c715>] ? invalid_op+0x15/0x20
 [<ffffffff8043b128>] ? __napi_complete+0x15/0x25
 [<ffffffff8043ec9f>] ? net_rx_action+0xf0/0x162
 [<ffffffff8023eba6>] ? __do_softirq+0xa3/0x163
 [<ffffffff8020ca7c>] ? call_softirq+0x1c/0x28
 [<ffffffff8020dc1a>] ? do_softirq+0x2c/0x68
 [<ffffffff8023eac6>] ? irq_exit+0x3f/0x7c
 [<ffffffff8020d46b>] ? do_IRQ+0xa9/0xbf
 [<ffffffff8020c353>] ? ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80220e41>] ? hpet_legacy_next_event+0x0/0x7
 [<ffffffff80386e2c>] ? acpi_hw_register_read+0x52/0xe5
 [<ffffffff80394b2a>] ? acpi_idle_enter_simple+0x120/0x14e
 [<ffffffff80394b20>] ? acpi_idle_enter_simple+0x116/0x14e
 [<ffffffff8039486b>] ? acpi_idle_enter_bm+0xd5/0x274
 [<ffffffff8041c020>] ? cpuidle_idle_call+0x7f/0xbb
 [<ffffffff8020aaa5>] ? cpu_idle+0x4a/0x6d
