Re: LOCKDEP complaints in l2tp_xmit_skb()
From: Eric Dumazet <hidden>
Date: 2012-06-28 15:00:46
On Thu, 2012-06-28 at 15:33 +0100, Tom Parkin wrote:
> On Thu, Jun 28, 2012 at 01:22:31PM +0200, Eric Dumazet wrote:
> > On Thu, 2012-06-28 at 10:57 +0200, Eric Dumazet wrote:
> > > On Thu, 2012-06-28 at 08:56 +0200, Eric Dumazet wrote:
> > > > [PATCH] net: Qdisc busylock gets its own lockdep class
> > > >
> > > > Tom Parkin reported the following LOCKDEP splat:
> > > > [...]
> > > >
> > > > Instruct lockdep that each Qdisc busylock is independent, or else
> > > > bonding or various tunnels can trigger a splat.
> > > >
> > > > Reported-by: Tom Parkin <redacted>
> > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > ---
> > > >
> > > > I reproduced the bug using a bond0 device, adding a qdisc on it
> > > > (one Qdisc on bond0, and a Qdisc on the slave too).
> > >
> > > The problem with this patch is that I get the following message:
> > >
> > > BUG: key ffff88..... not in .data!
> > >
> > > No more LOCKDEP splat, but the patch is not good as is.
> >
> > I tested the following alternative patch with my bonding setup. Could
> > you test it with l2tp?
>
> I've tested against my l2tp test configuration and I still see LOCKDEP
> splats:
>
> l2tp_core: L2TP core driver, V2.0
> l2tp_netlink: L2TP netlink interface
> l2tp_eth: L2TP ethernet pseudowire support (L2TPv3)
>
> ============================================
> [ INFO: possible recursive locking detected ]
> 3.5.0-rc2-net-lockdep-u64-sync-007-+ #1 Not tainted
> --------------------------------------------
> swapper/0/0 is trying to acquire lock:
>  (slock-AF_INET){+.-...}, at: [<f862abff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
>
> but task is already holding lock:
>  (slock-AF_INET){+.-...}, at: [<c15333d7>] ip_send_reply+0x107/0x2b0
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(slock-AF_INET);
>   lock(slock-AF_INET);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 5 locks held by swapper/0/0:
>  #0:  (rcu_read_lock){.+.+..}, at: [<c14f7824>] __netif_receive_skb+0xe4/0x8d0
>  #1:  (rcu_read_lock){.+.+..}, at: [<c152bd4c>] ip_local_deliver_finish+0x3c/0x4c0
>  #2:  (slock-AF_INET){+.-...}, at: [<c15333d7>] ip_send_reply+0x107/0x2b0
>  #3:  (rcu_read_lock){.+.+..}, at: [<c1531456>] ip_finish_output+0x106/0x710
>  #4:  (rcu_read_lock_bh){.+....}, at: [<c14fa670>] dev_queue_xmit+0x0/0xbd0
>
> stack backtrace:
> Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc2-net-lockdep-u64-sync-007-+ #1
> Call Trace:
>  [<c10a7b32>] __lock_acquire+0xd52/0x17d0
>  [<c10a334b>] ? trace_hardirqs_off+0xb/0x10
>  [<c10a8b48>] lock_acquire+0x88/0x120
>  [<f862abff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
>  [<c16157bb>] _raw_spin_lock+0x3b/0x70
>  [<f862abff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
>  [<f862abff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
>  [<f851a32d>] l2tp_eth_dev_xmit+0x2d/0x40 [l2tp_eth]
>  [<c14fa32f>] dev_hard_start_xmit+0x49f/0x7e0
>  [<c14f9ee1>] ? dev_hard_start_xmit+0x51/0x7e0
>  [<c1515819>] sch_direct_xmit+0xa9/0x250
>  [<c16157e1>] ? _raw_spin_lock+0x61/0x70
>  [<c14fa83f>] dev_queue_xmit+0x1cf/0xbd0
>  [<c14fa670>] ? dev_hard_start_xmit+0x7e0/0x7e0
>  [<c1531537>] ip_finish_output+0x1e7/0x710
>  [<c1531456>] ? ip_finish_output+0x106/0x710
>  [<c1532770>] ? ip_output+0x60/0x120
>  [<c10a585b>] ? trace_hardirqs_on+0xb/0x10
>  [<c153278b>] ip_output+0x7b/0x120
>  [<c1532fc9>] ? __ip_make_skb+0x229/0x360
>  [<c1531b95>] ip_local_out+0x25/0x80
>  [<c1533117>] ip_send_skb+0x17/0x70
>  [<c153319b>] ip_push_pending_frames+0x2b/0x40
>  [<c1533495>] ip_send_reply+0x1c5/0x2b0
>  [<c107efef>] ? sched_clock_cpu+0xcf/0x150
>  [<c154f253>] tcp_v4_send_ack+0x1a3/0x260
>  [<c1552430>] ? tcp_timewait_state_process+0x90/0x3c0
>  [<c15514ef>] tcp_v4_rcv+0x3ff/0xc20
>  [<c152bdff>] ip_local_deliver_finish+0xef/0x4c0
>  [<c152bd4c>] ? ip_local_deliver_finish+0x3c/0x4c0
>  [<c152c40f>] ip_local_deliver+0x3f/0x80
>  [<c152b844>] ip_rcv_finish+0x174/0x640
>  [<c152c671>] ip_rcv+0x221/0x320
>  [<c14f7f11>] __netif_receive_skb+0x7d1/0x8d0
>  [<c14f7824>] ? __netif_receive_skb+0xe4/0x8d0
>  [<c14f80b7>] process_backlog+0xa7/0x170
>  [<c14f88dd>] net_rx_action+0x11d/0x210
>  [<c104d990>] ? local_bh_enable_ip+0xd0/0xd0
>  [<c104da27>] __do_softirq+0x97/0x1f0
>  [<c104d990>] ? local_bh_enable_ip+0xd0/0xd0
>  <IRQ> [<c104ddce>] ? irq_exit+0x7e/0xa0
>  [<c161e02b>] ? do_IRQ+0x4b/0xc0
>  [<c161de75>] ? common_interrupt+0x35/0x3c
>  [<c10380d5>] ? native_safe_halt+0x5/0x10
>  [<c1018bdf>] ? default_idle+0x4f/0x1e0
>  [<c1018dc1>] ? amd_e400_idle+0x51/0x100
>  [<c10199c9>] ? cpu_idle+0xb9/0xe0
>  [<c15eab3e>] ? rest_init+0x112/0x124
>  [<c15eaa2c>] ? __read_lock_failed+0x14/0x14
>  [<c1907a11>] ? start_kernel+0x376/0x37c
>  [<c19074d6>] ? repair_env_string+0x51/0x51
>  [<c19072f8>] ? i386_start_kernel+0x9b/0xa2
Yes, but this is not the splat I fixed. You reported two different splats, didn't you? I fixed a core network issue; I am sure the people who wrote l2tp can fix the l2tp one ;)
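Both splats in this thread come from the same property: lockdep reasons about lock classes, not individual lock instances, so two different locks that share a class (two Qdisc busylocks on stacked devices, or two AF_INET socket locks in the reply and tunnel paths) look like one lock taken twice. The toy C model below is a sketch of that reasoning only. Every name in it (`toy_class`, `toy_lock`, `toy_acquire`, `replay_nested_socket_locks`, and the `slock-tunnel` class) is hypothetical and is not the kernel's lockdep API: nesting two locks of one class is reported, while giving the inner lock its own class, in the spirit of "Qdisc busylock gets its own lockdep class", removes the report.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Toy model of class-based recursion detection, loosely inspired by how
 * lockdep groups locks into classes. All names are hypothetical; this is
 * NOT the kernel lockdep API.
 */

#define MAX_HELD 8

struct toy_class {
	const char *name;
};

struct toy_lock {
	const char *instance;        /* the object this lock protects */
	const struct toy_class *cls; /* class used for the recursion check */
};

struct toy_task {
	const struct toy_lock *held[MAX_HELD];
	int depth;
};

/* Take a lock; report (and return 1) if a lock of the same class is held. */
static int toy_acquire(struct toy_task *t, const struct toy_lock *l)
{
	int reported = 0;

	for (int i = 0; i < t->depth; i++) {
		if (t->held[i]->cls == l->cls) {
			printf("possible recursive locking: %s [%s] while holding %s\n",
			       l->instance, l->cls->name, t->held[i]->instance);
			reported = 1;
			break;
		}
	}
	assert(t->depth < MAX_HELD);
	t->held[t->depth++] = l;
	return reported;
}

/* Release the most recently acquired lock. */
static void toy_release(struct toy_task *t)
{
	assert(t->depth > 0);
	t->depth--;
}

/*
 * Replay the nesting from the splat: the reply path holds one socket's
 * lock, then the tunnel transmit path takes a *different* socket's lock.
 * With one shared class the two sockets look like recursion; giving the
 * tunnel socket its own (hypothetical) class removes the report.
 */
int replay_nested_socket_locks(int tunnel_has_own_class)
{
	static const struct toy_class inet_class = { "slock-AF_INET" };
	static const struct toy_class tunnel_class = { "slock-tunnel" };

	struct toy_lock reply_sk = { "reply socket", &inet_class };
	struct toy_lock tunnel_sk = { "tunnel socket",
		tunnel_has_own_class ? &tunnel_class : &inet_class };
	struct toy_task task = { .depth = 0 };
	int reports = 0;

	reports += toy_acquire(&task, &reply_sk);
	reports += toy_acquire(&task, &tunnel_sk);
	toy_release(&task);
	toy_release(&task);
	return reports;
}
```

Calling `replay_nested_socket_locks(0)` yields one report, mirroring the `lock(slock-AF_INET); lock(slock-AF_INET);` scenario; `replay_nested_socket_locks(1)` yields none. In the kernel, a separate class is normally assigned with `lockdep_set_class()` and a statically allocated `struct lock_class_key`; a key living in dynamically allocated memory is most likely what produced the "key ... not in .data!" message quoted above.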