Re: PROBLEM: reproducible crash KVM+nf_conntrack all recent 2.6 kernels
From: Jon Masters <hidden>
Date: 2010-01-28 07:20:24
Also in:
lkml, netfilter-devel
On Thu, 2010-01-28 at 00:46 -0500, Jon Masters wrote:
A number of people seem to have reported this crash in various forms, but I have yet to see a solution, and can reproduce on 2.6.33-rc5 this evening so I know it's still present in the latest upstream kernels too. Userspace is Fedora 12, and this happens on both all recent F12 kernels (sporadic in 2.6.31 until recently, solidly reproducible on 2.6.32) and upstream 2.6.32, and 2.6.33-rc5 also - hard to find a "known good". The problem happens when using netfilter with KVM (problem does not occur without the firewall loaded, for example) and will occur within a few minutes of attempting to start or stop a guest that is connecting to the network - the easiest way to reproduce so far is simply to start up a bunch of Fedora guests and have them do a "yum update" cycle. All of the crashes appear similar to the following (2.6.33-rc5):
Rebuilt the kernel with all debug options turned on, got some lockdep
warnings (haven't looked further yet). Here's the output (attached full
boot log also):
[ 205.843071]
[ 205.843073] =======================================================
[ 205.843769] [ INFO: possible circular locking dependency detected ]
[ 205.843769] 2.6.33-rc5 #3
[ 205.843769] -------------------------------------------------------
[ 205.843769] libvirtd/3058 is trying to acquire lock:
[ 205.843769] (s_active){++++.+}, at: [<ffffffff81171a56>]
sysfs_addrm_finish+0x36/0x55
[ 205.843769]
[ 205.843769] but task is already holding lock:
[ 205.843769] (rtnl_mutex){+.+.+.}, at: [<ffffffff813cedd9>] rtnl_lock
+0x17/0x19
[ 205.843769]
[ 205.843769] which lock already depends on the new lock.
[ 205.843769]
[ 205.843769]
[ 205.843769] the existing dependency chain (in reverse order) is:
[ 205.843769]
[ 205.843769] -> #4 (rtnl_mutex){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff8145504b>] __mutex_lock_common
+0x4b/0x392
[ 205.843769] [<ffffffff81455456>] mutex_lock_nested+0x3e/0x43
[ 205.843769] [<ffffffff813cedd9>] rtnl_lock+0x17/0x19
[ 205.843769] [<ffffffff813d05c0>] linkwatch_event+0xe/0x2c
[ 205.843769] [<ffffffff8106713f>] worker_thread+0x266/0x35f
[ 205.843769] [<ffffffff8106b1d0>] kthread+0x9a/0xa2
[ 205.843769] [<ffffffff8100aaa4>] kernel_thread_helper+0x4/0x10
[ 205.843769]
[ 205.843769] -> #3 ((linkwatch_work).work){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81067136>] worker_thread+0x25d/0x35f
[ 205.843769] [<ffffffff8106b1d0>] kthread+0x9a/0xa2
[ 205.843769] [<ffffffff8100aaa4>] kernel_thread_helper+0x4/0x10
[ 205.843769]
[ 205.843769] -> #2 (events){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81067a8b>] flush_workqueue+0x63/0xb1
[ 205.843769] [<ffffffff81067aee>] flush_scheduled_work
+0x15/0x17
[ 205.843769] [<ffffffff8138168b>] do_md_stop+0x2c0/0x508
[ 205.843769] [<ffffffff81383b94>] md_ioctl+0x92f/0xf49
[ 205.843769] [<ffffffff8121442d>] __blkdev_driver_ioctl
+0x39/0xa3
[ 205.843769] [<ffffffff81214dc8>] blkdev_ioctl+0x67d/0x6b1
[ 205.843769] [<ffffffff8113ef34>] block_ioctl+0x37/0x3b
[ 205.843769] [<ffffffff81126294>] vfs_ioctl+0x32/0xa6
[ 205.843769] [<ffffffff81126814>] do_vfs_ioctl+0x490/0x4d6
[ 205.843769] [<ffffffff811268b0>] sys_ioctl+0x56/0x79
[ 205.843769] [<ffffffff81009c32>] system_call_fastpath
+0x16/0x1b
[ 205.843769]
[ 205.843769] -> #1 (&new->reconfig_mutex){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff8145504b>] __mutex_lock_common
+0x4b/0x392
[ 205.843769] [<ffffffff814553d0>]
mutex_lock_interruptible_nested+0x3e/0x43
[ 205.843769] [<ffffffff8137bd62>] mddev_lock+0x17/0x19
[ 205.843769] [<ffffffff8137c039>] md_attr_show+0x32/0x5d
[ 205.843769] [<ffffffff811708dc>] sysfs_read_file+0xbd/0x17f
[ 205.843769] [<ffffffff8111983d>] vfs_read+0xab/0x108
[ 205.843769] [<ffffffff8111995a>] sys_read+0x4a/0x6e
[ 205.843769] [<ffffffff81009c32>] system_call_fastpath
+0x16/0x1b
[ 205.843769]
[ 205.843769] -> #0 (s_active){++++.+}:
[ 205.843769] [<ffffffff8107e93e>] __lock_acquire+0xa2e/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81171463>] sysfs_deactivate+0x9a/0x103
[ 205.843769] [<ffffffff81171a56>] sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff81171aab>] remove_dir+0x36/0x3e
[ 205.843769] [<ffffffff81171b60>] sysfs_remove_dir+0x9d/0xbe
[ 205.843769] [<ffffffff812206c1>] kobject_del+0x16/0x37
[ 205.843769] [<ffffffff8122078a>] kobject_release+0xa8/0xf8
[ 205.843769] [<ffffffff8122165d>] kref_put+0x43/0x4d
[ 205.843769] [<ffffffff8122063c>] kobject_put+0x47/0x4b
[ 205.843769] [<ffffffffa02d5ef5>] br_sysfs_delbr+0x21/0x43
[bridge]
[ 205.843769] [<ffffffffa02d20e9>] del_br+0x5a/0x6f [bridge]
[ 205.843769] [<ffffffffa02d24b2>] br_del_bridge+0x59/0x67
[bridge]
[ 205.843769] [<ffffffffa02d3764>] br_ioctl_deviceless_stub
+0x1fb/0x21b [bridge]
[ 205.843769] [<ffffffff813b32c4>] sock_ioctl+0x141/0x20d
[ 205.843769] [<ffffffff81126294>] vfs_ioctl+0x32/0xa6
[ 205.843769] [<ffffffff81126814>] do_vfs_ioctl+0x490/0x4d6
[ 205.843769] [<ffffffff811268b0>] sys_ioctl+0x56/0x79
[ 205.843769] [<ffffffff81009c32>] system_call_fastpath
+0x16/0x1b
[ 205.843769]
[ 205.843769] other info that might help us debug this:
[ 205.843769]
[ 205.843769] 2 locks held by libvirtd/3058:
[ 205.843769] #0: (br_ioctl_mutex){+.+.+.}, at: [<ffffffff813b32a9>]
sock_ioctl+0x126/0x20d
[ 205.843769] #1: (rtnl_mutex){+.+.+.}, at: [<ffffffff813cedd9>]
rtnl_lock+0x17/0x19
[ 205.843769]
[ 205.843769] stack backtrace:
[ 205.843769] Pid: 3058, comm: libvirtd Not tainted 2.6.33-rc5 #3
[ 205.843769] Call Trace:
[ 205.843769] [<ffffffff8107dae0>] print_circular_bug+0xa8/0xb6
[ 205.843769] [<ffffffff8107e93e>] __lock_acquire+0xa2e/0xd3c
[ 205.843769] [<ffffffff81171a56>] ? sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81171a56>] ? sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff81171463>] sysfs_deactivate+0x9a/0x103
[ 205.843769] [<ffffffff81171a56>] ? sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff8107c028>] ? trace_hardirqs_off+0xd/0xf
[ 206.367937] [<ffffffff81454e6a>] ? __mutex_unlock_slowpath
+0x120/0x132
[ 206.367937] [<ffffffff81171a56>] sysfs_addrm_finish+0x36/0x55
[ 206.367937] [<ffffffff81171aab>] remove_dir+0x36/0x3e
[ 206.367937] [<ffffffff81171b60>] sysfs_remove_dir+0x9d/0xbe
[ 206.367937] [<ffffffff812206c1>] kobject_del+0x16/0x37
[ 206.367937] [<ffffffff8122078a>] kobject_release+0xa8/0xf8
[ 206.367937] [<ffffffff812206e2>] ? kobject_release+0x0/0xf8
[ 206.367937] [<ffffffff8122165d>] kref_put+0x43/0x4d
[ 206.367937] [<ffffffff8122063c>] kobject_put+0x47/0x4b
[ 206.367937] [<ffffffff8105d218>] ? del_timer_sync+0x0/0xa6
[ 206.367937] [<ffffffffa02d5ef5>] br_sysfs_delbr+0x21/0x43 [bridge]
[ 206.367937] [<ffffffffa02d20e9>] del_br+0x5a/0x6f [bridge]
[ 206.367937] [<ffffffffa02d24b2>] br_del_bridge+0x59/0x67 [bridge]
[ 206.367937] [<ffffffffa02d3764>] br_ioctl_deviceless_stub
+0x1fb/0x21b [bridge]
[ 206.367937] [<ffffffff813b32c4>] sock_ioctl+0x141/0x20d
[ 206.367937] [<ffffffff81126294>] vfs_ioctl+0x32/0xa6
[ 206.367937] [<ffffffff81126814>] do_vfs_ioctl+0x490/0x4d6
[ 206.367937] [<ffffffff811268b0>] sys_ioctl+0x56/0x79
[ 206.367937] [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
[ 206.482734] kobject: 'brif' (ffff880213a243b8): calling ktype release
[ 206.492872] kobject: (ffff880213a243b8): dynamic_kobj_release
[ 206.498841] kobject: 'brif': free name
[ 206.527833] kobject: 'virbr0' (ffff8801cf3b0500): kobject_uevent_env
[ 206.534360] kobject: 'virbr0' (ffff8801cf3b0500): fill_kobj_path:
path = '/devices/virtual/net/virbr0'
[ 206.551468] kobject: 'virbr0' (ffff8801cf3b0500): kobject_cleanup
[ 206.557611] kobject: 'virbr0' (ffff8801cf3b0500): calling ktype
release
[ 206.564358] kobject: 'virbr0': free name
[ 208.575408] kobject: 'tun' (ffffffffa044b510): kobject_add_internal:
parent: 'module', set: 'module'
[ 208.595562] kobject: 'holders' (ffff8801d352cf68):
kobject_add_internal: parent: 'tun', set: '<NULL>'
[ 208.604990] kobject: 'tun' (ffffffffa044b510): kobject_uevent_env
[ 208.611296] kobject: 'tun' (ffffffffa044b510): fill_kobj_path: path =
'/module/tun'
[ 208.620119] kobject: 'notes' (ffff8801d352c660):
kobject_add_internal: parent: 'tun', set: '<NULL>'
[ 208.630343] tun: Universal TUN/TAP device driver, 1.6
[ 208.635421] tun: (C) 1999-2004 Max Krasnyansky [off-list ref]
[ 208.641721] kobject: 'tun' (ffff8801cd319e08): kobject_add_internal:
parent: 'misc', set: 'devices'
[ 208.651194] kobject: 'tun' (ffff8801cd319e08): kobject_uevent_env
[ 208.657386] kobject: 'tun' (ffff8801cd319e08): fill_kobj_path: path =
'/devices/virtual/misc/tun'
[ 208.671916] kobject: 'vnet0' (ffff880207df35e0):
kobject_add_internal: parent: 'net', set: 'devices'
[ 208.682667] kobject: 'vnet0' (ffff880207df35e0): kobject_uevent_env
[ 208.689435] kobject: 'vnet0' (ffff880207df35e0): fill_kobj_path: path
= '/devices/virtual/net/vnet0'
[ 208.712572] device vnet0 entered promiscuous mode
[ 208.717571] kobject: 'brport' (ffff8801cd378e88):
kobject_add_internal: parent: 'vnet0', set: '<NULL>'
[ 208.728149] kobject: 'brport' (ffff8801cd378e88): kobject_uevent_env
[ 208.734561] kobject: 'brport' (ffff8801cd378e88): kobject_uevent_env:
filter function caused the event to drop!
[ 208.748356] br0: port 2(vnet0) entering forwarding state
[ 209.576245] kobject: 'vnet1' (ffff8801cc9a0500):
kobject_add_internal: parent: 'net', set: 'devices'
[ 209.587167] kobject: 'vnet1' (ffff8801cc9a0500): kobject_uevent_env
[ 209.593773] kobject: 'vnet1' (ffff8801cc9a0500): fill_kobj_path: path
= '/devices/virtual/net/vnet1'
[ 209.614211] device vnet1 entered promiscuous mode
[ 209.619898] kobject: 'brport' (ffff8801cee35308):
kobject_add_internal: parent: 'vnet1', set: '<NULL>'
[ 209.631310] kobject: 'brport' (ffff8801cee35308): kobject_uevent_env
[ 209.638303] kobject: 'brport' (ffff8801cee35308): kobject_uevent_env:
filter function caused the event to drop!
[ 209.654971] br0: port 3(vnet1) entering forwarding state
[ 210.268995] SELinux: initialized (dev mqueue, type mqueue), uses
transition SIDs
[ 210.277526] kobject: 'pid_2' (ffff88020e222128):
kobject_add_internal: parent: 'slab', set: 'slab'
[ 210.288037] kobject: 'pid_2' (ffff88020e222128): kobject_uevent_env
[ 210.294431] kobject: 'pid_2' (ffff88020e222128): fill_kobj_path: path
= '/kernel/slab/pid_2'
[ 210.304019] SELinux: initialized (dev proc, type proc), uses
genfs_contexts
[ 210.397532] SELinux: initialized (dev mqueue, type mqueue), uses
transition SIDs
[ 210.434435] lo: Disabled Privacy Extensions
[ 210.443268] SELinux: initialized (dev proc, type proc), uses
genfs_contexts
[ 210.787652] kobject: 'lo' (ffff880207b725a0): kobject_cleanup
[ 210.793444] kobject: 'lo' (ffff880207b725a0): calling ktype release
[ 210.799783] kobject: 'lo': free name
[ 218.966215] vnet0: no IPv6 routers present
[ 220.336219] vnet1: no IPv6 routers present
[ 238.799662] kobject: 'vnet2' (ffff8801ccba8500):
kobject_add_internal: parent: 'net', set: 'devices'
[ 238.811833] kobject: 'vnet2' (ffff8801ccba8500): kobject_uevent_env
[ 238.818841] kobject: 'vnet2' (ffff8801ccba8500): fill_kobj_path: path
= '/devices/virtual/net/vnet2'
[ 238.841088] device vnet2 entered promiscuous mode
[ 238.846209] kobject: 'brport' (ffff8801cf2d92d0):
kobject_add_internal: parent: 'vnet2', set: '<NULL>'
[ 238.856592] kobject: 'brport' (ffff8801cf2d92d0): kobject_uevent_env
[ 238.863088] kobject: 'brport' (ffff8801cf2d92d0): kobject_uevent_env:
filter function caused the event to drop!
[ 238.878798] br0: port 4(vnet2) entering forwarding state
[ 249.395339] vnet2: no IPv6 routers present
[ 339.729225] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 339.730086] last sysfs
file: /sys/devices/virtual/block/md0/md/sync_speed
[ 339.730086] CPU 7
[ 339.730086] Pid: 3245, comm: qemu-kvm Not tainted 2.6.33-rc5 #3
0F9382/Precision WorkStation 490
[ 339.730086] RIP: 0010:[<ffffffff813e5f3e>] [<ffffffff813e5f3e>]
nf_ct_remove_expectations+0x49/0x5c
[ 339.730086] RSP: 0018:ffff880030603c18 EFLAGS: 00010206
[ 339.730086] RAX: ffff8801c2ea530b RBX: 009ab2000008d300 RCX:
ffffffffa02925c1
[ 339.730086] RDX: 000000000000006b RSI: ffff880030603be8 RDI:
009ab2000008d378
[ 339.730086] RBP: ffff880030603c28 R08: 0000000000000002 R09:
0000000000000000
[ 339.730086] R10: 0000000000000003 R11: ffff88019d28a268 R12:
ffffffff829a0400
[ 339.730086] R13: ffffffffa02925c1 R14: 0000000000000000 R15:
ffff8801a18e4630
[ 339.730086] FS: 00007fb6197ca780(0000) GS:ffff880030600000(0000)
knlGS:0000000000000000
[ 339.730086] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 339.730086] CR2: 00007f747a184178 CR3: 00000001badd8000 CR4:
00000000000026e0
[ 339.730086] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 339.730086] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 339.730086] Process qemu-kvm (pid: 3245, threadinfo ffff8801b9370000,
task ffff8801d53c48a0)
[ 339.730086] Stack:
[ 339.730086] ffffffff81b045d0 ffffffff829a0400 ffff880030603c48
ffffffff813e3426
[ 339.730086] <0> ffff8801a18e4600 ffff88019d28a250 ffff880030603c68
ffffffff813e10f8
[ 339.730086] <0> ffff880030603c68 ffffffff81b045d0 ffff880030603c88
ffffffff813bc080
[ 339.730086] Call Trace:
[ 339.730086] <IRQ>
[ 339.730086] [<ffffffff813e3426>] destroy_conntrack+0x73/0x121
[ 339.730086] [<ffffffff813e10f8>] nf_conntrack_destroy+0x2c/0x33
[ 339.730086] [<ffffffff813bc080>] skb_release_head_state+0x77/0xb9
[ 339.730086] [<ffffffff813bbde2>] __kfree_skb+0x16/0x82
[ 339.730086] [<ffffffff813bbee7>] kfree_skb+0x6a/0x73
[ 339.730086] [<ffffffffa02925c1>] ip6_mc_input+0x214/0x221 [ipv6]
[ 339.730086] [<ffffffffa02925f5>] ip6_rcv_finish+0x27/0x2b [ipv6]
[ 339.730086] [<ffffffffa0292938>] ipv6_rcv+0x33f/0x37f [ipv6]
[ 339.730086] [<ffffffff813c416a>] netif_receive_skb+0x3b0/0x3da
[ 339.730086] [<ffffffffa02d2b6e>] br_handle_frame_finish+0x104/0x13c
[bridge]
[ 339.730086] [<ffffffffa02d2d37>] br_handle_frame+0x191/0x1aa
[bridge]
[ 339.730086] [<ffffffff813c40d6>] netif_receive_skb+0x31c/0x3da
[ 339.730086] [<ffffffff8106e6ee>] ? __run_hrtimer+0xec/0x11b
[ 339.730086] [<ffffffff813c4230>] process_backlog+0x9c/0xd8
[ 339.730086] [<ffffffff813c489b>] net_rx_action+0x7d/0x19c
[ 339.730086] [<ffffffff81055d08>] __do_softirq+0xf8/0x1cd
[ 339.730086] [<ffffffff8100ab9c>] call_softirq+0x1c/0x30
[ 339.730086] <EOI>
[ 339.730086] [<ffffffff8100c38f>] ? do_softirq+0x4b/0xa3
[ 339.730086] [<ffffffff813c49e0>] netif_rx_ni+0x26/0x2b
[ 339.730086] [<ffffffffa0449d6d>] tun_chr_aio_write+0x3ce/0x429 [tun]
[ 339.730086] [<ffffffffa044999f>] ? tun_chr_aio_write+0x0/0x429 [tun]
[ 339.730086] [<ffffffff81118fd2>] do_sync_readv_writev+0xc1/0x100
[ 339.730086] [<ffffffff811e9ee1>] ? selinux_file_permission+0xa7/0xb3
[ 339.730086] [<ffffffff81118d36>] ? copy_from_user+0x2f/0x31
[ 339.730086] [<ffffffff811e02f5>] ? security_file_permission
+0x16/0x18
[ 339.730086] [<ffffffff81119d1f>] do_readv_writev+0xa7/0x127
[ 339.730086] [<ffffffff8111a1d7>] ? rcu_read_unlock+0x21/0x23
[ 339.730086] [<ffffffff8111a2bd>] ? fget_light+0xe4/0xf2
[ 339.730086] [<ffffffff8111a230>] ? fget_light+0x57/0xf2
[ 339.730086] [<ffffffff81119de2>] vfs_writev+0x43/0x4e
[ 339.730086] [<ffffffff81119ed2>] sys_writev+0x4a/0x93
[ 339.730086] [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
[ 339.730086] Code: 38 0f b6 d2 48 01 d0 74 30 48 8b 58 28 eb 13 48 89
df e8 fc f8 ff ff 48 89 df e8 c8 f8 ff ff 4c 89 e3 48 85 db 74 12 48 8d
7b 78 <4c> 8b 23 e8 e2 73 c7 ff 85 c0 74 e8 eb d6 5b 41 5c c9 c3 55 48
[ 339.730086] RIP [<ffffffff813e5f3e>] nf_ct_remove_expectations
+0x49/0x5c
[ 339.730086] RSP <ffff880030603c18>
[ 340.095082] ---[ end trace 7be6d51534d25d87 ]---
[ 340.099728] Kernel panic - not syncing: Fatal exception in interrupt
[ 340.106123] Pid: 3245, comm: qemu-kvm Tainted: G D 2.6.33-rc5
#3
[ 340.112853] Call Trace:
[ 340.112860] <IRQ> [<ffffffff81453b1d>] panic+0x7a/0x142
[ 340.112886] [<ffffffff81458011>] oops_end+0xb7/0xc7
[ 340.112898] [<ffffffff8100d421>] die+0x5a/0x63
[ 340.112909] [<ffffffff81457bcc>] do_general_protection+0x12a/0x133
[ 340.112922] [<ffffffff81457100>] ? irq_return+0x0/0x10
[ 340.112956] [<ffffffffa02925c1>] ? ip6_mc_input+0x214/0x221 [ipv6]
[ 340.112969] [<ffffffff81457345>] general_protection+0x25/0x30
[ 340.113002] [<ffffffffa02925c1>] ? ip6_mc_input+0x214/0x221 [ipv6]
Jon.
Attachments
- kvm_crash_20100128.txt [text/plain] 80210 bytes · preview