Re: [PATCH 1/1] NFSD: fix WARN_ON_ONCE in __queue_delayed_work
From: Mike Galbraith <hidden>
Date: 2023-01-11 11:21:42
On Wed, 2023-01-11 at 05:55 -0500, Jeff Layton wrote:
> > crash> delayed_work ffff8881601fab48
> > struct delayed_work {
> >   work = {
> >     data = {
> >       counter = 1
> >     },
> >     entry = {
> >       next = 0x0,
> >       prev = 0x0
> >     },
> >     func = 0x0
> >   },
> >   timer = {
> >     entry = {
> >       next = 0x0,
> >       pprev = 0x0
> >     },
> >     expires = 0,
> >     function = 0x0,
> >     flags = 0
> >   },
> >   wq = 0x0,
> >   cpu = 0
> > }
>
> That looks more like a memory scribble or UAF. Merely having multiple
> tasks calling queue_work at the same time wouldn't be enough to trigger
> this, IMO. It's more likely that the extra locking is changing the
> timing of your reproducer somehow.
>
> It might be interesting to turn up KASAN if you're able.
I can try that.
> If you still have this vmcore, it might be interesting to do the pointer
> math and find the nfsd_net structure that contains the above
> delayed_work. Does the rest of it also seem to be corrupt? My guess is
> that the corrupted structure extends beyond just the delayed_work above.
>
> Also, it might be helpful to do this:
>
>     kmem -s ffff8881601fab48
>
> ...which should tell us whether and what part of the slab this object is
> now a part of. That said, net-namespace object allocations are somewhat
> weird, and I'm not 100% sure they come out of the slab.
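
The "pointer math" suggested above is the usual container_of() step: subtract the member's offset from the member's address to get the start of the enclosing structure. A minimal sketch follows, assuming made-up stand-in types; demo_delayed_work, demo_nfsd_net, the other_fields padding and the laundromat member name are all hypothetical and do not reflect the real kernel layouts. In crash, the real member offset would normally be read from the debuginfo (for example with `struct nfsd_net -o`) instead of being hard-coded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct delayed_work; NOT the kernel layout. */
struct demo_delayed_work {
	unsigned long data;
	void *entry_next;
	void *entry_prev;
	void *func;
};

/* Hypothetical stand-in for struct nfsd_net; field names are invented. */
struct demo_nfsd_net {
	char other_fields[0x100];            /* placeholder for earlier members */
	struct demo_delayed_work laundromat; /* hypothetical embedded member */
};

/* Same idea as the kernel's container_of(): member address minus offset. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Address-only variant, usable on values copied out of a vmcore. */
static inline uintptr_t demo_container_addr(uintptr_t member_addr,
					    size_t member_offset)
{
	return member_addr - member_offset;
}
```

With the real offset in hand, demo_container_addr(0xffff8881601fab48, offset) would give the candidate address of the enclosing structure to dump and inspect for further corruption.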
I tossed the vmcore, but can generate another. I had done kmem sans -s
previously, still have that.
crash> kmem ffff8881601fab48
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
kmem: kmalloc-1k: partial list slab: ffffea0005b20c08 invalid page.inuse: -1
ffff888100041840     1024       2329      2432     76    32k  kmalloc-1k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0005807e00  ffff8881601f8000     0     32         32     0
  FREE / [ALLOCATED]
  [ffff8881601fa800]

      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0005807e80  1601fa000 dead000000000400        0   0 200000000000000
crash
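
For reference, the object address that kmem prints can be cross-checked by hand from the slab base and object size in the output above. This is only a sketch, assuming objects are packed back to back from the slab's MEMORY address with no per-object padding or debug red zones; the helper names are made up for illustration.

```c
#include <stdint.h>

/*
 * Start of the fixed-size object that contains addr, assuming objects
 * are laid out contiguously from slab_base (an assumption, see above).
 */
static inline uint64_t slab_object_start(uint64_t slab_base,
					 uint64_t objsize, uint64_t addr)
{
	return slab_base + ((addr - slab_base) / objsize) * objsize;
}

/* Byte offset of addr inside that object, under the same assumption. */
static inline uint64_t slab_object_offset(uint64_t slab_base,
					  uint64_t objsize, uint64_t addr)
{
	return (addr - slab_base) % objsize;
}
```

Plugging in MEMORY ffff8881601f8000, OBJSIZE 1024 and the delayed_work address ffff8881601fab48 gives an object start of ffff8881601fa800, which matches the [ALLOCATED] entry, and an offset of 0x348 into that kmalloc-1k object.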