Thread: 27 messages, 4 authors, 2023-01-11

Re: [PATCH 1/1] NFSD: fix WARN_ON_ONCE in __queue_delayed_work

From: Jeff Layton <jlayton@kernel.org>
Date: 2023-01-11 10:56:11

On Wed, 2023-01-11 at 05:15 -0500, Jeff Layton wrote:
On Wed, 2023-01-11 at 03:34 +0100, Mike Galbraith wrote:
On Tue, 2023-01-10 at 11:58 -0800, dai.ngo@oracle.com wrote:
On 1/10/23 11:30 AM, Jeff Layton wrote:
Looking over the traces that Mike posted, I suspect this is the real
bug, particularly if the server is being restarted during this test.
Yes, I noticed the WARN_ON_ONCE(timer->function != delayed_work_timer_fn)
too and this seems to indicate some kind of corruption. However, I'm not
sure if Mike's test restarts the nfs-server service. This could be a bug
in work queue module when it's under stress.
My reproducer was to merely mount and traverse/md5sum, while that was
going on, fire up LTP's min_free_kbytes testcase (memory hog from hell)
on the server.  Systemthing may well be restarting the server service
in response to oomkill.  In fact, the struct delayed_work in question
at WARN_ON_ONCE() time didn't look the least bit ready for business.

FWIW, I had noticed the missing cancel while eyeballing, and stuck one
next to the existing one as a hail-mary, but that helped not at all.
Ok, thanks, that's good to know.

I still doubt that the problem is the race that Dai seems to think it
is. The workqueue infrastructure has been fairly stable for years. If
there were problems with concurrent tasks queueing the same work, the
kernel would be blowing up all over the place.
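Jeff's argument rests on how the workqueue code deduplicates concurrent queueing. queue_work() atomically test-and-sets the work item's PENDING bit and returns false if the bit was already set, so only one caller actually queues the item. The sketch below is a simplified userspace model of that idea using C11 atomics. It assumes nothing about kernel internals beyond that contract, and `model_work`, `model_queue_work()` and `model_run_work()` are hypothetical names, not kernel APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified userspace model of a work item; NOT the kernel's work_struct. */
struct model_work {
    atomic_flag pending;      /* stands in for WORK_STRUCT_PENDING_BIT */
    atomic_int times_queued;  /* how many times the item really got queued */
};

static void model_work_init(struct model_work *w)
{
    atomic_flag_clear(&w->pending);
    atomic_init(&w->times_queued, 0);
}

/* Mirrors the queue_work() contract: only the caller that flips PENDING
 * from clear to set queues the item; every other caller gets false.
 * The test-and-set is atomic, so two racing callers cannot both win. */
static bool model_queue_work(struct model_work *w)
{
    if (atomic_flag_test_and_set(&w->pending))
        return false;                       /* already pending: no-op */
    atomic_fetch_add(&w->times_queued, 1);  /* the winner "inserts" the item */
    return true;
}

/* The worker clears PENDING as it starts executing the item, so the item
 * may be queued again (even while its callback is still running). */
static void model_run_work(struct model_work *w)
{
    atomic_flag_clear(&w->pending);
    /* ... the work callback would run here ... */
}
```

In this model a second queueing attempt on an already-pending item is simply a no-op, which is consistent with Jeff's point that a plain queueing race would not, by itself, leave the structure zeroed out.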
crash> delayed_work ffff8881601fab48
struct delayed_work {
  work = {
    data = {
      counter = 1
    },
    entry = {
      next = 0x0,
      prev = 0x0
    },
    func = 0x0
  },
  timer = {
    entry = {
      next = 0x0,
      pprev = 0x0
    },
    expires = 0,
    function = 0x0,
    flags = 0
  },
  wq = 0x0,
  cpu = 0
}
That looks more like a memory scribble or UAF. Merely having multiple
tasks calling queue_work at the same time wouldn't be enough to trigger
this, IMO. It's more likely that the extra locking is changing the
timing of your reproducer somehow.
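To see why an all-zero delayed_work like the one in the dump trips warnings, here is a simplified userspace model. Our understanding is that, in kernels of this era, __queue_delayed_work() uses WARN_ON_ONCE() to check that the timer callback is delayed_work_timer_fn, that the timer is not already pending, and that work->entry is an empty list, and that INIT_DELAYED_WORK() establishes the first and last of those conditions. All `model_*` types and functions below are hypothetical stand-ins, not the kernel's definitions; field names just follow the crash output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the kernel's list_head/hlist_node/timer_list. */
struct model_list_head { struct model_list_head *next, *prev; };
struct model_hlist_node { struct model_hlist_node *next, **pprev; };

struct model_timer {
    struct model_hlist_node entry;
    unsigned long expires;
    void (*function)(struct model_timer *);
    unsigned int flags;
};

struct model_delayed_work {
    struct model_list_head entry;   /* corresponds to work.entry */
    struct model_timer timer;
};

/* Stand-in for delayed_work_timer_fn. */
static void model_delayed_work_timer_fn(struct model_timer *t) { (void)t; }

/* Roughly what INIT_DELAYED_WORK() sets up: a self-pointing (empty)
 * list entry, an unhashed (not pending) timer, and the expected callback. */
static void model_init_delayed_work(struct model_delayed_work *dw)
{
    memset(dw, 0, sizeof(*dw));
    dw->entry.next = &dw->entry;
    dw->entry.prev = &dw->entry;
    dw->timer.function = model_delayed_work_timer_fn;
}

static bool model_list_empty(const struct model_list_head *h)
{
    return h->next == h;
}

static bool model_timer_pending(const struct model_timer *t)
{
    return t->entry.pprev != NULL;
}

/* Count how many of the modelled sanity checks would warn for this object.
 * (The !wq check is left out: wq is an argument, not a field here.) */
static int model_queue_delayed_work_warnings(const struct model_delayed_work *dw)
{
    int warnings = 0;

    if (dw->timer.function != model_delayed_work_timer_fn)
        warnings++;
    if (model_timer_pending(&dw->timer))
        warnings++;
    if (!model_list_empty(&dw->entry))
        warnings++;
    return warnings;
}
```

In the model, a properly initialized object passes all three checks, while a zeroed one fails two (wrong timer callback, non-empty list entry), which fits the scribble/UAF reading better than a queueing race.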

It might be interesting to turn up KASAN if you're able.

If you still have this vmcore, it might be interesting to do the pointer
math and find the nfsd_net structure that contains the above
delayed_work. Does the rest of it also seem to be corrupt? My guess is
that the corrupted structure extends beyond just the delayed_work above.
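The "pointer math" here is what the kernel's container_of() does: subtract the member's offset within the enclosing struct from the member's address. A minimal sketch, assuming a hypothetical `model_nfsd_net` layout. In a real session the member name and its offset would have to come from the vmcore's debuginfo (for example via crash's `struct -o nfsd_net`), not from this model.

```c
#include <stddef.h>
#include <stdint.h>

/* container_of() without the kernel's type checking: step back from a
 * member's address by that member's offset inside the enclosing type. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Opaque placeholder for struct delayed_work; the size is arbitrary. */
struct model_delayed_work { unsigned long words[11]; };

/* HYPOTHETICAL layout standing in for struct nfsd_net. */
struct model_nfsd_net {
    int some_field;
    void *another_field;
    struct model_delayed_work dwork;   /* the embedded delayed_work */
    unsigned long trailing_state;
};

/* Given a raw member address (e.g. copied from a dump), compute the
 * address of the enclosing structure under the model layout. */
static uintptr_t model_container_addr(uintptr_t member_addr)
{
    return member_addr - offsetof(struct model_nfsd_net, dwork);
}
```

Once the real offset is known, dumping the resulting container address shows whether the damage is confined to the delayed_work or spreads into neighbouring fields.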

Also, it might be helpful to do this:

     kmem -s ffff8881601fab48

...which should tell us whether this address lies in a slab cache and, if so,
which cache and which object it currently belongs to. That said,
net-namespace object allocations are somewhat weird, and I'm not 100% sure
they come out of the slab.
-- 
Jeff Layton <jlayton@kernel.org>