Re: lockdep recursive locking warning on for-next

From: "Darrick J. Wong" <djwong@kernel.org>
Date: 2021-02-19 04:29:18

On Fri, Feb 19, 2021 at 03:14:27PM +1100, Dave Chinner wrote:

On Thu, Feb 18, 2021 at 11:31:54AM -0800, Darrick J. Wong wrote:

quoted

On Thu, Feb 18, 2021 at 02:12:52PM -0500, Brian Foster wrote:

quoted

On Thu, Feb 18, 2021 at 10:49:26AM -0800, Darrick J. Wong wrote:

quoted

On Thu, Feb 18, 2021 at 01:14:50PM -0500, Brian Foster wrote:

quoted

Hi Darrick,

I'm seeing the warning below via xfs/167 on a test machine. It looks
like it's just complaining about nested freeze protection between the
scan invocation and an underlying transaction allocation for an inode
eofblocks trim. I suppose we could either refactor xfs_trans_alloc() to
drop and reacquire freeze protection around the scan, or alternatively
call __sb_writers_release() and __sb_writers_acquire() around the scan
to retain freeze protection and quiet lockdep. Hm?

Erk, isn't that a potential log grant livelock too?

Fill up the filesystem with real data and cow blocks until it's full,
then spawn exactly enough file writer threads to eat up all the log
reservation, then each _reserve() fails, so every thread starts a scan
and tries to allocate /another/ transaction ... but there's no space
left in the log, so those scans just block indefinitely.

So... I think the solution here is to go back to a previous version of
what that patchset did, where we'd drop the whole transaction, run the
scan, and jump back to the top of the function to get a fresh
transaction.

But we don't call into the scan while holding log reservation. We hold
the transaction memory and freeze protection. It's probably debatable
whether we'd want to scan with freeze protection held or not, but I
don't see how dropping either of those changes anything wrt to log
reservation..?

Right, sorry about the noise.  We could just trick lockdep with
__sb_writers_release like you said.  Though I am a tad bit concerned
about the rwsem behavior -- what happens if:

T1 calls sb_start_intwrite (which is down_read on sb_writers), gets the
lock, and then hits ENOSPC and goes into our scan loop; meanwhile,

T2 calls sb_wait_write (which is down_write on sb_writers), and is
scheduled off because it was a blocking lock attempt; and then,

T1 finds some eofblocks to delete, and now it wants to sb_start_intwrite
again as part of allocating that second nested transaction.  Does that
actually work, or will T1 stall because we don't allow more readers once
something is waiting in down_write()?

The stack trace nesting inside xfs_trans_alloc() looks fundamentally
wrong to me. It screams "warning, dragons be here" to me. We're not
allowed to nest transactions -anywhere- so actually designing a call
path that ends up looking like we are nesting transactions but then
plays whacky games to avoid problems associated with nesting
seems... poorly thought out.

The original version of the code[1] in question cancelled the
transaction before starting the scan, but the reviewers didn't think it
would be a problem to hold on to the transaction or recurse intwrite.

[1] https://lore.kernel.org/linux-xfs/20210124094816.GE670331@infradead.org/ (local)

I probably should've just kept it the way it was.  Well, maybe I'll send
in my own fixpatch and see what the rest of you think.

quoted

BTW, the stack report also had me wondering whether we had or need any
nesting protection in these new scan invocations. For example, if we
have an fs with a bunch of tagged inodes and concurrent allocation
activity, would anything prevent an in-scan transaction allocation from
jumping back into the scan code to complete outstanding work? It looks
like that might not be possible right now because neither scan reserves
blocks, but they do both use transactions and that's quite a subtle
balance..

Yes, that's a subtlety that screams for better documentation.

TBH, I'm not sure that's enough. I think we should at least have some
kind of warning, even if only in DEBUG mode, that explicitly calls out
if we've become susceptible to this kind of scan reentry. Otherwise I
suspect that if this problem is ever truly introduced, the person who
first discovers it will probably be user with a blown stack. :( Could we
set a flag on the task or something that warns as such (i.e. "WARNING:
attempted block reservation in block reclaim context") or perhaps just
prevents scan reentry in the first place?

What if we implemented a XFS_TRANS_TRYRESERVE flag that would skip the
scanning loops?  Then it would be at least a little more obvious when
xfs_free_eofblocks and xfs_reflink_cancel_cow_range kick on.

Isn't detecting transaction reentry exactly what PF_FSTRANS is for?

Or have we dropped that regression fix on the ground *again*?

That series isn't making progress.

--D

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help