Re: lockdep recursive locking warning on for-next
From: "Darrick J. Wong" <djwong@kernel.org>
Date: 2021-02-19 04:29:18
On Fri, Feb 19, 2021 at 03:14:27PM +1100, Dave Chinner wrote:
On Thu, Feb 18, 2021 at 11:31:54AM -0800, Darrick J. Wong wrote:quoted
On Thu, Feb 18, 2021 at 02:12:52PM -0500, Brian Foster wrote:quoted
On Thu, Feb 18, 2021 at 10:49:26AM -0800, Darrick J. Wong wrote:quoted
On Thu, Feb 18, 2021 at 01:14:50PM -0500, Brian Foster wrote:quoted
Hi Darrick, I'm seeing the warning below via xfs/167 on a test machine. It looks like it's just complaining about nested freeze protection between the scan invocation and an underlying transaction allocation for an inode eofblocks trim. I suppose we could either refactor xfs_trans_alloc() to drop and reacquire freeze protection around the scan, or alternatively call __sb_writers_release() and __sb_writers_acquire() around the scan to retain freeze protection and quiet lockdep. Hm?Erk, isn't that a potential log grant livelock too? Fill up the filesystem with real data and cow blocks until it's full, then spawn exactly enough file writer threads to eat up all the log reservation, then each _reserve() fails, so every thread starts a scan and tries to allocate /another/ transaction ... but there's no space left in the log, so those scans just block indefinitely. So... I think the solution here is to go back to a previous version of what that patchset did, where we'd drop the whole transaction, run the scan, and jump back to the top of the function to get a fresh transaction.But we don't call into the scan while holding log reservation. We hold the transaction memory and freeze protection. It's probably debatable whether we'd want to scan with freeze protection held or not, but I don't see how dropping either of those changes anything wrt to log reservation..?Right, sorry about the noise. We could just trick lockdep with __sb_writers_release like you said. Though I am a tad bit concerned about the rwsem behavior -- what happens if: T1 calls sb_start_intwrite (which is down_read on sb_writers), gets the lock, and then hits ENOSPC and goes into our scan loop; meanwhile, T2 calls sb_wait_write (which is down_write on sb_writers), and is scheduled off because it was a blocking lock attempt; and then, T1 finds some eofblocks to delete, and now it wants to sb_start_intwrite again as part of allocating that second nested transaction. Does that actually work, or will T1 stall because we don't allow more readers once something is waiting in down_write()?The stack trace nesting inside xfs_trans_alloc() looks fundamentally wrong to me. It screams "warning, dragons be here" to me. We're not allowed to nest transactions -anywhere- so actually designing a call path that ends up looking like we are nesting transactions but then plays whacky games to avoid problems associated with nesting seems... poorly thought out.
The original version of the code[1] in question cancelled the transaction before starting the scan, but the reviewers didn't think it would be a problem to hold on to the transaction or recurse intwrite. [1] https://lore.kernel.org/linux-xfs/20210124094816.GE670331@infradead.org/ (local) I probably should've just kept it the way it was. Well, maybe I'll send in my own fixpatch and see what the rest of you think.
quoted
quoted
quoted
quoted
BTW, the stack report also had me wondering whether we had or need any nesting protection in these new scan invocations. For example, if we have an fs with a bunch of tagged inodes and concurrent allocation activity, would anything prevent an in-scan transaction allocation from jumping back into the scan code to complete outstanding work? It looks like that might not be possible right now because neither scan reserves blocks, but they do both use transactions and that's quite a subtle balance..Yes, that's a subtlety that screams for better documentation.TBH, I'm not sure that's enough. I think we should at least have some kind of warning, even if only in DEBUG mode, that explicitly calls out if we've become susceptible to this kind of scan reentry. Otherwise I suspect that if this problem is ever truly introduced, the person who first discovers it will probably be user with a blown stack. :( Could we set a flag on the task or something that warns as such (i.e. "WARNING: attempted block reservation in block reclaim context") or perhaps just prevents scan reentry in the first place?What if we implemented a XFS_TRANS_TRYRESERVE flag that would skip the scanning loops? Then it would be at least a little more obvious when xfs_free_eofblocks and xfs_reflink_cancel_cow_range kick on.Isn't detecting transaction reentry exactly what PF_FSTRANS is for? Or have we dropped that regression fix on the ground *again*?
That series isn't making progress. --D
Cheers, Dave. -- Dave Chinner david@fromorbit.com