Re: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep
From: Dave Chinner <david@fromorbit.com>
Date: 2023-12-07 04:41:59
Also in:
dm-devel, gfs2, linux-block, linux-fsdevel, lkml, selinux
On Wed, Dec 06, 2023 at 11:16:50PM -0500, Kent Overstreet wrote:
On Wed, Dec 06, 2023 at 05:05:39PM +1100, Dave Chinner wrote:quoted
From: Dave Chinner <redacted> hash-bl nests spinlocks inside the bit locks. This causes problems for CONFIG_PREEMPT_RT which converts spin locks to sleeping locks, and we're not allowed to sleep while holding a spinning lock. Further, lockdep does not support bit locks, so we lose lockdep coverage of the inode hash table with the hash-bl conversion. To enable these configs to work, add an external per-chain spinlock to the hlist_bl_head() and add helpers to use this instead of the bit spinlock when preempt_rt or lockdep are enabled. This converts all users of hlist-bl to use the external spinlock in these situations, so we also gain lockdep coverage of things like the dentry cache hash table with this change. Signed-off-by: Dave Chinner <redacted>Sleepable bit locks can be done with wait_on_bit(), is that worth considering for PREEMPT_RT? Or are the other features of real locks important there?
I think wait_on_bit() is not scalable. It hashes down to one of 256 shared struct wait_queue_heads which have thundering herd behaviours, and it requires the locker to always run prepare_to_wait() and finish_wait(). This means there is at least one spinlock_irqsave()/unlock pair needed, sometimes two, just to get an uncontended sleeping bit lock. So as a fast path operation that requires lock scalability, it's going to be better to use a straight spinlock that doesn't require irq safety as it's far less expensive than a sleeping bit lock. Whether CONFIG_PREEMPT_RT changes that equation at all is not at all clear to me, and so I'll leave that consideration to RT people if they see a need to address it. In the mean time, we need to use an external spinlock for lockdep validation so it really doesn't make any sense at all to add a third locking variant with completely different semantics just for PREEMPT_RT... -Dave. -- Dave Chinner david@fromorbit.com