Re: [RFC v4 Patch 0/4] fs/inode.c: optimization for inode lock usage
From: Guo Chao <hidden>
Date: 2012-09-27 08:42:01
On Wed, Sep 26, 2012 at 10:54:09AM +1000, Dave Chinner wrote:
On Tue, Sep 25, 2012 at 04:59:55PM +0800, Guo Chao wrote:quoted
On Mon, Sep 24, 2012 at 06:26:54PM +1000, Dave Chinner wrote:quoted
@@ -783,14 +783,19 @@ static void __wait_on_freeing_inode(struct inode *inode); static struct inode *find_inode(struct super_block *sb, struct hlist_head *head, int (*test)(struct inode *, void *), - void *data) + void *data, bool locked) { struct hlist_node *node; struct inode *inode = NULL; repeat: - hlist_for_each_entry(inode, node, head, i_hash) { + rcu_read_lock(); + hlist_for_each_entry_rcu(inode, node, head, i_hash) { spin_lock(&inode->i_lock); + if (inode_unhashed(inode)) { + spin_unlock(&inode->i_lock); + continue; + }Is this check too early? If the unhashed inode happened to be the target inode, we are wasting our time to continue the traversal and we do not wait on it.If the inode is unhashed, then it is already passing through evict() or has already passed through. If it has already passed through evict() then it is too late to call __wait_on_freeing_inode() as the wakeup occurs in evict() immediately after the inode is removed from the hash. i.e: remove_inode_hash(inode); spin_lock(&inode->i_lock); wake_up_bit(&inode->i_state, __I_NEW); BUG_ON(inode->i_state != (I_FREEING | I_CLEAR)); spin_unlock(&inode->i_lock); i.e. if we get the case: Thread 1, RCU hash traversal Thread 2, evicting foo rcu_read_lock() found inode foo remove_inode_hash(inode); spin_lock(&foo->i_lock); wake_up(I_NEW) spin_unlock(&foo->i_lock); destroy_inode() ...... spin_lock(foo->i_lock) match sb, ino I_FREEING rcu_read_unlock() <rcu grace period can expire at any time now, so use after free is guaranteed at some point> wait_on_freeing_inode wait_on_bit(I_NEW) <will never get woken> Hence if the inode is unhashed, it doesn't matter what inode it is, it is never valid to use it any further because it may have already been freed and the only reason we can safely access here it is that the RCU grace period will not expire until we call rcu_read_unlock().
Yeah, looks right.
quoted
quoted
@@ -1078,8 +1098,7 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino) struct inode *old; spin_lock(&inode_hash_lock); - /* We released the lock, so.. */ - old = find_inode_fast(sb, head, ino); + old = find_inode_fast(sb, head, ino, true); if (!old) { inode->i_ino = ino; spin_lock(&inode->i_lock);Emmmm ... couldn't we use memory barrier API instead of irrelevant spin lock on newly allocated inode to publish I_NEW?Yes, we could. However, having multiple synchronisation methods for a single variable that should only be used in certain circumstances is something that is easy to misunderstand and get wrong. Memory barriers are much more subtle and harder to understand than spin locks, and every memory barrier needs to be commented to explain what the barrier is actually protecting against. In the case where a spin lock is guaranteed to be uncontended and the cache line hot in the CPU cache, it makes no sense to replace the spin lock with a memory barrier, especially when every other place we modify the i_state/i_hash fields we have to wrap them with i_lock.... Simple code is good code - save the complexity for something that needs it.
Emmm, I doubt "it's simpler and need no document". I bet someday there will be other guys stand out and ask "why take spin lock on a inode which apparently does not subject to any race condition?".
I know that the per-sb inode lru lock is currently the hotest of the inode cache locks (performance limiting at somewhere in the range of 8-16way workloads on XFS), and I've got work in (slow) progress to address that. That work will also the address the per-sb dentry LRU locks, which are the hotest dentry cache locks as well.
Glad to hear that. Thank your for all your explanation, especially historical ones. Regards, Guo Chao