Re: [PATCH v2 7/9] NFSD: Use rhashtable for managing nfs4_file objects

From: Chuck Lever III <chuck.lever@oracle.com>
Date: 2022-10-12 15:02:10

On Oct 11, 2022, at 7:37 PM, NeilBrown [off-list ref] wrote:

On Tue, 11 Oct 2022, Chuck Lever III wrote:

quoted

On Oct 10, 2022, at 8:16 PM, NeilBrown [off-list ref] wrote:

On Fri, 07 Oct 2022, Chuck Lever wrote:

quoted

-static unsigned int file_hashval(struct svc_fh *fh)
+/*
+ * The returned hash value is based solely on the address of an in-code
+ * inode, a pointer to a slab-allocated object. The entropy in such a
+ * pointer is concentrated in its middle bits.

I think you need more justification than that for throwing away some of
the entropy, even if you don't think it is much.

We might have that justification:

https://lore.kernel.org/linux-nfs/YrUFbLJ5uVbWtZbf@ZenIV/ (local)

Actually I believe we are not discarding /any/ entropy in
this function. The bits we discard are invariant.

Ok, I can accept that this:

+	k = ptr >> L1_CACHE_SHIFT;

I searched for ">> *L1_CACHE_SHIFT".  Apart from the nfsd
filecache you mentioned I find two.  One in quota and one in reiserfs.
Both work with traditional hash tables which are more forgiving of
longer chains.
Do you have other evidence of this being a common trope?


This approach is based on the hash function in fs/inode.c,
which uses integer division instead of a shift.

 509 static unsigned long hash(struct super_block *sb, unsigned long hashval)
 510 {
 511         unsigned long tmp;
 512 
 513         tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
 514                         L1_CACHE_BYTES;
 515         tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> i_hash_shift);
 516         return tmp & i_hash_mask;
 517 }

only discards invariant bits, but how can you justify this:

+	k &= 0x00ffffff;

??

After shifting an address, the top byte generally contains
invariant bits as well.

And given that you pass it all to jhash anyway, why not just pass all of
it?

I don't think this is a big deal, but these functions are
basically the same as what was recently merged without
complaint. It's not a high priority to revisit those.

It might be worth a clean-up to share this hash function
between the two hash tables... at that point we might
consider removing the extra mask.

quoted

And, note that this is exactly the same situation we just merged
in the filecache overhaul, and is a common trope amongst other
hash tables that base their function on the inode's address.

quoted

Presumably you think hashing 32 bits is faster than hashing 64 bits.
Maybe it is, but is it worth it?

rhashtable depends heavily on having a strong hash function.  In
particular if any bucket ends up with more than 16 elements it will
choose a new seed and rehash.  If you deliberately remove some bits that
it might have been used to spread those 16 out, then you are asking for
trouble.

We know that two different file handles can refer to the same inode
("rarely"), and you deliberately place them in the same hash bucket.
So if an attacker arranges to access 17 files with the same inode but
different file handles, then the hashtable will be continuously
rehashed.

The preferred approach when you require things to share a hash chain is
to use an rhl table.

Again, this is the same situation for the filecache. Do you
believe it is worth reworking that? I'm guessing "yes".

As a matter of principle: yes.
rhashtable is designed to assume that hash collisions are bad and can be
changed by choosing a different seed.
rhl_tables was explicitly added for cases when people wanted multiple
elements to hash to the same value.

The chance of it causing a problem without an attack are admittedly
tiny.  Attacks are only possible with subtree_check enabled, or if the
underlying filesystem does something "clever" with file handles, so
there wouldn't be many situations where an attack would even be
possible.  But if it were possible, it would likely be easy.
The cost of the attack would be a minor-to-modest performance impact.

So it is hard to argue "this code is dangerous and must be changed", but
we have different tools for a reason, and I believe that rhl-tables is
the right tool for this job.

Agreed. I wasn't suggesting it's an emergency situation, but it's
something that should get fixed at some point if there's a problem
with it, even a minor one.

I think I stopped at the non-list variant of rhashtable because
using rhl was more complex, and the non-list variant seemed to
work fine. There's no architectural reason either file_hashtbl
or the filecache must use the non-list variant.

In any event, it's worth taking the trouble now to change the
nfs4_file implementation proposed here as you suggest.


--
Chuck Lever

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help