Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation
From: Mathieu Desnoyers <hidden>
Date: 2012-10-29 18:16:53
Also in:
dm-devel, linux-mm, linux-nfs, lkml
* Sasha Levin (levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org) wrote:
On Mon, Oct 29, 2012 at 11:59 AM, Mathieu Desnoyers [off-list ref] wrote:quoted
* Sasha Levin (levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org) wrote:quoted
Hi Mathieu, On Mon, Oct 29, 2012 at 9:29 AM, Mathieu Desnoyers [off-list ref] wrote:quoted
* Sasha Levin (levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org) wrote: [...]quoted
-static struct hlist_head *hash_bucket(struct net *net, const char *name) -{ - unsigned int hash = jhash(name, strlen(name), (unsigned long) net); - return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)]; -} - /** * ovs_vport_locate - find a port that has already been created *@@ -84,13 +76,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name) */ struct vport *ovs_vport_locate(struct net *net, const char *name) { - struct hlist_head *bucket = hash_bucket(net, name); struct vport *vport; struct hlist_node *node; + int key = full_name_hash(name, strlen(name)); - hlist_for_each_entry_rcu(vport, node, bucket, hash_node) - if (!strcmp(name, vport->ops->get_name(vport)) && - net_eq(ovs_dp_get_net(vport->dp), net)) + hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)Is applying hash_32() on top of full_name_hash() needed and expected ?Since this was pointed out in several of the patches, I'll answer it just once here. I've intentionally "allowed" double hashing with hash_32 to keep the code simple. hash_32() is pretty simple and gcc optimizes it to be almost nothing, so doing that costs us a multiplication and a shift. On the other hand, we benefit from keeping our code simple - how would we avoid doing this double hash? adding a different hashtable function for strings? or a new function for already hashed keys? I think we benefit a lot from having to mul/shr instead of adding extra lines of code here.This could be done, as I pointed out in another email within this thread, by changing the "key" argument from add/for_each_possible to an expected "hash" value, and let the caller invoke hash_32() if they want. I doubt this would add a significant amount of complexity for users of this API, but would allow much more flexibility to choose hash functions.Most callers do need to do the hashing though, so why add an additional step for all callers instead of doing another hash_32 for the ones that don't really need it? Another question is why do you need flexibility? I think that simplicity wins over flexibility here.
I usually try to make things as simple as possible, but not simplistic compared to the problem tackled. In this case, I would ask the following question: by standardizing the hash function of all those pieces of kernel infrastructure to "hash_32()", including submodules part of the kernel network infrastructure, parts of the kernel that can be fed values coming from user-space (through the VFS), how can you guarantee that hash_32() won't be the cause of a DoS attack based on the fact that this algorithm is a) known by an attacker, and b) does not have any randomness. It's been a recent trend to perform DoS attacks on poorly implemented hashing functions. This is just one example in an attempt to show why different hash table users may have different constraints: for a hash table entirely populated by keys generated internally by the kernel, a random seed might not be required, but for cases where values are fed by user-space and from the NIC, I would argue that flexibility to implement a randomizable hash function beats implementation simplicity any time. And you could keep the basic use-case simple by providing hints to the hash_32()/hash_64()/hash_ulong() helpers in comments. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com