Re: RFC: Established connections hash function

From: Nikolaos D. Bougalis <hidden>
Date: 2007-03-22 19:45:16

On Thu, Mar 22, 2007 11:21 AM, Evgeniy Polyakov [off-list ref] wrote:

quoted

   Utterly broken? Nonsense. I have tested the actual function I proposed
(sans the __force and __u32 stuff, which weren't necessary in my test
program), against real data, collected from various servers in real-time.
It has consistently achieved lower average chain lengths than the vanilla
function and demonstrated no artifacting, and that's trivial to verify.

So what?

    So what? Are you serious?

People test and work with XOR hash for years and they do not strike any
problems. If we talk about specially crafted data, then XOR one is no
worse than Jenkins with 3 words (which is even worse for blind attack of
constant ports).

    People _have_ had problems. _I_ have had problems. And when someone with 
a few thousand drones under his control hoses your servers because he can do 
math and he leaves you with 20000-item long chains, _you_ will have 
problems. And sticking your head in the sand and saying "people work with 
XOR hash for years and they do not strike any problems" won't help you.
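
    To make the "he can do math" point concrete, here is a minimal sketch. It assumes the vanilla function is the XOR-and-fold form below (written from memory, so treat the exact shape as an assumption), and find_port_for_bucket() is a hypothetical helper, not anything in the tree. For any source address a drone happens to have, it searches for a source port that lands in a bucket of the attacker's choosing:

```c
#include <stdint.h>

/* XOR-and-fold hash, assumed to match the vanilla inet_ehashfn of the
 * time (sans the __force casts). The exact form is an assumption. */
static uint32_t xor_ehashfn(uint32_t laddr, uint16_t lport,
                            uint32_t faddr, uint16_t fport)
{
    uint32_t h = (laddr ^ lport) ^ (faddr ^ fport);

    h ^= h >> 16;
    h ^= h >> 8;
    return h;
}

/* Hypothetical helper: brute-force a source port so that the tuple
 * (laddr, lport, faddr, port) falls into bucket `target` under `mask`.
 * Returns the port, or -1 if no port in 0..65535 works. */
static int find_port_for_bucket(uint32_t laddr, uint16_t lport,
                                uint32_t faddr, uint32_t mask,
                                uint32_t target)
{
    for (uint32_t p = 0; p <= 0xffff; p++) {
        if ((xor_ehashfn(laddr, lport, faddr, (uint16_t)p) & mask) == target)
            return (int)p;
    }
    return -1;
}
```

    With no secret in the function, the port only ever touches the low 16 bits of h, and both folds are invertible on those bits, so for a table of up to 65536 buckets every source address has at least one port that hits any given bucket. Each drone can run this search offline before it sends a single packet.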

quoted

   The only analysis I could find was this
http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14, which uses
jhash_2words, and not jhash_3words, and which naively attempts to take
the output of jhash_2words, and to perform the same mixing trick that
the vanilla inet_ehashfn does and uses artificially generated data sets.

It is outdated, check recent netdev@ archives. Folding used in that test
does not change distribution, and data was presented as it can be
selected by attacker, who can create with any distribution.

    Be careful here. If the folding makes no difference, it says something 
very important about __jhash_mix, and that something goes against the very 
thing that you are saying.

quoted

   But please, feel free to point out any other _unfavorable_ analyses of
jhash_2words or jhash_3words that I may have missed.

quoted

We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) though.

   Please explain to me how jhash_2words solves the issue that you claim
jhash_3words has, when they both use the same underlying bit-mixer?

$c value is not properly distributed and significantly breaks overall
distribution. Attacker, which controls $c (and it does it by controlling
ports), can significantly increase selected hash chains.

    I've tested the Jenkins hash extensively. I see no evidence of this 
"improper distribution" that you describe. In fact, about the only person 
that I've seen advocate this in the archives of netdev is you, and a lot of 
other very smart people disagree with you, so I consider myself to be in 
good company.
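
    For reference, a userspace sketch of why the two variants are hard to tell apart. It assumes the include/linux/jhash.h definitions of the time, reproduced from memory with plain uint32_t in place of u32, so check them against the tree before relying on the details:

```c
#include <stdint.h>

#define JHASH_GOLDEN_RATIO 0x9e3779b9u

/* Assumed to mirror the kernel's jhash_3words: seed a and b with the
 * golden ratio, add initval to c, run the mixer, return c. */
static uint32_t jhash_3words(uint32_t a, uint32_t b, uint32_t c,
                             uint32_t initval)
{
    a += JHASH_GOLDEN_RATIO;
    b += JHASH_GOLDEN_RATIO;
    c += initval;

    /* Body of __jhash_mix(a, b, c), expanded inline. */
    a -= b; a -= c; a ^= (c >> 13);
    b -= c; b -= a; b ^= (a << 8);
    c -= a; c -= b; c ^= (b >> 13);
    a -= b; a -= c; a ^= (c >> 12);
    b -= c; b -= a; b ^= (a << 16);
    c -= a; c -= b; c ^= (b >> 5);
    a -= b; a -= c; a ^= (c >> 3);
    b -= c; b -= a; b ^= (a << 10);
    c -= a; c -= b; c ^= (b >> 15);

    return c;
}

/* Assumed to mirror the kernel's jhash_2words: the same function with
 * the third word fixed at zero, so initval alone feeds the c slot. */
static uint32_t jhash_2words(uint32_t a, uint32_t b, uint32_t initval)
{
    return jhash_3words(a, b, 0, initval);
}
```

    If those definitions are right, jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) still feeds the port pair into the c slot of the same mixer. The only difference from passing it as the third word of jhash_3words is that the random value is combined with an XOR instead of an add.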

But it is only $c, $a and $b are properly distributed, so jhash_2words()
is safer than jhash_3words().
Just create a simple application which does
jhash_3words(a, b, rand(), init) and jhash_2words(a, b, rand()) and see
results.

    What exactly am I supposed to see in these results? Because whatever it 
is, it's not there. Feel free to provide a link to your data and a histogram 
that shows what you find of interest though, and I'll be happy to look at 
it.
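
    For anyone who wants to run the suggested comparison, here is a rough sketch of such a test. The jhash definitions are again assumed from memory of jhash.h, fill_chains() is a hypothetical helper, and rand() may return as few as 15 random bits on some platforms, so this is a starting point and not a rigorous benchmark:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define JHASH_GOLDEN_RATIO 0x9e3779b9u

/* Assumed form of the kernel's jhash_3words (mixer expanded inline). */
static uint32_t jhash_3words(uint32_t a, uint32_t b, uint32_t c,
                             uint32_t initval)
{
    a += JHASH_GOLDEN_RATIO;
    b += JHASH_GOLDEN_RATIO;
    c += initval;

    a -= b; a -= c; a ^= (c >> 13);
    b -= c; b -= a; b ^= (a << 8);
    c -= a; c -= b; c ^= (b >> 13);
    a -= b; a -= c; a ^= (c >> 12);
    b -= c; b -= a; b ^= (a << 16);
    c -= a; c -= b; c ^= (b >> 5);
    a -= b; a -= c; a ^= (c >> 3);
    b -= c; b -= a; b ^= (a << 10);
    c -= a; c -= b; c ^= (b >> 15);

    return c;
}

/* Assumed form of the kernel's jhash_2words. */
static uint32_t jhash_2words(uint32_t a, uint32_t b, uint32_t initval)
{
    return jhash_3words(a, b, 0, initval);
}

/* Hypothetical helper: hash n random values with a and b held fixed,
 * using jhash_3words(a, b, rand(), init) or jhash_2words(a, b, rand()).
 * chains[] receives the per-bucket counts; nbuckets must be a power of
 * two. Returns the length of the longest chain. */
static unsigned fill_chains(int use_3words, uint32_t a, uint32_t b,
                            uint32_t init, unsigned n,
                            unsigned *chains, unsigned nbuckets)
{
    unsigned longest = 0;

    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    memset(chains, 0, nbuckets * sizeof(*chains));

    for (unsigned i = 0; i < n; i++) {
        uint32_t r = (uint32_t)rand();
        uint32_t h = use_3words ? jhash_3words(a, b, r, init)
                                : jhash_2words(a, b, r);
        unsigned len = ++chains[h & (nbuckets - 1)];

        if (len > longest)
            longest = len;
    }
    return longest;
}
```

    Calling fill_chains() once with use_3words set and once without, on the same seed and table size, gives two chain-length arrays that can be dumped as histograms and compared side by side.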

    -n
