Thread (39 messages, 8 authors, 2008-08-28)

Re: cat /proc/net/tcp takes 0.5 seconds on x86_64

From: Eric Dumazet <hidden>
Date: 2008-08-28 06:21:05

David Miller wrote:
> From: Stephen Hemminger <redacted>
> Date: Wed, 27 Aug 2008 14:48:00 -0700
>
>> I do wonder if having a large hash table actually helps. When the TCP
>> hash table gets too big, it means every lookup is a cache miss. Assume
>> a busy server with 2000 connections and a perfect hash: on a 4GB x86-64
>> machine we allocate 512K hash entries, which is ridiculous. Something
>> like 64K entries is more than enough.
>
> That's true, but it's nearly guaranteed to only be a single cache miss
> at worst (if the hash function is working), compared to potentially
> multiple ones if we sized it too small.
>
> I really see the only way to move forward is to dynamically size the
> thing. And nobody has been strong enough to implement that yet :)
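To put Stephen's numbers in perspective, here is a back-of-the-envelope calculation as a small C sketch. It assumes each bucket head is a single pointer (8 bytes on x86_64, like a bare struct hlist_head); real TCP hash buckets may be larger, so the byte counts are a lower bound. The helper names are hypothetical, and "perfect hash" means every entry lands in its own bucket.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each bucket head is one pointer (8 bytes on x86_64). */
#define BUCKET_HEAD_BYTES sizeof(void *)

/* Bytes spent on bucket heads alone, not counting the entries. */
static size_t table_bytes(size_t buckets)
{
	return buckets * BUCKET_HEAD_BYTES;
}

/* Average chain length (load factor): what a lookup walks on average. */
static double load_factor(size_t entries, size_t buckets)
{
	assert(buckets > 0);
	return (double)entries / (double)buckets;
}

/*
 * Empty buckets that a walk over the whole table still has to visit,
 * assuming a perfect hash (one entry per bucket while entries <= buckets).
 */
static size_t empty_buckets(size_t entries, size_t buckets)
{
	return buckets > entries ? buckets - entries : 0;
}
```

With 8-byte heads, 512K buckets cost 4 MiB before any connection exists, versus 512 KiB for 64K buckets. The load factors for 2000 connections are about 0.0038 and 0.031, so either size keeps chains nearly empty for lookups; the difference shows up in cache footprint and in anything that walks every bucket (a full walk of the 512K table visits 522288 empty buckets).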
You are right. For the TCP hash table, that's probably hard to implement.

But for the route cache, it is probably doable since we added rt_genid
in commit 29e75252da20f3ab9e132c68c9aed156b87beae6
([IPV4] route cache: Introduce rt_genid for smooth cache invalidation).
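The generation-id idea can be modeled in a few lines of user-space C. This is a simplified sketch with hypothetical names (struct rt_entry, cur_genid, chain_lookup), not the kernel code: each entry records the generation current at insert time, lookups ignore entries from an older generation, and bumping the counter invalidates the whole cache in O(1) without walking the table. Stale entries still occupy memory until something frees them, which is what step 2 below takes care of.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cache entry (the real struct rtable differs). */
struct rt_entry {
	unsigned int key;
	unsigned int genid;      /* generation current when inserted */
	struct rt_entry *next;
};

/* Hypothetical global generation counter, standing in for rt_genid. */
static unsigned int cur_genid;

/* An entry is usable only if it belongs to the current generation. */
static bool rt_entry_valid(const struct rt_entry *e)
{
	return e->genid == cur_genid;
}

/* Invalidate every cached entry at once: no table walk needed. */
static void rt_cache_invalidate(void)
{
	cur_genid++;
}

/* Search one hash chain, skipping entries from an older generation. */
static struct rt_entry *chain_lookup(struct rt_entry *head, unsigned int key)
{
	for (struct rt_entry *e = head; e; e = e->next)
		if (e->key == key && rt_entry_valid(e))
			return e;
	return NULL;
}
```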

If we add a hash table for each "struct net" (net->ipv4.rt_hash_table),
we could then do something sensible when an admin writes to
/proc/sys/net/ipv4/route/hash_size, or at rt_check_expire() time if the
hash table is found to be full...

1) Instead of using alloc_large_system_hash() at boot time to allocate
   rt_hash_table, use a plain vmalloc(). The initial hash size could be
   small (one page) unless the "rhash_entries=xxx" boot parameter says
   otherwise.
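Step 1 amounts to a sizing rule plus a plain allocation. Below is a user-space sketch under stated assumptions: a 4 KiB page, one pointer per bucket, calloc() standing in for vmalloc() plus zeroing, and an already-parsed rhash_entries value passed in as a number (0 meaning "not given"). Rounding to a power of two is this sketch's choice, so that the bucket index can be computed with a mask; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bucket: one chain head pointer per bucket. */
struct bucket {
	void *chain;
};

#define SKETCH_PAGE_BYTES 4096u /* assumption: 4 KiB pages */

/* Round n up to the next power of two (returns 1 for n == 0). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;
	while (p < n)
		p <<= 1;
	return p;
}

/* One page worth of buckets, unless rhash_entries overrides it. */
static size_t initial_buckets(size_t rhash_entries)
{
	if (rhash_entries)
		return roundup_pow2(rhash_entries);
	return SKETCH_PAGE_BYTES / sizeof(struct bucket);
}

/* calloc() stands in for vmalloc() + memset() in this sketch. */
static struct bucket *alloc_table(size_t buckets)
{
	assert(buckets && (buckets & (buckets - 1)) == 0);
	return calloc(buckets, sizeof(struct bucket));
}
```

On x86_64 the default works out to 512 buckets per 4 KiB page, so a quiet box pays almost nothing up front.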

2) If an admin writes a new value to /proc/sys/net/ipv4/route/hash_size:
- Allocate a new table with vmalloc().
- Change net->ipv4.rt_genid and net->ipv4.rt_hash_table.
- The old table now contains only obsolete entries: rt_free() them all.
- vfree() the old hash table, now empty.
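The resize sequence in step 2 can be sketched single-threaded in user-space C. struct rt_ns is a hypothetical stand-in for net->ipv4, calloc()/free() stand in for vmalloc()/rt_free()/vfree(), and the insert and count helpers exist only to exercise the sketch. It deliberately ignores concurrency: a kernel version would have to ensure no concurrent lookup still uses the old table or its entries before freeing them (for example by deferring the frees past an RCU grace period).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified entry and per-namespace route cache state. */
struct rt_entry {
	unsigned int hash;
	unsigned int genid;
	struct rt_entry *next;
};

struct rt_ns {                   /* stands in for net->ipv4 */
	struct rt_entry **table; /* stands in for rt_hash_table */
	size_t size;             /* bucket count, power of two */
	unsigned int genid;      /* stands in for rt_genid */
};

/* Allocate new table, switch table and genid, free old entries and table. */
static int rt_resize(struct rt_ns *ns, size_t new_size)
{
	assert(new_size && (new_size & (new_size - 1)) == 0);
	struct rt_entry **new_table = calloc(new_size, sizeof(*new_table));
	if (!new_table)
		return -1;               /* keep the old table on failure */

	struct rt_entry **old_table = ns->table;
	size_t old_size = ns->size;

	ns->table = new_table;           /* publish the empty table */
	ns->size = new_size;
	ns->genid++;                     /* old entries are now obsolete */

	for (size_t i = 0; i < old_size; i++) {
		struct rt_entry *e = old_table[i];
		while (e) {
			struct rt_entry *next = e->next;
			free(e);         /* rt_free() in the kernel */
			e = next;
		}
	}
	free(old_table);                 /* vfree() in the kernel */
	return 0;
}

/* Test helper: insert at the head of the chain selected by the hash. */
static int rt_insert(struct rt_ns *ns, unsigned int hash)
{
	assert(ns->size > 0);
	struct rt_entry *e = malloc(sizeof(*e));
	if (!e)
		return -1;
	size_t idx = hash & (ns->size - 1);
	e->hash = hash;
	e->genid = ns->genid;
	e->next = ns->table[idx];
	ns->table[idx] = e;
	return 0;
}

/* Test helper: count entries across all chains. */
static size_t rt_count(const struct rt_ns *ns)
{
	size_t n = 0;
	for (size_t i = 0; i < ns->size; i++)
		for (const struct rt_entry *e = ns->table[i]; e; e = e->next)
			n++;
	return n;
}
```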


3) In rt_check_expire(), add some metrics to trigger an expansion of the
   hash table if we find too many entries in it.
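One possible expansion metric for step 3, sketched in C: expand when the average chain length exceeds a threshold, doubling until the load fits or a cap is reached. The threshold of 2, the doubling, the cap, and the assumption that an entry count is available at rt_check_expire() time are all illustrative choices, not values from the kernel; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: allowed average chain length. */
#define MAX_AVG_CHAIN 2u

/* True when entries / buckets > MAX_AVG_CHAIN (integer-only test). */
static bool rt_should_expand(size_t entries, size_t buckets)
{
	assert(buckets > 0);
	return entries > buckets * MAX_AVG_CHAIN;
}

/* Double the bucket count until the load fits, never above max_buckets. */
static size_t rt_next_size(size_t entries, size_t buckets, size_t max_buckets)
{
	while (buckets <= max_buckets / 2 && rt_should_expand(entries, buckets))
		buckets <<= 1;
	return buckets;
}
```

If rt_next_size() returns something larger than the current size, the resize path from step 2 would be run with the new value.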


