Re: [PATCH] improved locking performance in rt_run_flush()

From: David Miller <davem@davemloft.net>
Date: 2007-05-31 23:26:23
Also in: lkml

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 20 May 2007 15:11:48 +1000

David Miller [off-list ref] wrote:

quoted

From: Dave Johnson <redacted>

quoted

The below patch changes rt_run_flush() to only take each spinlock
protecting the rt_hash_table once instead of taking a spinlock for
every hash table bucket (and ending up taking the same small set 
of locks over and over).

...

quoted

I'm not ignoring it I'm just trying to brainstorm whether there
is a better way to resolve this inefficiency. :-)

The main problem I see with this is having to walk and free each
chain with the lock held.  We could avoid this if we had a pointer
in struct rtable to chain them up for freeing later.

I just checked and struct rtable is 236 bytes long on 32-bit but
the slab cache pads it to 256 bytes so we've got some free space.
I suspect 64-bit should be similar.

SLUB I believe packs more aggressively and won't pad things out like
that.  Therefore adding a member to rtable is much less attractive.

I've been considering various alternative ways to deal with this.

For 2.6.22 and -stable's sake we could allocate an array of pointers
of size N where N is the number of rtable hash slots per spinlock.
A big lock wraps around rt_run_flush() to protect these slots, and
then the loop is:

	grap_lock();
	for_each_hash_chain_for_lock(i) {
		rth = rt_hash_table[i].chain;
		if (rth) {
			rt_hash_table[i].chain = NULL;
			flush_chain[i % N] = rt;
		}
	}
	drop_lock();

	for (i = 0; i < N; i++) {
		struct rtable *rth = flush_chain[i];
		flush_chain[i] = NULL;
		while (rth) {
			struct rtable *next = rth->u.dst.rt_next;
			rt_free(rth);
			rth = next;
		}
	}

Holding a lock across the entire hash plucking has it's not nice
properties, but it's better than taking the same lock N times in
a row.

In the longer term, if I resurrect my dynamically sized rtable hash
patches (which I do intend to do), that code protected a lot of this
stuff with a seqlock and it might be possible to use that seqlock
solely to flush the lists in rt_run_flush().

Any better ideas?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help