Thread (18 messages) 18 messages, 6 authors, 2009-05-01

Re: [PATCH mmotm] mm: alloc_large_system_hash check order

From: Hugh Dickins <hidden>
Date: 2009-05-01 13:42:39
Also in: linux-mm, lkml

On Wed, 29 Apr 2009, Andrew Morton wrote:
yes, the code is a bit odd:

:	do {
: 		size = bucketsize << log2qty;
: 		if (flags & HASH_EARLY)
: 			table = alloc_bootmem_nopanic(size);
: 		else if (hashdist)
: 			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
: 		else {
: 			unsigned long order = get_order(size);
: 			table = (void*) __get_free_pages(GFP_ATOMIC, order);
: 			/*
: 			 * If bucketsize is not a power-of-two, we may free
: 			 * some pages at the end of hash table.
: 			 */
: 			if (table) {
: 				unsigned long alloc_end = (unsigned long)table +
: 						(PAGE_SIZE << order);
: 				unsigned long used = (unsigned long)table +
: 						PAGE_ALIGN(size);
: 				split_page(virt_to_page(table), order);
: 				while (used < alloc_end) {
: 					free_page(used);
: 					used += PAGE_SIZE;
: 				}
: 			}
: 		}
: 	} while (!table && size > PAGE_SIZE && --log2qty);

In the case where it does the __vmalloc(), the order-11 allocation will
succeed.  But in the other cases, the allocation attempt will need to
be shrunk and we end up with a smaller hash table.  Is that sensible?
It is a little odd, but the __vmalloc() route is used by default on
64-bit with CONFIG_NUMA, and this route otherwise.  (The hashdist
Doc isn't up-to-date on that, I'll send a patch.)
If we want to regularise all three cases, doing

	size = min(size, MAX_ORDER);
If I take you literally, the resulting hash tables are going to
be rather small ;) but I know what you mean.
before starting the loop would be suitable, although the huge
__get_free_pages() might still fail.
Oh, I don't feel a great urge to regularize these cases in such
a way.  I particularly don't feel like limiting 64-bit NUMA to
MAX_ORDER-1 size, if netdev have been happy with more until now.
Could consider a __vmalloc fallback when order is too large,
but let's not do so unless someone actually needs that.
(But it will then warn, won't it?
 And nobody is reporting that).
Well, it was hard to report it while mmotm's WARN_ON_ONCE was itself
oopsing.  With that fixed, I've reported it on x86_64 with 4GB
(without CONFIG_NUMA).
I was a bit iffy about adding the warning in the first place, let it go
through due to its potential to lead us to code which isn't doing what
it thinks it's doing, or is being generally peculiar.
DaveM has confirmed that the code is doing what they want it to do.
So I think mmotm wants this patch (for alloc_large_system_hash to
keep away from that warning), plus Mel's improvement on top of it.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help