Thread: 10 messages, 2 authors, 2011-08-24

Re: [PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0.

From: Mel Gorman <mgorman@suse.de>
Date: 2011-08-04 10:09:36
Also in: linux-arm-kernel

On Thu, Aug 04, 2011 at 03:06:39PM +0530, Kautuk Consul wrote:
Hi Mel,

My ARM system has 2 memory banks which have the following 2 PFN ranges:
60000-62000 and 70000-7ce00.

My SECTION_SIZE_BITS is #defined to 23.
So bank 0 is 4 sections and bank 1 is 26 sections with the last section
incomplete.
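
The section counts above can be checked with a little arithmetic. This is a minimal sketch, assuming 4 KiB pages (PAGE_SHIFT of 12, which is consistent with 1 << 20 bytes being 0x100 PFNs in the ranges quoted in this message) and treating the PFN ranges as half-open. The helper names sections_spanned() and last_section_partial() are made up for illustration.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. SECTION_SIZE_BITS is the value given in the mail. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 23
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)

/* Number of sections touched by the half-open PFN range [start_pfn, end_pfn). */
static unsigned long sections_spanned(unsigned long start_pfn,
                                      unsigned long end_pfn)
{
    unsigned long first, last;

    assert(end_pfn > start_pfn);
    first = start_pfn >> PFN_SECTION_SHIFT;
    last = (end_pfn - 1) >> PFN_SECTION_SHIFT;
    return last - first + 1;
}

/* Nonzero if the range ends partway through a section. */
static int last_section_partial(unsigned long end_pfn)
{
    return (end_pfn & (PAGES_PER_SECTION - 1)) != 0;
}
```

With these definitions, sections_spanned(0x60000, 0x62000) gives 4 and sections_spanned(0x70000, 0x7ce00) gives 26, with only the second range ending inside a section.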
I am altering the ranges via the following kind of pseudo-code in the
arch/arm/mach-*/mach-*.c file:
meminfo->bank[0].size -= (1 << 20)
meminfo->bank[1].size -= (1 << 20)
Why are you taking 1M off each bank? I could understand aligning the
banks to a section size at least.

That said, there is an assumption that memmap is valid for every page
within a MAX_ORDER-aligned block. If you are taking 1M off the end of
bank 0, it is no longer MAX_ORDER aligned. This is the assumption
move_freepages_block() is falling foul of. The problem can be avoided
by ensuring that memmap is valid within MAX_ORDER-aligned ranges.
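
To make the alignment problem concrete, here is a rough user-space sketch of the block rounding. It assumes a block of 1024 pages, which is inferred from the start_pfn/end_pfn pairs printed later in this message (for example 61c00 to 61fff). The names BLOCK_NR_PAGES, block_of() and block_inside_bank() are illustrative and are not kernel identifiers.

```c
#include <assert.h>

/* Assumption: 1024 pages per block, inferred from the printk output. */
#define BLOCK_NR_PAGES 1024UL

struct pfn_block {
    unsigned long start_pfn; /* first PFN of the block */
    unsigned long end_pfn;   /* last PFN of the block (inclusive) */
};

/* Round pfn down to its block boundary and compute the block's last PFN. */
static struct pfn_block block_of(unsigned long pfn)
{
    struct pfn_block b;

    b.start_pfn = pfn & ~(BLOCK_NR_PAGES - 1);
    b.end_pfn = b.start_pfn + BLOCK_NR_PAGES - 1;
    return b;
}

/* Nonzero if every PFN of the block lies in the bank [bank_start, bank_end). */
static int block_inside_bank(struct pfn_block b, unsigned long bank_start,
                             unsigned long bank_end)
{
    assert(bank_end > bank_start);
    return b.start_pfn >= bank_start && b.end_pfn < bank_end;
}
```

For a page in the last block of bank 0, block_of(0x61c00) ends at 0x61fff. That is inside the original bank (60000-62000) but outside the bank once 1M is removed (60000-61f00), so the struct page at the end of the block was never initialised.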
After altering the size of both memory banks, the PFN ranges now
visible to the kernel are:
60000-61f00 and 70000-7cd00.

There is only one node and one ZONE_NORMAL zone on the system and this
zone accounts for both memory banks as CONFIG_SPARSEMEM
is enabled in the kernel.
Ok.
I put some printks in the move_freepages_block() function, compiled the
kernel and then ran the following commands:
ifconfig eth0 107.109.39.102 up
mount /dev/sda1 /mnt   # Mounted the USB pen drive
mount -t nfs -o nolock 107.109.39.103:/home/kautuk nfsmnt   # Mounted an NFS share
cp test_huge_file nfsmnt/

I got the following output:
#> cp test_huge nfsmnt/
------------------------------------------------------------move_freepages_block_start----------------------------------------
kc: The page=c068a000 start_pfn=7c400 start_page=c0686000
end_page=c068dfe0 end_pfn=7c7ff
page_zone(start_page)=c048107c page_zone(end_page)=c048107c
page_zonenum(end_page) = 0
------------------------------------------------------------move_freepages_block_end----------------------------------------
------------------------------------------------------------move_freepages_block_start----------------------------------------
kc: The page=c0652000 start_pfn=7a800 start_page=c064e000
end_page=c0655fe0 end_pfn=7abff
page_zone(start_page)=c048107c page_zone(end_page)=c048107c
page_zonenum(end_page) = 0
------------------------------------------------------------move_freepages_block_end----------------------------------------
------------------------------------------------------------move_freepages_block_start----------------------------------------
kc: The page=c065a000 start_pfn=7ac00 start_page=c0656000
end_page=c065dfe0 end_pfn=7afff
page_zone(start_page)=c048107c page_zone(end_page)=c048107c
page_zonenum(end_page) = 0
------------------------------------------------------------move_freepages_block_end----------------------------------------
------------------------------------------------------------move_freepages_block_start----------------------------------------
kc: The page=c0695000 start_pfn=7c800 start_page=c068e000
end_page=c0695fe0 end_pfn=7cbff
page_zone(start_page)=c048107c page_zone(end_page)=c048107c
page_zonenum(end_page) = 0
------------------------------------------------------------move_freepages_block_end----------------------------------------
------------------------------------------------------------move_freepages_block_start----------------------------------------
kc: The page=c04f7c00 start_pfn=61c00 start_page=c04f6000
end_page=c04fdfe0 end_pfn=61fff
page_zone(start_page)=c048107c page_zone(end_page)=c0481358
page_zonenum(end_page) = 1
------------------------------------------------------------move_freepages_block_end----------------------------------------
kernel BUG at mm/page_alloc.c:849!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ce9fc000
[00000000] *pgd=7ca5a031, *pte=00000000, *ppte=00000000

As per the last line, we can clearly see that the
page_zone(start_page)=c048107c and page_zone(end_page)=c0481358,
which are not equal to each other.
Since they do not match, the code in move_freepages() bugchecks
because of the following BUG_ON() check:
page_zone(start_page) != page_zone(end_page)
The reason for this is that the page_zonenum(end_page) is equal to 1 and
this is different from the page_zonenum(start_page) which is 0.
Because the MAX_ORDER alignment is gone.
On checking the code within page_zonenum(), I see that this code tries
to retrieve the zone number from the end_page->flags.
Yes. In the majority of cases a page's node and zone are stored in
page->flags.
The reason why we cannot expect the 0x61fff end_page->flags to contain
a valid zone number is:
memmap_init_zone() initializes the zone number of all pages for a zone
via the set_page_links() inline function.
For the end_page (whose PFN is 0x61fff), set_page_links() cannot
possibly be called, as the zones are simply not aware of PFNs above
0x61f00 and below 0x70000.
Can you ensure that the ranges passed into free_area_init_node()
are MAX_ORDER aligned? This would initialise the struct pages. You
may have already seen that care is taken when freeing memmap that it
is aligned to MAX_ORDER in free_unused_memmap() in ARM.
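
One way to read this suggestion is sketched below. It is not the actual ARM code. The value 1024 for MAX_ORDER_NR_PAGES is an assumption taken from the block size visible in the printk output, and struct pfn_range, align_outward() and align_inward() are hypothetical helpers. Rounding outward gives the span whose struct pages would need initialising; rounding inward shows where a bank would have to end to stay aligned.

```c
#include <assert.h>

/* Assumption: 1024 pages per MAX_ORDER block, as suggested by the log. */
#define MAX_ORDER_NR_PAGES 1024UL

struct pfn_range {
    unsigned long start; /* first PFN */
    unsigned long end;   /* one past the last PFN (half-open) */
};

static unsigned long pfn_round_down(unsigned long pfn)
{
    return pfn & ~(MAX_ORDER_NR_PAGES - 1);
}

static unsigned long pfn_round_up(unsigned long pfn)
{
    return pfn_round_down(pfn + MAX_ORDER_NR_PAGES - 1);
}

/* Widen the range so both ends sit on MAX_ORDER boundaries. */
static struct pfn_range align_outward(struct pfn_range r)
{
    assert(r.end > r.start);
    r.start = pfn_round_down(r.start);
    r.end = pfn_round_up(r.end);
    return r;
}

/* Shrink the range so both ends sit on MAX_ORDER boundaries. */
static struct pfn_range align_inward(struct pfn_range r)
{
    assert(r.end > r.start);
    r.start = pfn_round_up(r.start);
    r.end = pfn_round_down(r.end);
    if (r.end < r.start)
        r.end = r.start; /* nothing left after trimming */
    return r;
}
```

For the shrunk bank 0 (60000-61f00), align_outward() yields 60000-62000 and align_inward() yields 60000-61c00. For the shrunk bank 1 (70000-7cd00), the results are 70000-7d000 and 70000-7cc00.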
The (end >= zone->zone_start_pfn + zone->spanned_pages) in
move_freepages_block() does not stop this crash from happening as both
our memory banks are in the same zone and the empty space within them
is accommodated into this zone via the CONFIG_SPARSEMEM
config option.

When we enable CONFIG_HOLES_IN_ZONE we survive this BUG_ON as well as
any other BUG_ONs in the loop in move_freepages() as then the
pfn_valid_within()/pfn_valid() function takes care of this
functionality, especially in the case where the newly introduced
CONFIG_HAVE_ARCH_PFN_VALID is
enabled.
This is an expensive option in terms of performance. If Russell
wants to pick it up, I won't object but I would strongly suggest that
you solve this problem by ensuring that memmap is initialised on
MAX_ORDER-aligned boundaries as it'll perform better.
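
The cost comes from checking every PFN individually when walking a block. The toy model below illustrates that under stated assumptions: toy_pfn_valid() is a stand-in written for this example, not the kernel's pfn_valid(), and the bank table simply hardcodes the shrunk ranges quoted in this message.

```c
#include <assert.h>
#include <stddef.h>

struct bank {
    unsigned long start_pfn; /* first PFN */
    unsigned long end_pfn;   /* one past the last PFN (half-open) */
};

/* Shrunk bank ranges as quoted in this message. */
static const struct bank banks[] = {
    { 0x60000UL, 0x61f00UL },
    { 0x70000UL, 0x7cd00UL },
};

/* Toy stand-in for a validity check: true if some bank covers pfn. */
static int toy_pfn_valid(unsigned long pfn)
{
    size_t i;

    for (i = 0; i < sizeof(banks) / sizeof(banks[0]); i++)
        if (pfn >= banks[i].start_pfn && pfn < banks[i].end_pfn)
            return 1;
    return 0;
}

/*
 * Walk the inclusive PFN range [start_pfn, end_pfn] and count the valid
 * pages. Every PFN is tested, which is the per-page overhead a hole-aware
 * loop pays even when the block has no hole at all.
 */
static unsigned long count_valid(unsigned long start_pfn,
                                 unsigned long end_pfn)
{
    unsigned long pfn, valid = 0;

    assert(end_pfn >= start_pfn);
    for (pfn = start_pfn; pfn <= end_pfn; pfn++)
        if (toy_pfn_valid(pfn))
            valid++;
    return valid;
}
```

For the block that triggered the BUG (61c00 to 61fff), count_valid() performs 0x400 checks and finds 0x300 valid pages. With aligned memmap the same walk needs no per-page check.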

Thanks.

-- 
Mel Gorman
SUSE Labs
