Thread (59 messages, 9 authors, 2016-03-23)

Suspicious error for CMA stress test

From: Joonsoo Kim <hidden>
Date: 2016-03-23 04:42:42
Also in: linux-mm, lkml

On Fri, Mar 18, 2016 at 03:10:09PM +0100, Vlastimil Babka wrote:
On 03/17/2016 04:52 PM, Joonsoo Kim wrote:
2016-03-18 0:43 GMT+09:00 Vlastimil Babka [off-list ref]:
Okay. I used the following slightly optimized version, and I had to
add 'max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1)'
to yours. Please consider it, too.
Hmm, this one does not work; I can still see the bug after applying
this patch. Did I miss something?
I may have found a bug that I introduced some time ago. Could you
test the following change in __free_one_page() on top of
Vlastimil's patch?

-page_idx = pfn & ((1 << max_order) - 1);
+page_idx = pfn & ((1 << MAX_ORDER) - 1);

I tested Vlastimil's patch plus your change under stress for more than
half an hour, and the bug I reported is gone :)

Oh, OK, I will try to send a proper patch once I figure out what to
write in the changelog :)
Thanks in advance!
OK, here it is. Hanjun, can you please retest this? I'm not sure you had
the same code, given the follow-up one-liner patches in the thread. Lucas, see if
it helps with your issue as well. Laura and Joonsoo, please also test and review,
and check in the changelog whether my understanding of the problem is accurate :)

Thanks

----8<----
From: Vlastimil Babka <redacted>
Date: Fri, 18 Mar 2016 14:22:31 +0100
Subject: [PATCH] mm/page_alloc: prevent merging between isolated and other
 pageblocks

Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:
Before the test, I got:
-bash-4.3# cat /proc/meminfo | grep Cma
CmaTotal:         204800 kB
CmaFree:          195044 kB


After running the test:
-bash-4.3# cat /proc/meminfo | grep Cma
CmaTotal:         204800 kB
CmaFree:         6602584 kB

So the free CMA memory is more than the total.

Also, MemFree is more than MemTotal:

-bash-4.3# cat /proc/meminfo
MemTotal:       16342016 kB
MemFree:        22367268 kB
MemAvailable:   22370528 kB

Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:
CMA isolates MAX_ORDER-aligned blocks, but during the process a
partially isolated block exists. If MAX_ORDER is 11 and
pageblock_order is 9, two pageblocks make up a MAX_ORDER-aligned
block, and I can think of the following scenario, because pageblock
(un)isolation is done one pageblock at a time.

(Each character represents one pageblock; 'C' and 'I' mean MIGRATE_CMA
and MIGRATE_ISOLATE, respectively.)

CC -> IC -> II (Isolation)
II -> CI -> CC (Un-isolation)

If some pages are freed in an intermediate state such as IC or CI,
a freed page could be merged with a buddy residing in a pageblock of a
different type, which causes a wrong freepage count.

This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
since it doesn't hold the zone->lock between pageblocks, a race window does
exist.

It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
and non-CMA pageblocks. This should be prevented in __free_one_page() since
commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
pageblock"). However, we only check the migratetype of the pageblock where
buddy merging has been initiated, not the migratetype of the buddy pageblock
(or group of pageblocks) which can be MIGRATE_ISOLATE.

Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath and
bloat-o-meter has shown significant code bloat (the function is inline).

This patch reduces the bloat at the expense of somewhat more complicated code.
The buddy-merging while-loop in __free_one_page() is initially bounded to the
pageblock boundary, without any migratetype checks. The checks are placed
outside, bumping the max_order if merging is allowed, and returning to the
while-loop with a statement which can't possibly be considered harmful.

This fixes the accounting bug and also removes the arguably weird state in the
original commit 3c605096d315 where buddies could be left unmerged.

Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Link: https://lkml.org/lkml/2016/3/2/280
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Debugged-by: Laura Abbott [off-list ref]
Debugged-by: Joonsoo Kim [off-list ref]
Signed-off-by: Vlastimil Babka <redacted>
Cc: <redacted> # 3.18+
---
 mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 13 deletions(-)

Acked-by: Joonsoo Kim <redacted>

Thanks for taking care of this issue!

Thanks.