Thread (59 messages), 9 authors, 2016-03-23

Suspicious error for CMA stress test

From: Joonsoo Kim <hidden>
Date: 2016-03-21 04:41:01
Also in: linux-mm, lkml

On Fri, Mar 18, 2016 at 02:32:35PM +0100, Lucas Stach wrote:
Hi Vlastimil, Joonsoo,

On Friday, 18.03.2016 at 00:52 +0900, Joonsoo Kim wrote:
2016-03-18 0:43 GMT+09:00 Vlastimil Babka [off-list ref]:
On 03/17/2016 10:24 AM, Hanjun Guo wrote:
On 2016/3/17 14:54, Joonsoo Kim wrote:
On Wed, Mar 16, 2016 at 05:44:28PM +0800, Hanjun Guo wrote:
On 2016/3/14 15:18, Joonsoo Kim wrote:
On Mon, Mar 14, 2016 at 08:06:16AM +0100, Vlastimil Babka wrote:
On 03/14/2016 07:49 AM, Joonsoo Kim wrote:
On Fri, Mar 11, 2016 at 06:07:40PM +0100, Vlastimil Babka wrote:
On 03/11/2016 04:00 PM, Joonsoo Kim wrote:

How about something like this? Just an idea, probably buggy
(off-by-one etc.).
Should keep the cost away from the <pageblock_order iterations at the
expense of the relatively fewer >pageblock_order iterations.
Hmm... I tested this and found that its code size is a little bit
larger than mine. I'm not sure exactly why this happens, but I guess
it is related to compiler optimization. In this case, I'm in favor of my
implementation because it looks better abstracted. It adds one
unlikely branch to the merge loop, but the compiler would optimize it to
check it once.
I would be surprised if compiler optimized that to check it once, as
order increases with each loop iteration. But maybe it's smart
enough to do something like I did by hand? Guess I'll check the
disassembly.
Okay. I used the following slightly optimized version, and I needed to
add 'max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1)'
to yours. Please consider it, too.
Hmm, this one does not work. I can still see the bug after applying
this patch. Did I miss something?
I may have found a bug that I introduced some time
ago. Could you test the following change in __free_one_page() on top of
Vlastimil's patch?

-page_idx = pfn & ((1 << max_order) - 1);
+page_idx = pfn & ((1 << MAX_ORDER) - 1);

I tested Vlastimil's patch + your change under stress for more than half
an hour, and the bug I reported is gone :)
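
Editorial sketch of why the mask width in the one-line change above can matter. This is one plausible reading, not something stated in the thread. The page index is XORed with (1 << order) to find the buddy, so if the mask keeps fewer bits than the order being merged, that bit is always zero in page_idx and the buddy is always computed as the higher neighbour. The helper names and the EXAMPLE_* values below are hypothetical and chosen only so the effect is visible; they do not describe any particular kernel configuration.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative values only (assumptions, not a real kernel config). */
#define EXAMPLE_MAX_ORDER        13U
#define EXAMPLE_PAGEBLOCK_ORDER   9U

/* Buddy of a block at 'order': flip bit 'order' of the page index. */
static unsigned long find_buddy_index(unsigned long page_idx, unsigned int order)
{
    return page_idx ^ (1UL << order);
}

/*
 * Hypothetical helper: compute the buddy's pfn when page_idx is derived
 * by masking pfn down to its low 'mask_order' bits.
 */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int mask_order,
                               unsigned int order)
{
    assert(mask_order < sizeof(unsigned long) * CHAR_BIT);
    assert(order < sizeof(unsigned long) * CHAR_BIT);

    unsigned long page_idx = pfn & ((1UL << mask_order) - 1);
    unsigned long buddy_idx = find_buddy_index(page_idx, order);

    /* Same relative offset the allocator applies to the page pointer. */
    return pfn + buddy_idx - page_idx;
}
```

With these example values, a block at pfn 1024 merged at order 10 has its buddy at pfn 0 when the mask is EXAMPLE_MAX_ORDER bits wide, but a mask of EXAMPLE_PAGEBLOCK_ORDER + 1 bits yields pfn 2048 instead. The two masks agree for every order below the narrower mask width.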

Oh, ok, will try to send a proper patch, once I figure out what to write in
the changelog :)
Thanks in advance!
After digging into the "PFN busy" race in CMA (see [1]), I believe we
should just prevent any buddy merging in isolated ranges. This fixes the
race I'm seeing without the need to hold the zone lock for extended
periods of time.
"PFNs busy" can be caused by other types of races, too. I guess that
the other cases happen more often than buddy merging. Do you have a test
case for your problem?

If it is indeed a problem, you can avoid it by simply retrying
alloc_contig_range() up to MAX_ORDER times. This is rather dirty, but
the reason I suggest it is that there are other types of races in
__alloc_contig_range() and a retry could help with them, too. For example,
if some of the pages in the requested range aren't attached to the LRU yet,
or are detached from the LRU but not yet freed to buddy,
test_pages_isolated() can fail.
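
Editorial sketch of the retry idea, written as a userspace model rather than kernel code. The callback type stands in for alloc_contig_range(); alloc_range_retry(), flaky_alloc() and struct flaky_state are hypothetical names, and the assumptions are that a transient failure is reported as -EBUSY and that MAX_ORDER is 11. Any other error is returned immediately, since retrying would not help.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ORDER 11 /* assumed value, used here as the retry bound */

/* Stand-in for a range allocator: 0 on success, negative errno on failure. */
typedef int (*alloc_range_fn)(unsigned long start, unsigned long end, void *ctx);

/* Retry the allocation up to MAX_ORDER times while it reports -EBUSY. */
static int alloc_range_retry(alloc_range_fn alloc, unsigned long start,
                             unsigned long end, void *ctx)
{
    int ret = -EBUSY;

    assert(alloc != NULL);
    for (int tries = 0; tries < MAX_ORDER && ret == -EBUSY; tries++)
        ret = alloc(start, end, ctx);

    return ret; /* 0, the last -EBUSY, or the first non-transient error */
}

/* Hypothetical test double: busy for the first 'busy_left' calls. */
struct flaky_state {
    int busy_left;
    int calls;
};

static int flaky_alloc(unsigned long start, unsigned long end, void *ctx)
{
    struct flaky_state *st = ctx;

    (void)start;
    (void)end;
    st->calls++;
    if (st->busy_left > 0) {
        st->busy_left--;
        return -EBUSY;
    }
    return 0;
}
```

An allocator that is busy three times succeeds on the fourth call, while one that stays busy gives up after MAX_ORDER attempts and returns -EBUSY to the caller.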

Thanks.