Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep,... | linux-mm

[PATCH 00/10] Remove zonelist cache and high-order watermark checking v2 · Mel Gorman <hidden> · 2015-08-12
[PATCH 02/10] mm, page_alloc: Remove unnecessary parameter from zone_watermark_ok_safe · Mel Gorman <hidden> · 2015-08-12
Re: [PATCH 02/10] mm, page_alloc: Remove unnecessary parameter from zone_watermark_ok_safe · Michal Hocko <mhocko@kernel.org> · 2015-08-20
[PATCH 03/10] mm, page_alloc: Remove unnecessary recalculations for dirty zone balancing · Mel Gorman <hidden> · 2015-08-12
Re: [PATCH 03/10] mm, page_alloc: Remove unnecessary recalculations for dirty zone balancing · Michal Hocko <mhocko@kernel.org> · 2015-08-20
Re: [PATCH 03/10] mm, page_alloc: Remove unnecessary recalculations for dirty zone balancing · Mel Gorman <hidden> · 2015-08-20
Re: [PATCH 03/10] mm, page_alloc: Remove unnecessary recalculations for dirty zone balancing · Michal Hocko <mhocko@kernel.org> · 2015-08-20
[PATCH 10/10] mm, page_alloc: Only enforce watermarks for order-0 allocations · Mel Gorman <hidden> · 2015-08-12
[PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand · Mel Gorman <hidden> · 2015-08-12
[PATCH 08/10] mm, page_alloc: Remove MIGRATE_RESERVE · Mel Gorman <hidden> · 2015-08-12
[PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd · Mel Gorman <hidden> · 2015-08-12
Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd · Michal Hocko <mhocko@kernel.org> · 2015-08-12
Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd · Vlastimil Babka <hidden> · 2015-08-19
Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd · Mel Gorman <hidden> · 2015-08-20
Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd · Vlastimil Babka <hidden> · 2015-08-21
Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd · Mel Gorman <hidden> · 2015-08-21
[PATCH 07/10] mm: page_alloc: Rename __GFP_WAIT to __GFP_RECLAIM · Mel Gorman <hidden> · 2015-08-12
Re: [PATCH 07/10] mm: page_alloc: Rename __GFP_WAIT to __GFP_RECLAIM · Michal Hocko <mhocko@kernel.org> · 2015-08-20
Re: [PATCH 07/10] mm: page_alloc: Rename __GFP_WAIT to __GFP_RECLAIM · Michal Hocko <mhocko@kernel.org> · 2015-08-20
Re: [PATCH 07/10] mm: page_alloc: Rename __GFP_WAIT to __GFP_RECLAIM · Vlastimil Babka <hidden> · 2015-08-21
Re: [PATCH 07/10] mm: page_alloc: Rename __GFP_WAIT to __GFP_RECLAIM · Mel Gorman <hidden> · 2015-08-21
[PATCH 05/10] mm, page_alloc: Use masks and shifts when converting GFP flags to migrate types · Mel Gorman <hidden> · 2015-08-12
Re: [PATCH 05/10] mm, page_alloc: Use masks and shifts when converting GFP flags to migrate types · Christoph Lameter <hidden> · 2015-08-12
[PATCH 04/10] mm, page_alloc: Remove unnecessary taking of a seqlock when cpusets are disabled · Mel Gorman <hidden> · 2015-08-12
Re: [PATCH 04/10] mm, page_alloc: Remove unnecessary taking of a seqlock when cpusets are disabled · David Rientjes <rientjes@google.com> · 2015-08-13
Re: [PATCH 04/10] mm, page_alloc: Remove unnecessary taking of a seqlock when cpusets are disabled · Mel Gorman <hidden> · 2015-08-17
[PATCH 01/10] mm, page_alloc: Delete the zonelist_cache · Mel Gorman <hidden> · 2015-08-12
Re: [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache · Michal Hocko <mhocko@kernel.org> · 2015-08-20
Re: [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache · Mel Gorman <hidden> · 2015-08-20
Re: [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache · Michal Hocko <mhocko@kernel.org> · 2015-08-21
Re: [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache · Vlastimil Babka <hidden> · 2015-08-20
Re: [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache · Mel Gorman <hidden> · 2015-08-20
Re: [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache · Vlastimil Babka <hidden> · 2015-08-20

Re: [PATCH 06/10] mm: page_alloc: Distinguish between being unable to sleep, unwilling to unwilling and avoiding waking kswapd

From: Vlastimil Babka <hidden>
Date: 2015-08-19 14:44:45
Also in: lkml
Subsystem: memory management, memory management - page allocator, the rest · Maintainers: Andrew Morton, Vlastimil Babka, Linus Torvalds

On 08/12/2015 12:45 PM, Mel Gorman wrote:

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min". __GFP_HIGH users get
access to the first lower watermark and can be called the "high priority
reserve". Atomic users and interrupts access yet another lower watermark
that can be called the "atomic reserve".

Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are
willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers
that want to wake kswapd for background reclaim. __GFP_WAIT is redefined
as a caller that is willing to enter direct reclaim and wake kswapd for
background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag. Callers with
  interrupts disabled still automatically use the atomic reserves.

o Callers that have a limited mempool to guarantee forward progress use
  __GFP_DIRECT_RECLAIM. bio allocations fall into this category where
  kswapd will still be woken but atomic reserves are not used as there
  is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allows_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

The key hazard to watch out for is callers that removed __GFP_WAIT and
was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

Signed-off-by: Mel Gorman <redacted>

I like the approach, it makes things much clearer. Note that this isn't full
review, just something that crossed my mind.

quoted hunk ↗ jump to hunk

@@ -117,9 +132,9 @@ struct vm_area_struct;
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
 #define GFP_IOFS	(__GFP_IO | __GFP_FS)
-#define GFP_TRANSHUGE	(GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
-			 __GFP_NO_KSWAPD)
+#define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
+			 ~__GFP_KSWAPD_RECLAIM)

Unfortunately this is not as simple for all uses of GFP_TRANSHUGE.
Namely in __alloc_pages_slowpath() the checks could use __GFP_NO_KSWAPD as one
of the distinguishing flags, but to test for lack of __GFP_KSWAPD_RECLAIM, they
should be adjusted in order to be functionally equivalent.
Yes, it would be better if we could get rid of them, but that's out of scope
here. So, something like this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index beda417..e019a89 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c

@@ -3073,7 +3073,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto got_pg;
 
 	/* Checks for THP-specific high-order allocations */
-	if ((gfp_mask & GFP_TRANSHUGE) == GFP_TRANSHUGE) {
+	if ((gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM))
+							== GFP_TRANSHUGE) {
 		/*
 		 * If compaction is deferred for high-order allocations, it is
 		 * because sync compaction recently failed. If this is the case

@@ -3108,8 +3109,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * fault, so use asynchronous memory compaction for THP unless it is
 	 * khugepaged trying to collapse.
 	 */
-	if ((gfp_mask & GFP_TRANSHUGE) != GFP_TRANSHUGE ||
-						(current->flags & PF_KTHREAD))
+	if ((gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM)
+			!= GFP_TRANSHUGE) || (current->flags & PF_KTHREAD))
 		migration_mode = MIGRATE_SYNC_LIGHT;
 
 	/* Try direct reclaim and then allocating */



--

To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help