Thread (15 messages) 15 messages, 5 authors, 2020-03-19

Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages

From: Vlastimil Babka <hidden>
Date: 2020-03-18 11:53:37
Also in: linux-mm
Subsystem: memory management, slab allocator, the rest · Maintainers: Andrew Morton, Vlastimil Babka, Harry Yoo, Linus Torvalds

On 3/18/20 11:02 AM, Michal Hocko wrote:
On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
quoted
Calling a kmalloc_node on a possible node which is not yet onlined can
lead to panic. Currently node_present_pages() doesn't verify the node is
online before accessing the pgdat for the node. However pgdat struct may
not be available resulting in a crash.

NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
LR [c0000000003d5b94] __slab_alloc+0x34/0x60
Call Trace:
[c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
[c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
[c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
[c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
[c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
[c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
[c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
[c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
[c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
[c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68

Fix this by verifying the node is online before accessing the pgdat
structure. Fix the same for node_spanned_pages() too.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <redacted>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <redacted>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <redacted>
Cc: Kirill Tkhai <redacted>
Cc: Vlastimil Babka <redacted>
Cc: Srikar Dronamraju <redacted>
Cc: Bharata B Rao <redacted>
Cc: Nathan Lynch <redacted>

Reported-by: Sachin Sant <redacted>
Tested-by: Sachin Sant <redacted>
Signed-off-by: Srikar Dronamraju <redacted>
---
 include/linux/mmzone.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f3f264826423..88078a3b95e5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -756,8 +756,10 @@ typedef struct pglist_data {
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
 } pg_data_t;
 
-#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
-#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
+#define node_present_pages(nid)		\
+	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
+#define node_spanned_pages(nid)		\
+	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
I believe this is a wrong approach. We really do not want to special
case all the places which require NODE_DATA. Can we please go and
allocate pgdat for all possible nodes?

The current state of memory less hacks subtle bugs poping up here and
there just prove that we should have done that from the very begining
IMHO.
Yes. So here's an alternative proposal for fixing the current situation in SLUB,
before the long-term solution of having all possible nodes provide valid pgdat
with zonelists:

- fix SLUB with the hunk at the end of this mail - the point is to use NUMA_NO_NODE
  as fallback instead of node_to_mem_node()
- this removes all uses of node_to_mem_node (luckily it's just SLUB),
  kill it completely instead of trying to fix it up
- patch 1/4 is not needed with the fix
- perhaps many of your other patches are alss not needed 
- once we get the long-term solution, some of the !node_online() checks can be removed

----8<----
diff --git a/mm/slub.c b/mm/slub.c
index 17dc00e33115..1d4f2d7a0080 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
-	if (node == NUMA_NO_NODE)
+	if (node == NUMA_NO_NODE || !node_online(node))
 		page = alloc_pages(flags, order);
 	else
 		page = __alloc_pages_node(node, flags, order);
@@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
-	else if (!node_present_pages(node))
-		searchnode = node_to_mem_node(node);
 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
@@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		int searchnode = node;
-
-		if (node != NUMA_NO_NODE && !node_present_pages(node))
-			searchnode = node_to_mem_node(node);
-
-		if (unlikely(!node_match(page, searchnode))) {
+		/*
+		 * node_match() false implies node != NUMA_NO_NODE
+		 * but if the node is not online and has no pages, just
+		 * ignore the constraint
+		 */
+		if ((!node_online(node) || !node_present_pages(node))) {
+			node = NUMA_NO_NODE;
+			goto redo;
+		} else {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
 			goto new_slab;
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help