Thread (18 messages), 3 authors, 2017-12-14

Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement

From: kemi <hidden>
Date: 2017-11-30 05:58:15
Also in: lkml


On 2017-11-29 20:17, Michal Hocko wrote:
On Tue 28-11-17 14:00:23, Kemi Wang wrote:
quoted
The existing implementation of NUMA counters is per logical CPU, with
zone->vm_numa_stat[] separated by zone, plus a global NUMA counter array
vm_numa_stat[]. However, unlike the other vmstat counters, NUMA stats don't
affect the system's decisions and are only read from /proc and /sys. Reading
them is a slow-path operation that can likely tolerate higher overhead.
Additionally, nodes usually have only a single zone, except for node 0, and
there isn't really any use case that needs these hit counts separated by zone.

Therefore, we can migrate the implementation of NUMA stats from per-zone to
per-node and get rid of the global NUMA counters. It is good enough to
keep everything in a per-CPU pointer of type u64 and sum the values up when
needed, as suggested by Andi Kleen. That helps with code cleanup and
enhancement (e.g. it saves more than 130 lines of code).
I agree. Having these stats per zone is an overcomplication. The
only consumer is /proc/zoneinfo and I would argue this doesn't justify
the additional complexity. Who really needs to know the numbers broken
out per zone?

Anyway, I haven't checked your implementation too deeply, but why don't
you simply define a static percpu array for each NUMA node?
To be honest, I can think of two other ways, listed below, but I don't think
either is simpler than my current implementation. Maybe you have a better idea.

static u64 __percpu vm_stat_numa[num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS];
But that is not correct: num_possible_nodes() is evaluated at run time, so it
cannot be used to size a static array.

Or we can add a u64 percpu array of size NR_VM_NUMA_STAT_ITEMS to struct pglist_data.

My current implementation is quite straightforward: it combines all of the local
counters, so a single percpu array of size num_possible_nodes()*NR_VM_NUMA_STAT_ITEMS
is enough.
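
To illustrate the layout described above, here is a minimal userspace sketch,
not the kernel code from the patch. It assumes a plain heap array standing in
for the percpu allocation, with one block of nr_nodes * NR_VM_NUMA_STAT_ITEMS
counters per CPU. The struct numa_stats type and the numa_stats_*/numa_stat_*
helpers are hypothetical names introduced only for this example, and the enum
is a stand-in for the kernel's enum numa_stat_item.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the kernel's enum numa_stat_item (illustration only). */
enum numa_stat_item {
	NUMA_HIT,
	NUMA_MISS,
	NUMA_FOREIGN,
	NUMA_INTERLEAVE_HIT,
	NUMA_LOCAL,
	NUMA_OTHER,
	NR_VM_NUMA_STAT_ITEMS
};

/* Hypothetical container: one flat block of counters per CPU. */
struct numa_stats {
	unsigned int nr_cpus;
	unsigned int nr_nodes;	/* plays the role of num_possible_nodes() */
	uint64_t *counters;	/* nr_cpus * nr_nodes * NR_VM_NUMA_STAT_ITEMS */
};

/* The size is only known at run time, hence a dynamic allocation. */
static int numa_stats_init(struct numa_stats *s, unsigned int nr_cpus,
			   unsigned int nr_nodes)
{
	s->nr_cpus = nr_cpus;
	s->nr_nodes = nr_nodes;
	s->counters = calloc((size_t)nr_cpus * nr_nodes * NR_VM_NUMA_STAT_ITEMS,
			     sizeof(uint64_t));
	return s->counters ? 0 : -1;
}

static void numa_stats_destroy(struct numa_stats *s)
{
	free(s->counters);
	s->counters = NULL;
}

/* Slot for (cpu, node, item): index is node * NR_VM_NUMA_STAT_ITEMS + item. */
static uint64_t *numa_stat_slot(struct numa_stats *s, unsigned int cpu,
				unsigned int node, enum numa_stat_item item)
{
	size_t per_cpu = (size_t)s->nr_nodes * NR_VM_NUMA_STAT_ITEMS;

	assert(cpu < s->nr_cpus && node < s->nr_nodes);
	assert(item < NR_VM_NUMA_STAT_ITEMS);
	return &s->counters[cpu * per_cpu +
			    (size_t)node * NR_VM_NUMA_STAT_ITEMS + item];
}

/* Fast path: bump only the calling CPU's private counter. */
static void numa_stat_inc(struct numa_stats *s, unsigned int cpu,
			  unsigned int node, enum numa_stat_item item)
{
	(*numa_stat_slot(s, cpu, node, item))++;
}

/* Slow path (e.g. a /proc read): sum one counter over all CPUs. */
static uint64_t numa_stat_sum(struct numa_stats *s, unsigned int node,
			      enum numa_stat_item item)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < s->nr_cpus; cpu++)
		sum += *numa_stat_slot(s, cpu, node, item);
	return sum;
}
```

In this sketch the update path touches only one CPU's block, while the read
path pays for a walk over all CPUs, which matches the trade-off argued for in
the patch description.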
		
[...]
quoted
+extern u64 __percpu *vm_numa_stat;
[...]
quoted
+#ifdef CONFIG_NUMA
+	size = sizeof(u64) * num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS;
+	align = __alignof__(u64[num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS]);
+	vm_numa_stat = (u64 __percpu *)__alloc_percpu(size, align);
+#endif