Thread: 7 messages, 3 authors, 2017-11-30

Re: [PATCH] mm: Make count list_lru_one::nr_items lockless

From: Kirill Tkhai <hidden>
Date: 2017-11-30 10:37:01
Also in: lkml

On 30.11.2017 03:27, Shakeel Butt wrote:
On Fri, Sep 29, 2017 at 1:15 AM, Kirill Tkhai [off-list ref] wrote:
On 29.09.2017 00:02, Andrew Morton wrote:
On Thu, 28 Sep 2017 10:48:55 +0300 Kirill Tkhai [off-list ref] wrote:
This patch aims to make super_cache_count() (and other functions
which count LRU nr_items) more efficient.
It allows list_lru_node::memcg_lrus to be RCU-accessed, and makes
__list_lru_count_one() count nr_items locklessly to minimize the
overhead introduced by locking, and to make parallel reclaim
more scalable.
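
[Editorial sketch, not part of the patch] The idea of counting nr_items
without taking the lock can be illustrated in userspace. The snippet below
is a minimal analogy under stated assumptions: struct lru_one and the
lru_*() helpers are hypothetical names, a C11 atomic_flag spin loop stands
in for the kernel spinlock, and the RCU protection of memcg_lrus described
above is left out entirely. Writers still serialize on the lock; only the
reader changes.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for a per-node LRU list head. */
struct lru_one {
	atomic_flag lock;      /* stands in for the per-node spinlock */
	atomic_long nr_items;  /* updated under lock, readable without it */
};

static void lru_init(struct lru_one *l)
{
	atomic_flag_clear(&l->lock);
	atomic_init(&l->nr_items, 0);
}

static void lru_lock(struct lru_one *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
		;
}

static void lru_unlock(struct lru_one *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Writers keep taking the lock, as list insertion/removal would. */
static void lru_add(struct lru_one *l)
{
	lru_lock(l);
	/* the actual list insertion would happen here */
	atomic_fetch_add_explicit(&l->nr_items, 1, memory_order_relaxed);
	lru_unlock(l);
}

static void lru_del(struct lru_one *l)
{
	lru_lock(l);
	assert(atomic_load_explicit(&l->nr_items, memory_order_relaxed) > 0);
	/* the actual list removal would happen here */
	atomic_fetch_sub_explicit(&l->nr_items, 1, memory_order_relaxed);
	lru_unlock(l);
}

/* Locked count: every caller contends on the same lock. */
static long lru_count_locked(struct lru_one *l)
{
	long n;

	lru_lock(l);
	n = atomic_load_explicit(&l->nr_items, memory_order_relaxed);
	lru_unlock(l);
	return n;
}

/* Lockless count: one relaxed load; the value may be slightly stale. */
static long lru_count_lockless(struct lru_one *l)
{
	return atomic_load_explicit(&l->nr_items, memory_order_relaxed);
}
```

With many concurrent callers, lru_count_locked() makes every reader bounce
the lock, while lru_count_lockless() only reads a counter. A slightly stale
count is presumably acceptable where the result is only used as an estimate.
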
And...  what were the effects of the patch?  Did you not run the same
performance tests after applying it?
I've just detected such high usage of shrink slab on a production node. It's rather
difficult to make it run a kernel other than the one it uses; only kpatches are possible.
So I haven't estimated how the patch affects the node's performance.
On a test node I see that the patch clearly removes raw_spin_lock from the perf profile.
So it's somewhat untested in this respect.
Well that's a problem.  The patch increases list_lru.o text size by a
lot (4800->5696) which will have a cost.  And we don't have proof that
any benefit is worth that cost.  It shouldn't be too hard to cook up a
synthetic test to trigger memcg slab reclaim and then run a
before-n-after benchmark?
OK, then please ignore this for a while; I'll try to do it a little later.
I rebased this patch on Linus' tree (replacing kfree_rcu with call_rcu,
as there is no kvfree_rcu) and ran some experiments. I think the patch
is worth including.

Setup: running a fork bomb in a 200MiB memcg on an 8GiB, 4-vCPU
VM and recording the trace with 'perf record -g -a'.

The trace without the patch:

+  34.19%     fb.sh  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
+  30.77%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_lock
+   3.53%     fb.sh  [kernel.kallsyms]  [k] list_lru_count_one
+   2.26%     fb.sh  [kernel.kallsyms]  [k] super_cache_count
+   1.68%     fb.sh  [kernel.kallsyms]  [k] shrink_slab
+   0.59%     fb.sh  [kernel.kallsyms]  [k] down_read_trylock
+   0.48%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
+   0.38%     fb.sh  [kernel.kallsyms]  [k] shrink_node_memcg
+   0.32%     fb.sh  [kernel.kallsyms]  [k] queue_work_on
+   0.26%     fb.sh  [kernel.kallsyms]  [k] count_shadow_nodes

With the patch:

+   0.16%     swapper  [kernel.kallsyms]    [k] default_idle
+   0.13%     oom_reaper  [kernel.kallsyms]    [k] mutex_spin_on_owner
+   0.05%     perf  [kernel.kallsyms]    [k] copy_user_generic_string
+   0.05%     init.real  [kernel.kallsyms]    [k] wait_consider_task
+   0.05%     kworker/0:0  [kernel.kallsyms]    [k] finish_task_switch
+   0.04%     kworker/2:1  [kernel.kallsyms]    [k] finish_task_switch
+   0.04%     kworker/3:1  [kernel.kallsyms]    [k] finish_task_switch
+   0.04%     kworker/1:0  [kernel.kallsyms]    [k] finish_task_switch
+   0.03%     binary  [kernel.kallsyms]    [k] copy_page


Kirill, can you resend your patch with this info, or do you want me
to send the rebased patch?
Shakeel, thank you for the testing! I'll resend the patch as "v2".
