Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
From: Shakeel Butt <hidden>
Date: 2021-07-27 16:51:17
Also in: cgroups, lkml
On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel [off-list ref] wrote:
> On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> > __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> > thresholding code is invoked during stat changes, and those contexts
> > have irqs disabled as well. If the lock breaking occurs inside the
> > flush function, it will result in a sleep from an atomic context.
> >
> > Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
> While this fix is necessary, in the long term I think we may want
> some sort of redesign here, to make sure the irq safe version does
> not spin for a long time trying to get the statistics off some other CPU.
>
> I have seen a number of soft (IIRC) lockups deep inside the bowels
> of cgroup_rstat_flush_irqsafe, with the function taking multiple
> seconds to complete.
Can you please share a bit more detail on this lockup? I am wondering whether it was caused by flushes not happening often enough, leaving a large update tree, or by too many flushes happening concurrently.
> Reviewed-by: Rik van Riel <riel@surriel.com>