Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
From: Rik van Riel <hidden>
Date: 2021-08-03 14:34:34
Also in:
linux-mm, lkml
On Tue, 2021-07-27 at 09:51 -0700, Shakeel Butt wrote:
> On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel [off-list ref] wrote:
> > On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> > > __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> > > thresholding code is invoked during stat changes, and those contexts
> > > have irqs disabled as well. If the lock breaking occurs inside the
> > > flush function, it will result in a sleep from an atomic context.
> > >
> > > Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
> >
> > While this fix is necessary, in the long term I think we may want
> > some sort of redesign here, to make sure the irq safe version does
> > not spin long times trying to get the statistics off some other CPU.
> >
> > I have seen a number of soft (IIRC) lockups deep inside the bowels
> > of cgroup_rstat_flush_irqsafe, with the function taking multiple
> > seconds to complete.
>
> Can you please share a bit more detail on this lockup? I am wondering
> if this was due to the flush not happening more often and thus the
> update tree is large or if there are too many concurrent flushes
> happening.

I was not logged into any system while it happened, but only found it later in the logs. I suspect your explanation is the reason why it happened, though.