Thread (10 messages) 10 messages, 4 authors, 2021-12-16

Re: [PATCH] mm/memcontrol: Disable on PREEMPT_RT

From: Johannes Weiner <hannes@cmpxchg.org>
Date: 2021-12-07 16:56:02
Also in: cgroups

On Tue, Dec 07, 2021 at 04:52:08PM +0100, Sebastian Andrzej Siewior wrote:
From: Thomas Gleixner <redacted>

MEMCG has a few constructs which are not compatible with PREEMPT_RT's
requirements. This includes:
- relying on disabled interrupts from spin_lock_irqsave() locking for
  something not related to lock itself (like the per-CPU counter).
If memory serves me right, this is the VM_BUG_ON() in workingset.c:

	VM_WARN_ON_ONCE(!irqs_disabled());  /* For __inc_lruvec_page_state */

This isn't memcg specific. This is the serialization model of the
generic MM page counters. They can be updated from process and irq
context, and need to avoid preemption (and corruption) during RMW.

!CONFIG_MEMCG:

static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
					 int val)
{
	struct page *page = virt_to_head_page(p);

	mod_node_page_state(page_pgdat(page), idx, val);
}

which does:

void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
					long delta)
{
	unsigned long flags;

	local_irq_save(flags);
	__mod_node_page_state(pgdat, item, delta);
	local_irq_restore(flags);
}

If this breaks PREEMPT_RT, it's broken without memcg too.
- explicitly disabling interrupts and acquiring a spinlock_t based lock
  like in memcg_check_events() -> eventfd_signal().
Similar problem to the above: we disable interrupts to protect RMW
sequences that can (on non-preemptrt) be initiated through process
context as well as irq context.

IIUC, the PREEMPT_RT construct for handling exactly that scenario is
the "local lock". Is that correct?

It appears Ingo has already fixed the LRU cache, which for non-rt also
relies on irq disabling:

commit b01b2141999936ac3e4746b7f76c0f204ae4b445
Author: Ingo Molnar [off-list ref]
Date:   Wed May 27 22:11:15 2020 +0200

    mm/swap: Use local_lock for protection

The memcg charge cache should be fixable the same way.

Likewise, if you fix the generic vmstat counters like this, the memcg
implementation can follow suit.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help