Thread: 4 messages, 2 authors, 2021-07-07

Re: [PATCH v2] x86/mce: Fix endless loop when run task works after #MC

From: "Luck, Tony" <tony.luck@intel.com>
Date: 2021-07-06 16:45:02
Also in: lkml, stable

On Tue, Jul 06, 2021 at 08:16:06PM +0800, Ding Hui wrote:
> Recently we encountered multiple #MC on the same task before its
> task_work_run() had been called. current->mce_kill_me was
> added to the task_works list more than once, which makes the
> task_works list circular, so task_work_run() loops endlessly.
I saw the same and posted a similar fix a while back:

https://www.spinics.net/lists/linux-mm/msg251006.html

It didn't get merged because some validation tests began failing
around the same time.  I'm now pretty sure I understand what happened
with those other tests.

I'll post my updated version (the second patch in a three-part series)
later today.
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> +	if (!cmpxchg(&current->mce_kill_me.func, NULL, ch.func)) {
> +		current->mce_addr = m->addr;
> +		current->mce_kflags = m->kflags;
> +		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> +		current->mce_whole_page = whole_page(m);
You don't need an atomic cmpxchg here (nor the WRITE_ONCE() to clear it).
The task is operating on its own task_struct. Nobody else should touch
the mce_kill_me field.

-Tony
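
For readers following the thread, here is a rough userspace sketch of the two
points discussed above: queuing the same embedded callback node twice makes
the list circular, and a plain (non-atomic) check of the func pointer is
enough to prevent that when only the owning task touches the field. This is
not the kernel code. struct cb_head, struct task_model, model_work_add() and
the queue_*() helpers are hypothetical stand-ins, and the push-to-head logic
is only loosely modelled on task_work_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct callback_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *);
};

/* Hypothetical stand-in for the per-task state discussed in the thread. */
struct task_model {
	struct cb_head *task_works;	/* pending callbacks, newest first */
	struct cb_head mce_kill_me;	/* embedded node: may be queued only once */
};

/* Simplified push-to-head; the real task_work_add() does more than this. */
static void model_work_add(struct task_model *t, struct cb_head *work)
{
	assert(work != NULL);
	work->next = t->task_works;
	t->task_works = work;
}

/* Placeholder callback; only its address matters in this sketch. */
static void kill_me_stub(struct cb_head *cb)
{
	(void)cb;
}

/* No guard: a second call links the node to itself. */
static void queue_unguarded(struct task_model *t)
{
	t->mce_kill_me.func = kill_me_stub;
	model_work_add(t, &t->mce_kill_me);
}

/*
 * Plain load/store guard. Assumption: only the owning task reads or
 * writes mce_kill_me.func, so no atomic operation is needed.
 * Returns true if the node was queued by this call.
 */
static bool queue_guarded(struct task_model *t)
{
	if (t->mce_kill_me.func)
		return false;		/* already pending, do not queue again */
	t->mce_kill_me.func = kill_me_stub;
	model_work_add(t, &t->mce_kill_me);
	return true;
}

/* Walk at most 'limit' nodes; true if the end of the list was reached. */
static bool list_terminates(const struct task_model *t, int limit)
{
	const struct cb_head *p = t->task_works;

	while (p && limit-- > 0)
		p = p->next;
	return p == NULL;
}
```

In this model, calling queue_unguarded() twice leaves mce_kill_me.next
pointing at itself, so a walk never reaches NULL. queue_guarded() refuses the
second request. A real implementation would also have to clear func once the
callback has run, which the sketch omits.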