Re: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults
From: Shuai Xue <xueshuai@linux.alibaba.com>
Date: 2022-10-21 09:30:17
Also in:
linux-mm, lkml
On 2022/10/21 12:41 PM, Luck, Tony wrote:
>>> When we do return to user mode the task is going to be busy servicing a SIGBUS ... so shouldn't try to touch the poison page before the memory_failure() called by the worker thread cleans things up.
>>
>> What about an RT process on a busy system? The worker threads are pretty low priority.
>
> Most tasks don't have a SIGBUS handler ... so they just die without possibility of accessing poison.
>
> If this task DOES have a SIGBUS handler, and that for some bizarre reason just does a "return" so the task jumps back to the instruction that caused the COW, then there is a 63/64 likelihood that it is touching a different cache line from the poisoned one.
>
> In the 1/64 case ... it's probably a simple store (since there was a COW, we know it was trying to modify the page) ... so won't generate another machine check (those only happen for reads).
>
> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we could get another machine check from the same address. But then we just follow the usual recovery path.
>
> -Tony
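
For reference, the 63/64 figure quoted above follows from a page holding 64 cache lines, if one assumes 4 KiB pages and 64-byte cache lines (typical x86-64 values; neither size is stated in this thread) and a faulting address that is equally likely to land in any line. A minimal userspace sketch of that arithmetic, with all names hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes (typical x86-64 values, not stated in the thread). */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_CACHE_LINE  64u

/* Number of cache lines in one page: 4096 / 64 = 64. */
static unsigned int lines_per_page(void)
{
	assert(SKETCH_PAGE_SIZE % SKETCH_CACHE_LINE == 0);
	return SKETCH_PAGE_SIZE / SKETCH_CACHE_LINE;
}

/* Index (0..63) of the cache line that addr falls in within its page. */
static unsigned int line_in_page(uintptr_t addr)
{
	return (unsigned int)((addr % SKETCH_PAGE_SIZE) / SKETCH_CACHE_LINE);
}

/*
 * Nonzero if both addresses lie in the same page and in the same
 * cache line of that page, i.e. the "1/64 case" from the quoted mail.
 */
static int hits_poisoned_line(uintptr_t fault_addr, uintptr_t poison_addr)
{
	return fault_addr / SKETCH_PAGE_SIZE == poison_addr / SKETCH_PAGE_SIZE &&
	       line_in_page(fault_addr) == line_in_page(poison_addr);
}
```

Under those assumptions, only one of the 64 lines in the page is the poisoned one, so a retried access to a uniformly chosen line misses it 63 times out of 64.
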
Let's assume the instruction that caused the COW is in the 63/64 case, i.e., it is writing to a different cache line from the poisoned one. But the new_page allocated in the COW is dropped, right? So might the task page fault again?

Best Regards,
Shuai