Re: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery
From: "Luck, Tony" <tony.luck@intel.com>
Date: 2021-01-11 22:21:57
Also in:
linux-edac, lkml
On Mon, Jan 11, 2021 at 02:11:56PM -0800, Andy Lutomirski wrote:
> On Jan 11, 2021, at 1:45 PM, Tony Luck [off-list ref] wrote:
>
>> Recovery action when get_user() triggers a machine check uses the fixup
>> path to make get_user() return -EFAULT. Also queue_task_work() sets up
>> so that kill_me_maybe() will be called on return to user mode to send a
>> SIGBUS to the current process.
>>
>> But there are places in the kernel where the code assumes that this
>> EFAULT return was simply because of a page fault. The code takes some
>> action to fix that, and then retries the access. This results in a
>> second machine check.
>>
>> While processing this second machine check queue_task_work() is called
>> again. But since this uses the same callback_head structure that was
>> used in the first call, the net result is an entry on the
>> current->task_works list that points to itself.
>
> Is this happening in pagefault_disable context or normal sleepable fault
> context? If the latter, maybe we should reconsider finding a way for the
> machine check code to do its work inline instead of deferring it.
The first machine check is in pagefault_disable() context.
static int get_futex_value_locked(u32 *dest, u32 __user *from)
{
	int ret;

	pagefault_disable();
	ret = __get_user(*dest, from);
	pagefault_enable();

	return (ret == -ENXIO) ? ret : ret ? -EFAULT : 0;
}
-Tony
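
For readers following the thread, here is a minimal user-space sketch of the self-pointing list entry described in the quoted patch text. It is not kernel code. `struct callback_head` here is a cut-down stand-in, and `push_work()` and `queue_same_work_twice()` are hypothetical names for illustration. It assumes the task_works list is a singly linked list where new work is pushed at the head, and it leaves out the atomic update the kernel would need.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct callback_head (assumption). */
struct callback_head {
	struct callback_head *next;
};

/* Hypothetical, non-atomic head push: link the new work to the old head. */
static void push_work(struct callback_head **list, struct callback_head *work)
{
	work->next = *list;
	*list = work;
}

/*
 * Queue the same structure twice, as happens when the retried access
 * takes a second machine check. Returns 1 if the entry points to itself.
 */
static int queue_same_work_twice(void)
{
	struct callback_head *task_works = NULL;
	struct callback_head work = { NULL };

	push_work(&task_works, &work);	/* first machine check */
	assert(task_works == &work && work.next == NULL);

	push_work(&task_works, &work);	/* second one: work->next = &work */

	return work.next == &work;
}
```

Calling `queue_same_work_twice()` from a small `main()` returns 1: the second push reads the list head, which is already `&work`, and stores it in `work.next`. Anything that walks the list by following `next` until NULL would then never terminate.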