Re: [PATCH v8 0/6] mm/memory-failure: add panic option for unrecoverable pages

From: Andrew Morton <akpm@linux-foundation.org>
Date: 2026-05-27 19:39:36
Also in: linux-doc, linux-kselftest, linux-mm, lkml

On Wed, 27 May 2026 07:06:13 -0700 Breno Leitao [off-list ref] wrote:
> A multi-bit ECC error on a kernel-owned page that the memory failure
> handler cannot recover is currently swallowed: PG_hwpoison is set, the
> event is logged, and the kernel keeps running.  The corrupted memory
> remains accessible to the kernel and either drives silent data
> corruption or surfaces seconds-to-minutes later as an apparently
> unrelated crash.  In a large fleet that delayed, unattributable crash
> turns into significant engineering effort to root-cause; in a kdump
> configuration, by the time the crash happens the original error
> context (faulting PFN, MCE/GHES record, page state) is long gone.
>
> This series adds an opt-in sysctl,
> vm.panic_on_unrecoverable_memory_failure, that converts an
> unrecoverable kernel-page hwpoison event into an immediate panic with
> a clean dmesg/vmcore that still contains the original failure
> context.  The default is disabled so existing workloads see no
> change.

Thanks.  That does seem useful.

I'll pass at this time, due to -rc5 and not-very-reviewed.

AI review said a few things.  It claims to have found one pre-existing
issue.

	https://sashiko.dev/#/patchset/20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org