Thread: 92 messages, 6 authors, 2025-11-24

Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2025-11-18 14:03:05
Also in: linux-doc, linux-fsdevel, linux-mm, lkml

On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > You can avoid that complexity if you register the device with a different
> > > fops, but that's technicality.
> > >
> > > Your point about treating the incoming FDT as an underlying resource that
> > > failed to initialize makes sense, but nevertheless userspace needs a
> > > reliable way to detect it and parsing dmesg is not something we should rely
> > > on.
> >
> > I see two solutions:
> >
> > 1. LUO fails to retrieve the preserved data, the user gets informed by
> >    not finding /dev/liveupdate, and studying the dmesg for what has
> >    happened (in reality in fleets version mismatches should not be
> >    happening, those should be detected in quals).
> > 2. Create a zombie device to return some errno on open, and still
> >    study dmesg to understand what really happened.
>
> User should not study dmesg. We need another solution.
> What's wrong with e.g. ioctl()?

It seems very dangerous to even boot at all if the next kernel doesn't
understand the serialization information.

IMHO we should not even be thinking about this; it is up to the
predecessor environment to prevent it from happening. The ideas to
use ELF metadata/etc. to allow a pre-flight validation are the right
solution.
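
A minimal sketch of what such a pre-flight check could look like when run by
the predecessor environment before it commits to the kexec. This is only an
illustration: the struct, the function names, and the idea of tagging each
serialized blob with a format string are all assumptions, not the LUO/KHO
interface, and how the accepted list would actually be read out of the next
kernel's ELF metadata is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: every preserved subsystem is tagged with a format
 * string, and the next kernel image advertises the set of formats it can
 * parse (for example through ELF metadata).
 */
struct format_list {
	const char *const *names;
	size_t count;
};

/* True if @name appears in the list the next kernel accepts. */
bool format_supported(const struct format_list *accepted, const char *name)
{
	for (size_t i = 0; i < accepted->count; i++) {
		if (strcmp(accepted->names[i], name) == 0)
			return true;
	}
	return false;
}

/*
 * Return NULL if the next kernel accepts every format the running kernel
 * will emit, otherwise the first format it cannot parse. The caller is
 * expected to refuse the live update on any non-NULL result.
 */
const char *preflight_first_mismatch(const struct format_list *emitted,
				     const struct format_list *accepted)
{
	for (size_t i = 0; i < emitted->count; i++) {
		if (!format_supported(accepted, emitted->names[i]))
			return emitted->names[i];
	}
	return NULL;
}
```

With a check of this shape the mismatch is caught while the old kernel is
still running and can simply decline to proceed, so the successor kernel
never has to report anything to userspace.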

If we get into the next kernel and it receives information it cannot
process, it should just BUG_ON and die, or some broad equivalent.
It is a catastrophic orchestration error, and we don't need some
fine-grained recovery or userspace visibility. Crash dump the system and
reboot it.

IOW, I would not invest time in this.

Jason