Thread: 92 messages, 6 authors, 2025-11-24

Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO

From: Pasha Tatashin <pasha.tatashin@soleen.com>
Date: 2025-11-18 15:19:09
Also in: linux-doc, linux-fsdevel, linux-mm, lkml

On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport [off-list ref] wrote:
On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
quoted
On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
quoted
On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
quoted
quoted
You can avoid that complexity if you register the device with a different
fops, but that's a technicality.

Your point about treating the incoming FDT as an underlying resource that
failed to initialize makes sense, but nevertheless userspace needs a
reliable way to detect it and parsing dmesg is not something we should rely
on.
I see two solutions:

1. LUO fails to retrieve the preserved data; the user finds out because
/dev/liveupdate is missing, and studies dmesg to learn what
happened (in practice, version mismatches should not happen in fleets;
they should be detected in quals).
2. Create a zombie device that returns some errno on open, and still
study dmesg to understand what really happened.
Users should not have to study dmesg. We need another solution.
What's wrong with e.g. ioctl()?
It seems very dangerous to even boot at all if the next kernel doesn't
understand the serialization information.

IMHO we should not even be thinking about this; it is up to
the predecessor environment to prevent it from happening. The ideas to
use ELF metadata/etc to allow a pre-flight validation are the right
solution.
100% agreed, this is the goal.
quoted
If we get into the next kernel and it receives information it cannot
process, it should just BUG_ON and die, or some broad equivalent.
I initially had a panic() that would kill the kernel, but after
further consideration, I realized that we can still boot into
"maintenance" mode and allow the user to decide when and how to reboot
the machine back to a normal state.

Crashing during early boot has its own disadvantages: the crash kernel
is not available. Also, because live-update has to be very fast, the
console is likely to be disabled. Therefore, getting to userspace and
allowing the user to investigate what happened (e.g., automatically
retrieving dmesg or a core dump and filing a bug) before rebooting
seems like the most sensible approach.

This won't leak data, as /dev/liveupdate is completely disabled, so
nothing preserved in memory will be recoverable.

Pasha