Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin <pasha.tatashin@soleen.com>
Date: 2025-11-18 15:19:09
Also in:
linux-doc, linux-fsdevel, linux-mm, lkml
On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport [off-list ref] wrote:
> On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
>> On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
>>> On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
>>>>> You can avoid that complexity if you register the device with a different fops, but that's a technicality. Your point about treating the incoming FDT as an underlying resource that failed to initialize makes sense, but userspace still needs a reliable way to detect it, and parsing dmesg is not something we should rely on.
>>>>
>>>> I see two solutions:
>>>> 1. LUO fails to retrieve the preserved data; the user is informed by not finding /dev/liveupdate and studies dmesg to see what happened (in reality, version mismatches should not happen in fleets; they should be caught in quals).
>>>> 2. Create a zombie device that returns some errno on open; the user still studies dmesg to understand what really happened.
>>>
>>> User should not study dmesg. We need another solution. What's wrong with e.g. ioctl()?
>>
>> It seems very dangerous to even boot at all if the next kernel doesn't understand the serialization information. IMHO we should not even be thinking about this; it is up to the predecessor environment to prevent it from happening. The ideas to use ELF metadata/etc. to allow a pre-flight validation are the right solution.
>
> 100% agreed, this is the goal.
>
>> If we get into the next kernel and it receives information it cannot process, it should just BUG_ON and die, or some broad equivalent.
I initially had a panic() that would kill the kernel, but after further consideration, I realized that we can still boot into "maintenance" mode and let the user decide when and how to reboot the machine back to a normal state. Crashing during early boot has its own disadvantages: the crash kernel is not available, and because live update has to be very fast, the console is likely to be disabled. Therefore, getting to userspace and letting the user investigate what happened (e.g., automatically retrieving dmesg or a core dump and filing a bug) before rebooting seems like the most sensible approach. This won't leak data: /dev/liveupdate is completely disabled, so nothing preserved in memory will be recoverable.

Pasha
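To make the two detection options discussed above concrete, here is a minimal userspace sketch of how a management agent might tell them apart without parsing dmesg. It assumes only what the thread states: under option 1 (and in the "maintenance" mode described in the reply) /dev/liveupdate is absent, and under option 2 the device exists but open() fails with some errno. The enum, the function names, the O_RDWR open mode, and treating every errno other than ENOENT as "disabled" are illustrative assumptions, not the LUO ABI.

```c
#define _POSIX_C_SOURCE 200809L /* for O_CLOEXEC */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical classification of what userspace observed at boot. */
enum luo_probe_state {
	LUO_AVAILABLE, /* open() succeeded: preserved state may be usable */
	LUO_ABSENT,    /* node missing: option 1, LUO did not register the device */
	LUO_DISABLED,  /* node exists but open() fails: option 2, "zombie" device */
};

/*
 * Pure classification step, kept separate so it can be tested without a
 * real /dev/liveupdate. saved_errno is only consulted when fd < 0.
 * Assumption: any errno other than ENOENT (including EACCES/EPERM) is
 * lumped into LUO_DISABLED; a real tool would likely separate permission
 * errors from a deliberately disabled device.
 */
static enum luo_probe_state luo_classify_open_result(int fd, int saved_errno)
{
	if (fd >= 0)
		return LUO_AVAILABLE;
	if (saved_errno == ENOENT)
		return LUO_ABSENT;
	return LUO_DISABLED;
}

/*
 * Probe the device node. On LUO_AVAILABLE the caller owns *out_fd and must
 * close() it; otherwise *out_fd is -1. The O_RDWR mode is an assumption.
 */
static enum luo_probe_state luo_probe(const char *path, int *out_fd)
{
	int fd;
	int saved_errno;

	assert(path != NULL && out_fd != NULL);
	fd = open(path, O_RDWR | O_CLOEXEC);
	saved_errno = errno; /* capture before any other libc call */
	*out_fd = fd;
	return luo_classify_open_result(fd, saved_errno);
}
```

A caller would typically run luo_probe("/dev/liveupdate", &fd) early in its startup path and, on anything other than LUO_AVAILABLE, switch to collecting diagnostics and scheduling a reboot. Neither option by itself tells userspace *why* retrieval failed, which is the gap Mike's ioctl() question is about.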