Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2025-11-18 14:03:05
Also in:
linux-doc, linux-fsdevel, linux-mm, lkml
On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
You can avoid that complexity if you register the device with a different fops, but that's technicality. Your point about treating the incoming FDT as an underlying resource that failed to initialize makes sense, but nevertheless userspace needs a reliable way to detect it and parsing dmesg is not something we should rely on.

I see two solutions:

1. LUO fails to retrieve the preserved data, the user gets informed by not finding /dev/liveupdate, and studying the dmesg for what has happened (in reality in fleets version mismatches should not be happening, those should be detected in quals).

2. Create a zombie device to return some errno on open, and still study dmesg to understand what really happened.

User should not study dmesg. We need another solution. What's wrong with e.g. ioctl()?

It seems very dangerous to even boot at all if the next kernel doesn't understand the serialization information.

IMHO I think we should not even be thinking about this, it is up to the predecessor environment to prevent it from happening. The ideas to use ELF metadata/etc to allow a pre-flight validation are the right solution.

If we get into the next kernel and it receives information it cannot process it should just BUG_ON and die, or some broad equivalent.

It is a catastrophic orchestration error, and we don't need some fine-grained recovery or userspace visibility. Crash dump the system and reboot it.

IOW, I would not invest time in this.

Jason