Re: [PATCH v4 00/30] Live Update Orchestrator
From: Pasha Tatashin <pasha.tatashin@soleen.com>
Date: 2025-10-09 22:42:47
Also in:
linux-doc, linux-fsdevel, linux-mm, lkml
On Thu, Oct 9, 2025 at 5:58 PM Samiullah Khawaja [off-list ref] wrote:
On Tue, Oct 7, 2025 at 10:11 AM Pasha Tatashin [off-list ref] wrote:
On Sun, Sep 28, 2025 at 9:03 PM Pasha Tatashin [off-list ref] wrote:
This series introduces the Live Update Orchestrator (LUO), a kernel subsystem designed to facilitate live kernel updates. LUO enables kexec-based reboots with minimal downtime, a critical capability for cloud environments where hypervisors must be updated without disrupting running virtual machines. By preserving the state of selected resources, such as file descriptors and memory, LUO allows workloads to resume seamlessly in the new kernel.

The git branch for this series can be found at:
https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v4

The patch series applies against linux-next tag: next-20250926

While this series is showcased using memfd preservation, there is ongoing work to preserve devices:
1. IOMMU: https://lore.kernel.org/all/20250928190624.3735830-16-skhawaja@google.com
2. PCI: https://lore.kernel.org/all/20250916-luo-pci-v2-0-c494053c3c08@kernel.org

=======================================================================
Changelog since v3:
(https://lore.kernel.org/all/20250807014442.3829950-1-pasha.tatashin@soleen.com):
- The main architectural change in this version is the introduction of "sessions" to manage the lifecycle of preserved file descriptors. In v3, session management was left to a single userspace agent. This approach has been revised to improve robustness. Now, each session is represented by a file descriptor (/dev/liveupdate). The lifecycle of all preserved resources within a session is tied to this FD, ensuring automatic cleanup by the kernel if the controlling userspace agent crashes or exits unexpectedly.
- The first three KHO fixes from the previous series have been merged into Linus' tree.
- Various bug fixes and refactorings, including correcting memory unpreservation logic during a kho_abort() sequence.
- Addressing all comments from reviewers.
- Removing the sysfs interface (/sys/kernel/liveupdate/state); the state can now be queried only via the ioctl() API.
=======================================================================

Hi all,

Following up on yesterday's Hypervisor Live Update meeting, where we discussed the requirements for the LUO to track dependencies, particularly for IOMMU preservation and other stateful file descriptors, this email summarizes the main design decisions and outcomes from that discussion.

For context, the notes from the previous meeting can be found here:
https://lore.kernel.org/all/365acb25-4b25-86a2-10b0-1df98703e287@google.com

The notes for yesterday's meeting are not yet available.

The key outcomes are as follows:

1. User-Enforced Ordering
-------------------------
The responsibility for enforcing the correct order of operations will lie with the userspace agent. If fd_A is a dependency for fd_B, userspace must ensure that fd_A is preserved before fd_B. This same ordering must be honored during the restoration phase after the reboot (fd_A must be restored before fd_B). The kernel preserves the ordering.

2. Serialization in PRESERVE_FD
-------------------------------
To keep the global prepare() phase lightweight and predictable, the consensus was to shift the heavy serialization work into the PRESERVE_FD ioctl handler. This means that when userspace requests to preserve a file, the file handler should perform the bulk of the state-saving work immediately.

The proposed sequence of operations reflects this shift:

Shutdown Flow:
fd_preserve() (heavy serialization) -> prepare() (lightweight final checks) -> Suspend VM -> reboot(KEXEC) -> freeze() (lightweight)

Boot & Restore Flow:
fd_restore() (lightweight object creation) -> Resume VM -> Heavy post-restore IOCTLs (e.g., hardware page table re-creation) -> finish() (lightweight cleanup)

This decision primarily serves as a guideline for file handler implementations.
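To make the split between heavy and lightweight phases concrete, here is a minimal userspace sketch of how a file handler following this guideline might be shaped. Everything in it is hypothetical: struct fh_file and the fh_preserve(), fh_unpreserve() and fh_prepare() helpers are illustrative names only, not the LUO file handler API, and the "serialization" is just a string copy standing in for the real state-saving work.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file being preserved; not a kernel type. */
struct fh_file {
	const char *contents;	/* state that must survive the update */
	char *serialized;	/* produced by fh_preserve() */
	int prepared;		/* set by fh_prepare() */
};

/* Heavy step: modelled on PRESERVE_FD, done while the VM still runs. */
int fh_preserve(struct fh_file *f)
{
	size_t len;

	if (f->serialized)
		return -1;	/* already preserved */

	len = strlen(f->contents) + 1;
	f->serialized = malloc(len);
	if (!f->serialized)
		return -1;
	memcpy(f->serialized, f->contents, len);
	return 0;
}

/* Hypothetical inverse of fh_preserve(): drops the serialized state. */
void fh_unpreserve(struct fh_file *f)
{
	free(f->serialized);
	f->serialized = NULL;
	f->prepared = 0;
}

/* Light step: only checks that the heavy work has already been done. */
int fh_prepare(struct fh_file *f)
{
	if (!f->serialized)
		return -1;
	f->prepared = 1;
	return 0;
}
```

In this model fh_prepare() does no allocation and no copying, so its cost does not grow with the amount of preserved state; all of that cost is paid earlier, in fh_preserve().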
For the LUO core, this implies minor API changes, such as renaming can_preserve() to a more active preserve() and adding a corresponding unpreserve() callback to be called during UNPRESERVE_FD.

3. FD Data Query API
--------------------
We identified the need for a kernel API to allow subsystems to query preserved FD data during the boot process, before userspace has initiated the restore.

The proposed API would allow a file handler to retrieve a list of all its preserved FDs, including their session names, tokens, and the private data payload.

Proposed Data Structure:

struct liveupdate_fd {
        char *session; /* session name */
        u64 token;     /* Preserved FD token */
        u64 data;      /* Private preserved data */
};

Proposed Function:

liveupdate_fd_data_query(struct liveupdate_file_handler *h,
                         struct liveupdate_fd *fds, long *count);

Now that you are adding the "File-Lifecycle-Bound Global State", I was wondering if this session data query mechanism is still necessary. It seems that any preserved state a file handler needs to restore during boot could be fetched using the Global data support instead. For example, I don't think session information will be needed to restore IOMMU domains during boot (iommu init), and even if some other file handler needs it, it can keep this information in the global data. I discussed this briefly with Pasha today, but wanted to raise it here as well.
I agree, the query API is ugly and indeed not needed with the FLB Global State. The biggest problem with the query API is that the caller must somehow know how to interpret the preserved file-handler data before the struct file is reconstructed. This is problematic; there should be only one place that knows how to store and interpret the data, not multiple.

It looks like the combination of an enforced ordering:

Preservation:    A->B->C->D
Un-preservation: D->C->B->A
Retrieval:       A->B->C->D

and the FLB Global State (where data is automatically created and destroyed when a particular file type participates in a live update) removes the need for this query mechanism. For example, the IOMMU driver/core can add its data only when an iommufd is preserved and add more data as more iommufds are added. The preserved data is also automatically removed once the live update is finished or canceled.

Pasha
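The ordering rules and the lifecycle of the global state described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct ord_session, struct ord_flb and the ord_*() helpers are hypothetical names, not the LUO API; the model tracks a single file type in a single session; and rejecting out-of-order requests with an error is an assumption about how "enforced" might be implemented.

```c
#include <stddef.h>
#include <stdint.h>

#define ORD_MAX_FILES 8

/* Hypothetical global state shared by all preserved files of one type. */
struct ord_flb {
	int live;		/* 1 while at least one file is preserved */
	size_t nr_files;	/* files currently contributing data */
};

/* Hypothetical session: tokens are kept in preservation order. */
struct ord_session {
	uint64_t tokens[ORD_MAX_FILES];
	size_t count;
	size_t next_retrieve;
	struct ord_flb flb;
};

/* Preservation order is simply the order of the calls (A->B->C->D). */
int ord_preserve(struct ord_session *s, uint64_t token)
{
	if (s->count == ORD_MAX_FILES)
		return -1;
	if (s->count == 0)
		s->flb.live = 1;	/* first file: global state is created */
	s->tokens[s->count++] = token;
	s->flb.nr_files = s->count;
	return 0;
}

/* Un-preservation is accepted only in reverse order (D->C->B->A). */
int ord_unpreserve(struct ord_session *s, uint64_t token)
{
	if (s->count == 0 || s->tokens[s->count - 1] != token)
		return -1;
	s->count--;
	s->flb.nr_files = s->count;
	if (s->count == 0)
		s->flb.live = 0;	/* last file gone: global state is destroyed */
	return 0;
}

/* Retrieval is accepted only in preservation order (A->B->C->D). */
int ord_retrieve(struct ord_session *s, uint64_t token)
{
	if (s->next_retrieve == s->count ||
	    s->tokens[s->next_retrieve] != token)
		return -1;
	s->next_retrieve++;
	return 0;
}
```

In this model no caller ever has to interpret another file's private data: the shared state exists exactly as long as at least one file of the type is preserved, which is the property that makes a separate query interface unnecessary.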