Re: [PATCH v4 00/30] Live Update Orchestrator

From: Pratyush Yadav <pratyush@kernel.org>
Date: 2025-10-27 11:37:55
Also in: linux-doc, linux-fsdevel, linux-mm, lkml

On Mon, Oct 20 2025, Jason Gunthorpe wrote:

> On Tue, Oct 14, 2025 at 03:29:59PM +0200, Pratyush Yadav wrote:
>>> 1) Use a vmalloc and store a list of the PFNs in the pool. Pool becomes
>>>    frozen, can't add/remove PFNs.
>>
>> Doesn't that circumvent LUO's state machine? The idea with the state
>> machine was to have clear points in time when the system goes into the
>> "limited capacity"/"frozen" state, which is the LIVEUPDATE_PREPARE
>> event.
>
> I wouldn't get too invested in the FSM, it is there but it doesn't
> mean every luo client has to be focused on it.

Having each subsystem have its own state machine sounds like a bad idea
to me. It can get tricky to manage both for us and our users.

>> With what you propose, the first FD being preserved implicitly
>> triggers the prepare event. Same thing for unprepare/cancel operations.
>
> Yes, this is easy to write and simple to manage.
>
>> I am wondering if it is better to do it the other way round: prepare all
>> files first, and then prepare the hugetlb subsystem at
>> LIVEUPDATE_PREPARE event. At that point it already knows which pages to
>> mark preserved so the serialization can be done in one go.
>
> I think this would be slower and more complex?
>
>>> 2) Require the users of hugetlb memory, like memfd, to
>>>    preserve/restore the folios they are using (using their hugetlb order)
>>> 3) Just before kexec run over the PFN list and mark a bit if the folio
>>>    was preserved by KHO or not. Make sure everything gets KHO
>>>    preserved.
>>
>> "just before kexec" would need a callback from LUO. I suppose a
>> subsystem is the place for that callback. I wrote my email under the
>> (wrong) impression that we were replacing subsystems.
>
> The file descriptors path should have luo client ops that have all
> the required callbacks. This is probably an existing op.
>
>> That makes me wonder: how is the subsystem-level callback supposed to
>> access the global data? I suppose it can use the liveupdate_file_handler
>> directly, but it is kind of strange since technically the subsystem and
>> file handler are two different entities.
>
> If we need such things we would need a way to link these together, but
> I'm wondering if we really don't..
>
>> Also as Pasha mentioned, 1G pages for guest_memfd will use hugetlb, and
>> I'm not sure how that would map with this shared global data. memfd and
>> guest_memfd will likely have different liveupdate_file_handler but would
>> share data from the same subsystem. Maybe that's a problem to solve for
>> later...
>
> On preserve memfd should call into hugetlb to activate it as a hugetlb
> page provider and preserve it too.

From what I understand, the main problem you want to solve is that the
life cycle of the global data should be tied to the file descriptors.
And since everything should have a FD anyway, can't we directly tie the
subsystems to file handlers? The subsystem gets a "preserve" callback
when the first FD that uses it gets preserved. It gets an "unpreserve"
callback when the last FD goes away. And the rest of the state machine
like prepare, cancel, etc. stay the same.

I think this gives us a clean abstraction that has LUO-managed lifetime.

It also works with the guest_memfd and memfd case since both can have
hugetlb as their underlying subsystem. For example,

static const struct liveupdate_file_ops memfd_luo_file_ops = {
	.preserve = memfd_luo_preserve,
	.unpreserve = memfd_luo_unpreserve,
	[...]
	.subsystem = &luo_hugetlb_subsys,
};

And then luo_{un,}preserve_file() can keep a refcount for the subsystem
and preserve or unpreserve the subsystem as needed. LUO can manage the
locking for these callbacks too.
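
A minimal userspace sketch of that refcounting, under stated assumptions:
every name ending in _sketch or starting with demo_ is hypothetical and is
not the LUO API, the callbacks only count invocations, and locking is left
out entirely. It only illustrates "preserve the subsystem on the first FD,
unpreserve it on the last".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a LUO subsystem; not the real kernel struct. */
struct luo_subsys_sketch {
	int (*preserve)(struct luo_subsys_sketch *s);
	void (*unpreserve)(struct luo_subsys_sketch *s);
	unsigned int refs;	/* preserved files currently using the subsystem */
};

/* Hypothetical stand-in for file ops carrying a .subsystem link. */
struct luo_file_ops_sketch {
	int (*preserve)(void *file);
	void (*unpreserve)(void *file);
	struct luo_subsys_sketch *subsystem;	/* may be NULL */
};

/* Demo callbacks: they only record that they were called. */
int demo_subsys_preserve_calls;
int demo_subsys_unpreserve_calls;

int demo_subsys_preserve(struct luo_subsys_sketch *s)
{
	(void)s;
	demo_subsys_preserve_calls++;
	return 0;
}

void demo_subsys_unpreserve(struct luo_subsys_sketch *s)
{
	(void)s;
	demo_subsys_unpreserve_calls++;
}

int demo_file_preserve(void *file) { *(int *)file = 1; return 0; }
void demo_file_unpreserve(void *file) { *(int *)file = 0; }

/* Two different file handlers sharing one subsystem. */
struct luo_subsys_sketch demo_hugetlb_subsys = {
	.preserve = demo_subsys_preserve,
	.unpreserve = demo_subsys_unpreserve,
};
const struct luo_file_ops_sketch demo_memfd_ops = {
	.preserve = demo_file_preserve,
	.unpreserve = demo_file_unpreserve,
	.subsystem = &demo_hugetlb_subsys,
};
const struct luo_file_ops_sketch demo_guest_memfd_ops = {
	.preserve = demo_file_preserve,
	.unpreserve = demo_file_unpreserve,
	.subsystem = &demo_hugetlb_subsys,
};

int luo_preserve_file_sketch(const struct luo_file_ops_sketch *ops, void *file)
{
	struct luo_subsys_sketch *s = ops->subsystem;
	int err;

	/* First user: preserve the subsystem before the file itself. */
	if (s && s->refs == 0) {
		err = s->preserve(s);
		if (err)
			return err;
	}

	err = ops->preserve(file);
	if (err) {
		/* Undo the subsystem preserve if nobody else holds it. */
		if (s && s->refs == 0)
			s->unpreserve(s);
		return err;
	}

	if (s)
		s->refs++;
	return 0;
}

void luo_unpreserve_file_sketch(const struct luo_file_ops_sketch *ops, void *file)
{
	struct luo_subsys_sketch *s = ops->subsystem;

	ops->unpreserve(file);
	if (!s)
		return;

	assert(s->refs > 0);
	/* Last user gone: the subsystem can be unpreserved too. */
	if (--s->refs == 0)
		s->unpreserve(s);
}
```

In this sketch, preserving one file through demo_memfd_ops and another
through demo_guest_memfd_ops calls the subsystem's preserve once, and the
subsystem's unpreserve runs only when the second of the two is unpreserved.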

-- 
Regards,
Pratyush Yadav