Thread (35 messages) 35 messages, 5 authors, 2024-06-04

Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging

From: James Houghton <hidden>
Date: 2024-06-03 23:17:18
Also in: kvm, kvm-riscv, kvmarm, linux-arm-kernel, linux-doc, linux-kselftest, linux-mips, linux-mm, linux-riscv, lkml, loongarch

On Mon, Jun 3, 2024 at 4:03 PM Sean Christopherson [off-list ref] wrote:
On Mon, Jun 03, 2024, James Houghton wrote:
quoted
On Thu, May 30, 2024 at 11:06 PM Yu Zhao [off-list ref] wrote:
quoted
What I don't think is acceptable is simplifying those optimizations
out without documenting your justifications (I would even call it a
design change, rather than simplification, from v3 to v4).
I'll put back something similar to what you had before (like a
test_clear_young() with a "fast" parameter instead of "bitmap"). I
like the idea of having a new mmu notifier, like
fast_test_clear_young(), while leaving test_young() and clear_young()
unchanged (where "fast" means "prioritize speed over accuracy").
Those two statements are contradicting each other, aren't they?
I guess it depends on how you define "similar". :)
Anyways, I vote
for a "fast only" variant, e.g. test_clear_young_fast_only() or so.  gup() has
already established that terminology in mm/, so hopefully it would be familiar
to readers.  We could pass a param, but then the MGLRU code would likely end up
doing a bunch of useless indirect calls into secondary MMUs, whereas a dedicated
hook allows implementations to nullify the pointer if the API isn't supported
for whatever reason.

And pulling in Oliver's comments about locking, I think it's important that the
mmu_notifier API express it's requirement that the operation be "fast", not that
it be lockless.  E.g. if a secondary MMU can guarantee that a lock will be
contented only in rare, slow cases, then taking a lock is a-ok.  Or a secondary
MMU could do try-lock and bail if the lock is contended.

That way KVM can honor the intent of the API with an implementation that works
best for KVM _and_ for MGRLU.  I'm sure there will be future adjustments and fixes,
but that's just more motivation for using something like "fast only" instead of
"lockless".
Yes, thanks, this is exactly what I meant. I really should have "only"
in the name to signify that it is a requirement that it be fast.
Thanks for wording it so clearly.
quoted
quoted
quoted
I made this logic change as part of removing batching.

I'd really appreciate guidance on what the correct thing to do is.

In my mind, what would work great is: by default, do aging exactly
when KVM can do it locklessly, and then have a Kconfig to always have
MGLRU to do aging with KVM if a user really cares about proactive
reclaim (when the feature bit is set). The selftest can check the
Kconfig + feature bit to know for sure if aging will be done.
I still don't see how that Kconfig helps. Or why the new static branch
isn't enough?
Without a special Kconfig, the feature bit just tells us that aging
with KVM is possible, not that it will necessarily be done. For the
self-test, it'd be good to know exactly when aging is being done or
not, so having a Kconfig like LRU_GEN_ALWAYS_WALK_SECONDARY_MMU would
help make the self-test set the right expectations for aging.

The Kconfig would also allow a user to know that, no matter what,
we're going to get correct age data for VMs, even if, say, we're using
the shadow MMU.
Heh, unless KVM flushes, you won't get "correct" age data.
quoted
This is somewhat important for me/Google Cloud. Is that reasonable? Maybe
there's a better solution.
Hmm, no?  There's no reason to use a Kconfig, e.g. if we _really_ want to prioritize
accuracy over speed, then a KVM (x86?) module param to have KVM walk nested TDP
page tables would give us what we want.

But before we do that, I think we need to perform due dilegence (or provide data)
showing that having KVM take mmu_lock for write in the "fast only" API provides
better total behavior.  I.e. that the additional accuracy is indeed worth the cost.
That sounds good to me. I'll drop the Kconfig. I'm not really sure
what to do about the self-test, but that's not really all that
important.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help