Thread: 32 messages, 5 authors, 2020-03-09

Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM

From: Tyler Sanderson via Virtualization <hidden>
Date: 2020-02-14 20:49:03
Also in: linux-mm

Regarding Wei's patch that modifies the shrinker implementation, versus
this patch which reverts to the OOM notifier:
I am in favor of both patches. But I do want to make sure a fix gets
backported to 4.19, where the performance regression was first introduced.
My concern with reverting to the OOM notifier is, as mst@ put it (in the
other thread):
"when linux hits OOM all kind of error paths are being hit, latent bugs
start triggering, latency goes up drastically."
The guest could be in a lot of pain before the OOM notifier is invoked, and
it seems like the shrinker API might allow finer-grained control over
when we deflate.

On the other hand, I'm not totally convinced that Wei's patch is an
expected use of the shrinker/page-cache APIs, and maybe it is fragile.
Needs more testing and scrutiny.

It seems to me like the shrinker API is the right API in the long run,
perhaps with some fixes and modifications. But maybe reverting to the OOM
notifier is the best patch to backport?

On Fri, Feb 14, 2020 at 6:19 AM David Hildenbrand [off-list ref] wrote:
There was a report that this results in undesired side effects when
inflating the balloon to shrink the page cache. [1]
     "When inflating the balloon against page cache (i.e. no free memory
      remains) vmscan.c will both shrink page cache, but also invoke the
      shrinkers -- including the balloon's shrinker. So the balloon
      driver allocates memory which requires reclaim, vmscan gets this
      memory by shrinking the balloon, and then the driver adds the
      memory back to the balloon. Basically a busy no-op."
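
The loop described in that report can be sketched with a toy page-count model. This is only an illustration, not the virtio_balloon driver or vmscan.c: every name below (`toy_guest`, `toy_policy`, `toy_reclaim`, `toy_inflate`) is hypothetical, and it assumes each reclaim pass frees one page cache page and, when the balloon acts as a shrinker, one balloon page as well. The OOM handler path itself is not modeled, since in this toy an inflation attempt never triggers it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page-count model; hypothetical names, not kernel code. */
struct toy_guest {
    size_t free_pages;    /* immediately allocatable pages */
    size_t cache_pages;   /* reclaimable page cache */
    size_t balloon_pages; /* pages currently held by the balloon */
    size_t deflated;      /* pages taken back out of the balloon so far */
};

enum toy_policy {
    TOY_SHRINKER,     /* balloon is shrunk during ordinary reclaim */
    TOY_OOM_NOTIFIER  /* balloon is never touched by ordinary reclaim */
};

/* One reclaim pass (assumed granularity: one page per source).
 * Returns the number of pages made free. */
static size_t toy_reclaim(struct toy_guest *g, enum toy_policy policy)
{
    size_t freed = 0;

    assert(g != NULL);
    if (g->cache_pages > 0) {            /* shrink the page cache */
        g->cache_pages--;
        freed++;
    }
    if (policy == TOY_SHRINKER && g->balloon_pages > 0) {
        g->balloon_pages--;              /* the balloon shrinker runs too */
        g->deflated++;
        freed++;
    }
    g->free_pages += freed;
    return freed;
}

/* Try to inflate by 'pages'. Returns the net growth of the balloon. */
static size_t toy_inflate(struct toy_guest *g, size_t pages,
                          enum toy_policy policy)
{
    size_t before;

    assert(g != NULL);
    before = g->balloon_pages;
    for (size_t i = 0; i < pages; i++) {
        /* No free memory: the allocation has to go through reclaim. */
        if (g->free_pages == 0 && toy_reclaim(g, policy) == 0)
            break;                       /* nothing reclaimable, give up */
        g->free_pages--;
        g->balloon_pages++;              /* page goes (back) into the balloon */
    }
    return g->balloon_pages - before;
}
```

Under these assumptions, starting from `free_pages = 0`, `cache_pages = 100`, `balloon_pages = 10`, a request to inflate by 20 pages grows the balloon by only 10 with `TOY_SHRINKER` (10 pages bounce out of the balloon and back in), and by the full 20 with `TOY_OOM_NOTIFIER`. With an empty page cache, the `TOY_SHRINKER` case makes no net progress at all, which is the busy no-op.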

The name "deflate on OOM" makes it pretty clear when deflation should
happen - after other approaches to reclaim memory failed, not while
reclaiming. This minimizes the footprint of a guest - memory
will only be taken out of the balloon when really needed.

Especially, a drop_slab() will result in the whole balloon getting
deflated - undesired.

Could you explain why some more? drop_caches shouldn't be really used in
any production workloads and if somebody really wants all the cache to
be dropped then why is balloon any different?

Deflation should happen when the guest is out of memory, not when
somebody thinks it's time to reclaim some memory. That's what the
feature promised from the beginning: Only give the guest more memory in
case it *really* needs more memory.

Deflate on oom, not deflate on reclaim/memory pressure. (that's what the
report was all about)

A priority for shrinkers might be a step in the right direction.

--
Thanks,

David / dhildenb