Re: [PATCH v16 3/3] mm: Reduce latency of OOM killer task selection with 2-pass algorithm

From: Michal Hocko <mhocko@suse.com>
Date: 2026-01-26 17:47:09
Also in: linux-mm, lkml

On Mon 26-01-26 11:39:33, Mathieu Desnoyers wrote:

On 2026-01-16 16:55, Michal Hocko wrote:

On Wed 14-01-26 14:36:44, Mathieu Desnoyers wrote:

On 2026-01-14 12:06, Michal Hocko wrote:

On Wed 14-01-26 09:59:15, Mathieu Desnoyers wrote:

[...]
Thanks to those clarifications

My overall impression is that the implementation is really involved and
at this moment I do not really see a big benefit of all the complexity.

Note that we can get the proc ABI RSS accuracy improvements with the
previous 2 patches without this 2-pass algo. Do you see more value in
the RSS accuracy improvements than in the oom killer latency reduction ?

Yes, TBH I do not see oom latency as a big problem. As already mentioned,
this is a slow path and we are not talking about a huge latency anyway.
proc numbers are much more sensitive to latency as they are regularly
read by user space tools and accuracy for those matters as well (being
off by 100s MB or GBs is simply making those numbers completely bogus).

It makes sense.

It would help to explicitly mention what the overall imprecision of the
oom victim selection is with the new data structure (maybe this is
good enough[*]). What if we go with exact precision with the new data
structure, compared to the original pcp counters?

Do you mean comparing approximate sums with the new data structure
(which has an inaccuracy bounded by O(nr_cpus*log(nr_cpus))) against
the old data structure, which had an inaccuracy of O(nr_cpus^2) ?
So if the inaccuracy provided by the new data structure is good enough
for OOM task selection, we could go from precise sum back to an
approximation and just use that with the new data structure.

Exactly!

OK, so based on your feedback, I plan to remove this 2-pass algo
from the series, and simply keep using the precise sum for the OOM
killer. If people complain about its latency, then we can eventually
use the approximation provided by the hierarchical counters. But let's
wait until someone asks for it rather than add this complexity when
there is no need.

The hierarchical counters are still useful as they increase the
accuracy of approximations exported through /proc.

How does that sound ?

Works for me.

Thanks!
-- 
Michal Hocko
SUSE Labs
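
The trade-off discussed in this message, a cheap approximate read versus a precise sum over all CPUs, can be illustrated with a minimal single-threaded sketch of a flat split counter. This is not the hierarchical counter from the series and not kernel code: the names split_counter, NR_CPUS and BATCH are hypothetical, and the fixed BATCH value is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 8   /* hypothetical CPU count for this illustration */
#define BATCH   32  /* hypothetical per-CPU flush threshold */

/* Flat split counter: one shared total plus one pending delta per CPU. */
struct split_counter {
    long global;          /* shared total, updated only on flush */
    long local[NR_CPUS];  /* per-CPU deltas not yet folded into global */
};

/* Add delta on one CPU; fold into global once |local| reaches BATCH. */
static void split_counter_add(struct split_counter *c, int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    c->local[cpu] += delta;
    if (labs(c->local[cpu]) >= BATCH) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Cheap read: O(1), but ignores every pending per-CPU delta. */
static long split_counter_read_approx(const struct split_counter *c)
{
    return c->global;
}

/* Precise read: O(NR_CPUS), walks all per-CPU deltas. */
static long split_counter_sum_precise(const struct split_counter *c)
{
    long sum = c->global;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}

/* Worst-case gap between the two reads: each CPU hides at most BATCH - 1. */
static long split_counter_max_error(void)
{
    return (long)NR_CPUS * (BATCH - 1);
}
```

In this sketch the approximate read can be off by up to NR_CPUS * (BATCH - 1). If the batch size is itself scaled with the number of CPUs, that gap grows quadratically, which is the kind of O(nr_cpus^2) inaccuracy mentioned above for the old data structure.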
