Re: [PATCH v16 3/3] mm: Reduce latency of OOM killer task selection with 2-pass algorithm
From: Michal Hocko <mhocko@suse.com>
Date: 2026-01-26 17:47:09
Also in:
linux-mm, lkml
On Mon 26-01-26 11:39:33, Mathieu Desnoyers wrote:
> On 2026-01-16 16:55, Michal Hocko wrote:
> > On Wed 14-01-26 14:36:44, Mathieu Desnoyers wrote:
> > > On 2026-01-14 12:06, Michal Hocko wrote:
> > > > On Wed 14-01-26 09:59:15, Mathieu Desnoyers wrote:
> > > > [...]
> > > > Thanks to those clarifications
> > > >
> > > > My overall impression is that the implementation is really
> > > > involved and at this moment I do not really see a big benefit
> > > > of all the complexity.
> > >
> > > Note that we can get the proc ABI RSS accuracy improvements with
> > > the previous 2 patches without this 2-pass algo. Do you see more
> > > value in the RSS accuracy improvements than in the oom killer
> > > latency reduction?
> >
> > Yes, TBH I do not see oom latency as a big problem. As already
> > mentioned this is a slow path and we are not talking about a huge
> > latency anyway. proc numbers are much more sensitive to latency as
> > they are regularly read by user space tools and accuracy for those
> > matters as well (being off by 100s MB or GBs is simply making those
> > numbers completely bogus).
>
> It makes sense.
>
> > > > It would help to explicitly mention what is the overall
> > > > imprecision of the oom victim selection with the new data
> > > > structure (maybe this is good enough[*]). What if we go with
> > > > exact precision with the new data structure compared to the
> > > > original pcp counters.
> > >
> > > Do you mean comparing using approximate sums with the new data
> > > structure (which has a bounded accuracy of
> > > O(nr_cpus*log(nr_cpus))) compared to the old data structure which
> > > had an inaccuracy of O(nr_cpus^2)?
> > >
> > > So if the inaccuracy provided by the new data structure is good
> > > enough for OOM task selection, we could go from precise sum back
> > > to an approximation and just use that with the new data structure.
> >
> > Exactly!
>
> OK, so based on your feedback, I plan to remove this 2-pass algo from
> the series, and simply keep using the precise sum for the OOM killer.
>
> If people complain about its latency, then we can eventually use the
> approximation provided by the hierarchical counters. But let's wait
> until someone asks for it rather than add this complexity when there
> is no need.
>
> The hierarchical counters are still useful as they increase the
> accuracy of approximations exported through /proc.
>
> How does that sound?
Works for me. Thanks!
--
Michal Hocko
SUSE Labs
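
For readers following the accuracy argument above, here is a minimal userspace sketch of the trade-off between a cheap approximate read and a precise sum over per-CPU deltas. It is not the kernel's percpu counter code and not the hierarchical counters from this series; every name in it (model_counter, model_add, NR_CPUS, BATCH and so on) is hypothetical. It assumes a flat batched counter whose batch size scales linearly with the CPU count, which is one way an error bound on the order of nr_cpus^2 can arise.

```c
#include <stdlib.h>

#define NR_CPUS 8               /* hypothetical CPU count for the model */
#define BATCH   (2 * NR_CPUS)   /* assumption: batch grows with the CPU count */

struct model_counter {
	long global;            /* value seen by cheap approximate reads */
	long local[NR_CPUS];    /* per-CPU deltas not yet folded into global */
};

/* Add delta on one CPU; fold into global once the local delta reaches BATCH. */
static void model_add(struct model_counter *c, int cpu, long delta)
{
	c->local[cpu] += delta;
	if (labs(c->local[cpu]) >= BATCH) {
		c->global += c->local[cpu];
		c->local[cpu] = 0;
	}
}

/* Approximate read: O(1), ignores pending per-CPU deltas. */
static long model_read_approx(const struct model_counter *c)
{
	return c->global;
}

/* Precise sum: O(NR_CPUS), walks every per-CPU delta. */
static long model_sum_precise(const struct model_counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}

/* Worst-case gap between the two reads in this model: NR_CPUS * (BATCH - 1). */
static long model_max_error(void)
{
	return (long)NR_CPUS * (BATCH - 1);
}

/* Drive every CPU to just below its flush threshold. */
static void model_fill_worst_case(struct model_counter *c)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (int i = 0; i < BATCH - 1; i++)
			model_add(c, cpu, 1);
}
```

Calling model_fill_worst_case() on a zeroed counter leaves the approximate read at 0 while the precise sum is NR_CPUS * (BATCH - 1), which is the situation where an approximate value could mislead a consumer. The precise sum avoids that at the cost of touching every CPU's delta, which is the latency being weighed in the discussion.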