Thread (8 messages) 8 messages, 4 authors, 2021-08-03

Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks

From: Michal Hocko <mhocko@suse.com>
Date: 2021-08-02 06:34:33
Also in: lkml

On Fri 30-07-21 17:20:02, Aaron Tomlin wrote:
Changes since v2:
 - Use single character (e.g. 'R' for MMF_OOM_SKIP) as suggested
   by Tetsuo Handa [off-list ref]
 - Add new header to oom_dump_tasks documentation
 - Provide further justification


The output generated by dump_tasks() can be helpful to determine why
there was an OOM condition and which rogue task potentially caused it.
Please note that this is only provided when sysctl oom_dump_tasks is
enabled.

At the present time, when showing potential OOM victims, we do not
exclude any task that are not OOM eligible e.g.
Well, this is not precise. We do exclude ineligible. Consider tasks that
are outside of the OOM domain for example. You are right that we are not
excluding all of them though.
those that have
MMF_OOM_SKIP set; it is possible that the last OOM killable victim was
already OOM killed, yet the OOM reaper failed to reclaim memory and set
MMF_OOM_SKIP. This can be confusing (or perhaps even be misleading) to the
viewer. Now, we already unconditionally display a task's oom_score_adj_min
value that can be set to OOM_SCORE_ADJ_MIN which is indicative of an
"unkillable" task.

This patch provides a clear indication with regard to the OOM ineligibility
(and why) of each displayed task with the addition of a new column namely
"oom_skipped". An example is provided below:

    [ 5084.524970] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name
    [ 5084.526397] [660417]     0 660417    35869      683   167936        0         -1000 M conmon
    [ 5084.526400] [660452]     0 660452   175834      472    86016        0          -998  pod
    [ 5084.527460] [752415]     0 752415    35869      650   172032        0         -1000 M conmon
    [ 5084.527462] [752575] 1001050000 752575   184205    11158   700416        0           999  npm
    [ 5084.527467] [753606] 1001050000 753606   183380    46843  2134016        0           999  node
    [ 5084.527581] Memory cgroup out of memory: Killed process 753606 (node) total-vm:733520kB, anon-rss:161228kB, file-rss:26144kB, shmem-rss:0kB, UID:1001050000

So, a single character 'M' is for OOM_SCORE_ADJ_MIN, 'R' MMF_OOM_SKIP and
'V' for in_vfork().
I have to say I dislike this for two reasons. First and foremost it
makes parsing unnecessarily more complex. Now you have a potentially
an empty column to special case. Secondly MMF_OOM_SKIP is an internal
state that shouldn't really leak to userspace IMO. in_vfork is a racy
check and M is already expressed via oom_score_adj so it duplicates an
existing information.

If you really want/need to make any change here then I would propose to
either add E(eligible)/I(ligible) column without any specifics or
consistently skip over all tasks which are not eligible.
2.31.1
-- 
Michal Hocko
SUSE Labs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help