Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

From: Michal Hocko <mhocko@kernel.org>
Date: 2016-09-12 12:16:39
Also in: lkml

Hi,
I am sorry I didn't follow up on the previous submission. I find this
_really_ helpful. It is great that you could build on top of existing
tracepoints but one thing is not entirely clear to me. Without a begin
marker in __alloc_pages_nodemask we cannot really tell how long the
whole allocation took, which would be extremely useful. Or do you use
any graph tracer tricks to deduce that? There is a note in your
changelog but I cannot seem to find that in the changelog. And FWIW I
would be open to adding a tracepoint like that. It would make our life
so much easier...

On Sun 11-09-16 18:24:12, Janani Ravichandran wrote:
[...]

allocation_postprocess.py is a script which reads from trace_pipe. It
does the following to filter out info from tracepoints that may not
be important:

1. Displays mm_vmscan_direct_reclaim_begin and
mm_vmscan_direct_reclaim_end only when try_to_free_pages has
exceeded the threshold.
2. Displays mm_compaction_begin and mm_compaction_end only when
compact_zone has exceeded the threshold.
3. Displays mm_compaction_try_to_compat_pages only when
try_to_compact_pages has exceeded the threshold.
4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
the time elapsed between them exceeds the threshold.
5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
has exceeded the threshold.

When CTRL+C is pressed, the script shows the times taken by the
shrinkers. However, currently it is not possible to differentiate among
the
superblock shrinkers.

Sample output:
^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency =
0.278000 ms
ext4_es_scan : total time = 0.970000 ms, max latency = 0.129000 ms
scan_shadow_nodes : total time = 1.150000 ms, max latency = 0.175000 ms
super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000
ms

Would it be possible to group those per the context? I mean a single
allocation/per-process drop down values rather than mixing all those
values together? For example if I see that a process is talling due to
direct reclaim I would love to see what is the worst case allocation
stall and what is the most probable source of that stall. Mixing kswapd
traces would be misleading here.

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help