Re: [PATCH bpf-next v2 8/8] bpf: add a selftest for cgroup hierarchical stats collection

From: Yonghong Song <hidden>
Date: 2022-06-29 06:17:47
Also in: bpf, cgroups, lkml


On 6/28/22 12:14 AM, Yosry Ahmed wrote:

On Mon, Jun 27, 2022 at 11:47 PM Yosry Ahmed [off-list ref] wrote:

On Mon, Jun 27, 2022 at 11:14 PM Yonghong Song [off-list ref] wrote:

On 6/10/22 12:44 PM, Yosry Ahmed wrote:

Add a selftest that tests the whole workflow for collecting,
aggregating (flushing), and displaying cgroup hierarchical stats.

TL;DR:
- Whenever reclaim happens, vmscan_start and vmscan_end update
    per-cgroup percpu readings, and tell rstat which (cgroup, cpu) pairs
    have updates.
- When userspace tries to read the stats, vmscan_dump calls rstat to flush
    the stats, and outputs the stats in text format to userspace (similar
    to cgroupfs stats).
- rstat calls vmscan_flush once for every (cgroup, cpu) pair that has
    updates, vmscan_flush aggregates cpu readings and propagates updates
    to parents.

Detailed explanation:
- The test loads tracing bpf programs, vmscan_start and vmscan_end, to
    measure the latency of cgroup reclaim. Per-cgroup readings are stored in
    percpu maps for efficiency. When a cgroup reading is updated on a cpu,
    cgroup_rstat_updated(cgroup, cpu) is called to add the cgroup to the
    rstat updated tree on that cpu.

- A cgroup_iter program, vmscan_dump, is loaded and pinned to a file, for
    each cgroup. Reading this file invokes the program, which calls
    cgroup_rstat_flush(cgroup) to ask rstat to propagate the updates for all
    cpus and cgroups that have updates in this cgroup's subtree. Afterwards,
    the stats are exposed to the user. vmscan_dump returns 1 to terminate
    iteration early, so that we only expose stats for one cgroup per read.

- An ftrace program, vmscan_flush, is also loaded and attached to
    bpf_rstat_flush. When rstat flushing is ongoing, vmscan_flush is invoked
    once for each (cgroup, cpu) pair that has updates. cgroups are popped
    from the rstat tree in a bottom-up fashion, so calls will always be
    made for cgroups that have updates before their parents. The program
    aggregates percpu readings to a total per-cgroup reading, and also
    propagates them to the parent cgroup. After rstat flushing is over, all
    cgroups will have correct updated hierarchical readings (including all
    cpus and all their descendants).
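
The update and flush flow described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the selftest's BPF code: struct cg_model, record_delay(), flush_one() and flush_all() are hypothetical names, the percpu maps are replaced by plain arrays, and the rstat updated tree is modelled by a per-CPU flag on each cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical userspace stand-in for a cgroup; not the kernel's struct cgroup. */
struct cg_model {
	struct cg_model *parent;   /* NULL for the root */
	uint64_t percpu[NR_CPUS];  /* unflushed per-CPU readings */
	bool updated[NR_CPUS];     /* models membership in the per-CPU updated tree */
	uint64_t pending;          /* deltas pushed up by children, not yet folded in */
	uint64_t total;            /* flushed reading for this cgroup and its subtree */
};

/* Models a tracing program recording one reclaim delay sample on a CPU. */
static void record_delay(struct cg_model *cg, int cpu, uint64_t delay_ns)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cg->percpu[cpu] += delay_ns;
	/*
	 * Stand-in for cgroup_rstat_updated(cgroup, cpu): mark the cgroup and
	 * its ancestors as updated on this CPU, stopping at the first one that
	 * is already marked (its ancestors are then marked as well).
	 */
	for (struct cg_model *pos = cg; pos && !pos->updated[cpu]; pos = pos->parent)
		pos->updated[cpu] = true;
}

/* Models one flush callback for a single (cgroup, cpu) pair. */
static void flush_one(struct cg_model *cg, int cpu)
{
	uint64_t delta = cg->percpu[cpu] + cg->pending;

	cg->percpu[cpu] = 0;
	cg->pending = 0;
	cg->updated[cpu] = false;
	cg->total += delta;
	if (cg->parent)
		cg->parent->pending += delta; /* propagate to the parent */
}

/*
 * Flush every (cgroup, cpu) pair that has updates. cgs[] must list parents
 * before their children, so walking it backwards visits children first.
 */
static void flush_all(struct cg_model **cgs, size_t n)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (size_t i = n; i-- > 0;)
			if (cgs[i]->updated[cpu])
				flush_one(cgs[i], cpu);
}
```

In this model, recording samples on two children of the same parent and then calling flush_all() leaves the parent's total equal to the sum of its children's totals, and every ancestor up to the root holds the same sum. The model always flushes the whole hierarchy, whereas the description above flushes only the subtree of the cgroup being read.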

Signed-off-by: Yosry Ahmed <redacted>

There is a selftest failure with this test:

get_cgroup_vmscan_delay:PASS:output format 0 nsec
get_cgroup_vmscan_delay:PASS:cgroup_id 0 nsec
get_cgroup_vmscan_delay:PASS:vmscan_reading 0 nsec
get_cgroup_vmscan_delay:PASS:read cgroup_iter 0 nsec
get_cgroup_vmscan_delay:PASS:output format 0 nsec
get_cgroup_vmscan_delay:PASS:cgroup_id 0 nsec
get_cgroup_vmscan_delay:FAIL:vmscan_reading unexpected vmscan_reading: actual 0 <= expected 0
check_vmscan_stats:FAIL:child1_vmscan unexpected child1_vmscan: actual 781874 != expected 382092
check_vmscan_stats:FAIL:child2_vmscan unexpected child2_vmscan: actual -1 != expected -2
check_vmscan_stats:FAIL:test_vmscan unexpected test_vmscan: actual 781874 != expected 781873
check_vmscan_stats:FAIL:root_vmscan unexpected root_vmscan: actual 0 < expected 781874
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter root pin 0 nsec
cleanup_bpffs:PASS:rmdir /sys/fs/bpf/vmscan/ 0 nsec
#33      cgroup_hierarchical_stats:FAIL

The test is passing on my setup. I am trying to figure out if there is
something outside the setup done by the test that can cause the test
to fail.

I can't reproduce the failure on my machine. It seems like for some
reason reclaim is not invoked in one of the test cgroups which results
in the expected stats not being there. I have a few suspicions as to
what might cause this but I am not sure.

If you have the capacity, do you mind re-running the test with the
attached diff1.patch? (And maybe diff2.patch if that fails; it will
cause OOMs in the test cgroup, so you might see some "process killed"
warnings.)

The patch doesn't help. The test still fails.

get_cgroup_vmscan_delay:PASS:cgroup_id 0 nsec
get_cgroup_vmscan_delay:FAIL:vmscan_reading unexpected vmscan_reading: actual 0 <= expected 0
check_vmscan_stats:FAIL:child1_vmscan unexpected child1_vmscan: actual 676612 != expected 339142
check_vmscan_stats:FAIL:child2_vmscan unexpected child2_vmscan: actual -1 != expected -2
check_vmscan_stats:FAIL:test_vmscan unexpected test_vmscan: actual 676612 != expected 676611
check_vmscan_stats:FAIL:root_vmscan unexpected root_vmscan: actual 0 < expected 676612
destroy_progs:PASS:remove cgroup_iter pin 0 nsec
destroy_progs:PASS:remove cgroup_iter pin 0 nsec

Thanks!

An existing test also failed.

[...]
