Thread: 3 messages, 2 authors, 2021-02-18

Re: Issue with cyclictest, RT_GROUP_SCHED, isolcpus and NOHZ_FULL

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: 2021-02-18 15:59:56

On 2020-12-30 14:09:19 [+0100], Jonathan Schwender wrote:
Hi everyone,

I've been trying to test the real-time performance possible with
containers, by running cyclictest in a container on an RT-Kernel.
The issue I've been having does not require containers or an
RT kernel though.

Issue: cyclictest freezes after running for a few seconds
to minutes. After that only the loadavg section is updated,
while the count line no longer changes.
From that point on cyclictest can't be killed
other than by restarting the machine, and
even then it takes a few minutes until the kernel kills
cyclictest.

This behaviour only occurs when the following conditions are
met:

- RT_GROUP_SCHED is used
- cyclictest is bound to an isolated cpu core with
  nohz_full=<core>, and isolcpus=nohz,domain,<core>

So if you remove RT_GROUP_SCHED and use cyclictest on the nohz_full
cores then everything is fine?

I've tested this on a machine with Fedora 33 and vanilla
stable 5.10.3 kernel with RT_GROUP_SCHED.
The same behaviour also exists on 5.10.1-rt20 with
PREEMPT_RT and RT_GROUP_SCHED configured.

After booting I configure the rt_runtime_us like this:
`echo "700000" > /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us`
`echo "100000" > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us`
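As a sanity check on the two values above, the budgets of sibling groups can be added up and compared against the global RT limit. The sketch below is only an illustration: the helper name `rt_budget_ok` is hypothetical, and it assumes that both slices use the same period and that the global limit is the usual default of 950000 us per 1000000 us period (`/proc/sys/kernel/sched_rt_runtime_us`), which may differ on a given system.

```shell
# Hypothetical helper: print the summed RT runtime (in us) of sibling
# groups and succeed only if the sum fits within the given limit.
# Usage: rt_budget_ok <limit_us> <runtime_us>...
rt_budget_ok() {
    limit=$1
    shift
    total=0
    for runtime in "$@"; do
        total=$((total + runtime))
    done
    echo "$total"
    [ "$total" -le "$limit" ]
}

# Assumed default limit of 950000 us; the two slices configured above:
# rt_budget_ok 950000 700000 100000
```

With the values from this report the sum is 800000 us, so under that assumption the configuration itself stays within the global limit.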

Then I start cyclictest via:
`taskset -c 14 cgexec -g cpu,cpuacct:user.slice cyclictest --mlockall \
  --priority=96 --interval=200 --affinity=14 --duration=15m`

These are the cmdline options I tried out to narrow the problem down:
working: `isolcpus=14 irqaffinity=0-3 maxcpus=15
systemd.unified_cgroup_hierarchy=0`
working: `isolcpus=nohz,14 nohz_full=14 irqaffinity=0-3 maxcpus=15
systemd.unified_cgroup_hierarchy=0`
working: `isolcpus=nohz,domain,14 irqaffinity=0-3 maxcpus=15
systemd.unified_cgroup_hierarchy=0`
broken:  `isolcpus=nohz,domain,14 nohz_full=14 irqaffinity=0-3 maxcpus=15
systemd.unified_cgroup_hierarchy=0`
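
The four observations above can be written down as a small check against a kernel command line. This is only a sketch of the reported pattern, not a diagnosis: the helper name `classify_cmdline` and its output strings are hypothetical, and it flags a command line only when `isolcpus=` carries an explicit `domain` flag and `nohz_full=` is also present, which is the one combination reported as broken here.

```shell
# Hypothetical helper: classify a kernel command line string according to
# the four combinations listed in this report. It only looks at the string
# it is given and does not inspect the running kernel.
classify_cmdline() {
    cmdline=$1
    has_domain=no
    has_nohz_full=no
    for arg in $cmdline; do
        case $arg in
            isolcpus=*)
                # Flags precede the CPU list, e.g. isolcpus=nohz,domain,14
                case ",${arg#isolcpus=}," in
                    *,domain,*) has_domain=yes ;;
                esac
                ;;
            nohz_full=*)
                has_nohz_full=yes
                ;;
        esac
    done
    if [ "$has_domain" = yes ] && [ "$has_nohz_full" = yes ]; then
        echo "matches-reported-hang"
    else
        echo "reported-working"
    fi
}

# Example against the running system:
# classify_cmdline "$(cat /proc/cmdline)"
```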

systemd.unified_cgroup_hierarchy=0 is needed to get cgroups v1, which
seems to be required for RT_GROUP_SCHED (at least I couldn't
find any option similar to cpu.rt_runtime_us with the default
cgroup v2).
Basically it boils down to this: the combination of the
domain parameter to isolcpus and nohz_full, together with
RT_GROUP_SCHED, causes the problem I'm observing.
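
When reproducing this, it may help to confirm that the boot parameters actually took effect for the core in question. The sketch below assumes the kernel exposes the resulting CPU lists in `/sys/devices/system/cpu/isolated` and `/sys/devices/system/cpu/nohz_full`; the helper name `cpu_in_list` is hypothetical and only parses the comma-separated range format (for example `0-3,14`).

```shell
# Hypothetical helper: succeed if CPU number $1 appears in a kernel CPU
# list string such as "0-3,14".
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    set -- $2            # split the list on commas into positional params
    IFS=$old_ifs
    for range in "$@"; do
        lo=${range%-*}   # "0-3" -> 0, "14" -> 14
        hi=${range#*-}   # "0-3" -> 3, "14" -> 14
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Example for core 14 (paths assumed to exist on the test kernel):
# cpu_in_list 14 "$(cat /sys/devices/system/cpu/isolated)"  && echo "isolated"
# cpu_in_list 14 "$(cat /sys/devices/system/cpu/nohz_full)" && echo "nohz_full"
```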

Does anyone have any idea what could be causing this?
Am I doing something wrong, or is there an issue with cyclictest or
even the kernel that's causing this?

My motivation is running (testing) a real-time container on isolated
cores, so I think I do need all the kernel parameters I used above to
get good latencies.

You might want to try without nohz_full. My understanding is that this
is used if your application remains mostly in userland (and uses no
syscalls, etc.).

Let me put this on my list of things to try out.

Regards,

Jonathan Schwender

Sebastian