Re: PREEMPT_RT_FULL breaks NO_HZ_FULL (full dynticks)
From: Ramesh Thomas <hidden>
Date: 2018-08-31 12:35:13
On 2018-08-30 at 16:18:56 +0200, Sebastian Andrzej Siewior wrote:
> On 2018-08-26 20:39:22 [-0700], Ramesh Thomas wrote:
>> Case #2 with CONFIG_PREEMPT_RT_FULL=y (First run after boot)
>>   S [cpuhp/3]
>>   S [migration/3]
>>   S [posixcputmr/3]
>>   S [rcuc/3]
>>   S [ktimersoftd/3]
>>   S [ksoftirqd/3]
>>   I [kworker/3:0-mm_]
>>   I [kworker/3:0H]
>>   R [irq/125-nvme0q4]
>>   R [kworker/3:1-mm_]
>>   R ./jitter
>
> irq/125 shouldn't be there, right?
Yes. Also, posixcputmr and ktimersoftd are not seen if PREEMPT_RT_FULL is not enabled. They don't seem to be running when the timer interrupts occur, but their mere presence indicates that something different is happening.

                        Sched  RT_Prio  Cpu_Time
  S [posixcputmr/3]     FF     99       00:00:00
  R [ktimersoftd/3]     FF     1        00:00:00
  S [irq/125-nvme0q4]   FF     50       00:00:00
  R ./jitter            FF     99       00:01:53
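For anyone who wants to collect a comparable listing, here is a minimal sketch that reads the per-task stat files under /proc and reports state, scheduling class and RT priority for tasks on a given CPU. It is not the method used to produce the table above. The function names `parse_stat` and `tasks_on_cpu` are made up for illustration, the field positions follow proc(5), and the class labels mimic those printed by ps(1). Note that the stat field holds the CPU a task last ran on, which is not the same thing as its affinity.

```python
import glob

# Scheduling policy numbers from sched(7), labelled the way ps(1) prints them.
POLICIES = {0: "TS", 1: "FF", 2: "RR", 3: "B", 5: "IDL", 6: "DLN"}


def parse_stat(line):
    """Parse one /proc/<pid>/task/<tid>/stat line (field numbers per proc(5))."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    f = tail.split()  # f[0] is field 3 (state), so field N lives at f[N - 3]
    return {
        "pid": int(pid),
        "comm": comm,
        "state": f[0],
        "cpu": int(f[36]),                        # field 39: CPU last executed on
        "rt_prio": int(f[37]),                    # field 40: rt_priority
        "policy": POLICIES.get(int(f[38]), "?"),  # field 41: policy
    }


def tasks_on_cpu(cpu):
    """Return parsed entries for every task that last ran on `cpu`."""
    found = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as fh:
                task = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited while we were reading it
        if task["cpu"] == cpu:
            found.append(task)
    return found


if __name__ == "__main__":
    for t in tasks_on_cpu(3):
        print(f'{t["state"]} {t["comm"]:<20} {t["policy"]:<4} {t["rt_prio"]}')
```

Running it once before and once after starting the test program gives two snapshots that can be diffed.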
>> Case #3 with CONFIG_PREEMPT_RT_FULL=y (Second run after boot)
>>   S [cpuhp/3]
>>   S [migration/3]
>>   S [posixcputmr/3]
>>   S [rcuc/3]
>>   R [ktimersoftd/3]
>>   S [ksoftirqd/3]
>>   I [kworker/3:0-mm_]
>>   I [kworker/3:0H]
>>   S [irq/125-nvme0q4]
>>   R [kworker/3:1-mm_]
>>   R ./jitter
>>
>> In Case #3, /proc/interrupts shows timer interrupts occurring on CPU 3,
>> whereas the timer is stopped in the other cases. ktimersoftd is in the
>> runnable state in Case #3.
>
> Can you trace down who or what is arming the timer on CPU3?
Ok, I will take a look.
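
One possible way to approach this is the ftrace timer events. The sketch below is only an illustration and was not run as part of this report. It assumes tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing), that the kernel provides the timer:timer_start and timer:hrtimer_start events, and that it is run as root. The helper names are made up. Because tracing_cpumask only keeps events recorded while executing on the selected CPU, a timer queued for CPU 3 from another CPU would not show up with this mask.

```python
import os

# Assumption: tracefs mount point; adjust if it is mounted elsewhere.
TRACEFS = "/sys/kernel/tracing"
EVENTS = ("timer/timer_start", "timer/hrtimer_start")


def cpumask_hex(cpu):
    """Hex mask with only `cpu` set, in comma-separated 32-bit groups."""
    digits = format(1 << cpu, "x")
    groups = []
    while digits:
        groups.append(digits[-8:])
        digits = digits[:-8]
    return ",".join(reversed(groups))


def write(rel, value):
    """Write `value` to a control file below TRACEFS."""
    with open(os.path.join(TRACEFS, rel), "w") as fh:
        fh.write(value)


def trace_timer_arming(cpu, max_lines=50):
    """Print up to `max_lines` timer-arming events recorded on `cpu`."""
    write("tracing_cpumask", cpumask_hex(cpu))
    for ev in EVENTS:
        write(f"events/{ev}/enable", "1")
    write("tracing_on", "1")
    try:
        with open(os.path.join(TRACEFS, "trace_pipe")) as pipe:
            # range() comes first so zip stops without one extra blocking read.
            for _, line in zip(range(max_lines), pipe):
                print(line, end="")
    finally:
        for ev in EVENTS:
            write(f"events/{ev}/enable", "0")


if __name__ == "__main__":
    trace_timer_arming(3)
```

Each event line names the task that was running and the timer callback function, which should narrow down what is arming the timer.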
>> Is this a known issue and is it being looked at by anyone?
>
> Not that I know of. Do you happen to know if this is a regression
> compared to v4.14-RT?
I see the issue in 4.14.63 RT as well. There are slight differences in behavior due to changes that went in 4.17, but the main issue is seen there also.
>> If it is an issue, I would be glad to help in any way to get these 2
>> very important features compatible with each other.
>
> So if the ktimersoftd runs and you see the interrupt counter
> incrementing for CPU3 then it would be interesting to figure out why
> there is an armed timer on the second invocation (and none on the
> first one).
In 4.18.5 RT, the issue does not always happen in the second invocation. Sometimes it works as expected, but the issue will show up after a few tries.

Looks like there are 2 issues when PREEMPT_RT_FULL is enabled:

1. Some additional processes are pinned to isolated cores.
2. Timer is armed even though only a single high priority task is running.
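
Since the problem is intermittent, it may help to check each run automatically. The following is a minimal sketch that samples the local timer interrupt count for one CPU from /proc/interrupts before and after a run. It assumes an x86 system, where that counter is the line labelled LOC. The function names are made up for illustration.

```python
import time


def local_timer_counts(text):
    """Map CPU number -> local timer interrupt count from /proc/interrupts text."""
    lines = text.splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ... (online CPUs only)
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == "LOC":  # assumption: x86 label for the local timer
            counts = rest.split()[: len(cpus)]
            return {int(c[3:]): int(n) for c, n in zip(cpus, counts)}
    raise ValueError("no LOC line found")


def timer_ticks_during(cpu, seconds=10.0, path="/proc/interrupts"):
    """Return how many local timer interrupts `cpu` took over `seconds`."""
    def read():
        with open(path) as fh:
            return local_timer_counts(fh.read())[cpu]

    before = read()
    time.sleep(seconds)
    return read() - before


if __name__ == "__main__":
    print("CPU 3 timer interrupts in 10s:", timer_ticks_during(3))
```

With the tick stopped the delta should stay close to zero, so a delta that grows with the sampling period would mark a run where the issue occurred.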
>> Thanks,
>> Ramesh
>
> Sebastian