Re: [RFC] sched/eevdf: sched feature to dismiss lag on wakeup

From: Luis Machado <hidden>
Date: 2024-03-08 15:12:06
Also in: lkml

Hi Tobias,

On 2/28/24 16:10, Tobias Huschle wrote:

The previously used CFS scheduler gave tasks that were woken up an
enhanced chance to see runtime immediately by deducting a certain value
from its vruntime on runqueue placement during wakeup.

This property was used by some, at least vhost, to ensure, that certain
kworkers are scheduled immediately after being woken up. The EEVDF
scheduler, does not support this so far. Instead, if such a woken up
entitiy carries a negative lag from its previous execution, it will have
to wait for the current time slice to finish, which affects the
performance of the process expecting the immediate execution negatively.

To address this issue, implement EEVDF strategy #2 for rejoining
entities, which dismisses the lag from previous execution and allows
the woken up task to run immediately (if no other entities are deemed
to be preferred for scheduling by EEVDF).

The vruntime is decremented by an additional value of 1 to make sure,
that the woken up tasks gets to actually run. This is of course not
following strategy #2 in an exact manner but guarantees the expected
behavior for the scenario described above. Without the additional
decrement, the performance goes south even more. So there are some
side effects I could not get my head around yet.

Questions:
1. The kworker getting its negative lag occurs in the following scenario
   - kworker and a cgroup are supposed to execute on the same CPU
   - one task within the cgroup is executing and wakes up the kworker
   - kworker with 0 lag, gets picked immediately and finishes its
     execution within ~5000ns
   - on dequeue, kworker gets assigned a negative lag
   Is this expected behavior? With this short execution time, I would
   expect the kworker to be fine.

That strikes me as a bit odd as well. Have you been able to determine how a negative lag
is assigned to the kworker after such a short runtime?

I was looking at a different thread (https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/ (local)) that
uncovers a potential overflow in the eligibility calculation. Though I don't think that is the case for this particular
vhost problem.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help