Thread: 31 messages, 6 authors, 2019-07-23

Re: INFO: rcu detected stall in ext4_write_checks

From: Peter Zijlstra <peterz@infradead.org>
Date: 2019-07-15 13:40:11
Also in: linux-ext4, lkml

On Mon, Jul 15, 2019 at 06:01:01AM -0700, Paul E. McKenney wrote:
Title: Making SCHED_DEADLINE safe for kernel kthreads

Abstract:

Dmitry Vyukov's testing work identified some (ab)uses of sched_setattr()
that can result in SCHED_DEADLINE tasks starving RCU's kthreads for
extended time periods, not milliseconds, not seconds, not minutes, not even
hours, but days. Given that RCU CPU stall warnings are issued whenever
an RCU grace period fails to complete within a few tens of seconds,
the system did not suffer silently. Although one could argue that people
should avoid abusing sched_setattr(), people are human and humans make
mistakes. Responding to simple mistakes with RCU CPU stall warnings is
all well and good, but a more severe case could OOM the system, which
is a particularly unhelpful error message.

It would be better if the system were capable of operating reasonably
despite such abuse. Several approaches have been suggested.

First, sched_setattr() could recognize parameter settings that put
kthreads at risk and refuse to honor those settings. This approach
of course requires that we identify precisely what combinations of
sched_setattr() parameter settings are risky, especially given that there
are likely to be parameter settings that are both risky and highly useful.

So we (the people poking at the DEADLINE code) are all aware of this,
and on the TODO list for making DEADLINE available for !priv users is
the item:

  - put limits on deadline/period

And note that this means both an upper and a lower limit. You've just
found why we need the upper limit; the lower limit is required because
you can DoS the hardware by requesting deadlines/periods that are equal to
(or shorter than) the time it takes to program the hardware.
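
To make the upper/lower limit idea concrete, here is a minimal sketch of the
kind of check that could sit behind sched_setattr(). It is not taken from any
posted patch. The names dl_params, dl_params_within_limits, DL_PERIOD_MIN_NS
and DL_PERIOD_MAX_NS are hypothetical, and the two bounds are made-up
placeholder values; the thread does not say what the real limits should be.
The sketch assumes the usual SCHED_DEADLINE rule that runtime <= deadline <=
period, and that a period of 0 means "use the deadline as the period".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds in nanoseconds; placeholder values, not from the thread. */
#define DL_PERIOD_MIN_NS UINT64_C(100000)     /* 100 us, assumed lower limit */
#define DL_PERIOD_MAX_NS UINT64_C(4000000000) /* 4 s, assumed upper limit */

/* Hypothetical container for the three SCHED_DEADLINE time parameters. */
struct dl_params {
	uint64_t runtime_ns;
	uint64_t deadline_ns;
	uint64_t period_ns;	/* 0 is treated as "same as deadline" */
};

/* Return true if the parameters are well-formed and inside both limits. */
static bool dl_params_within_limits(const struct dl_params *p)
{
	uint64_t period = p->period_ns ? p->period_ns : p->deadline_ns;

	/* Basic shape: 0 < runtime <= deadline <= period. */
	if (p->runtime_ns == 0 ||
	    p->runtime_ns > p->deadline_ns ||
	    p->deadline_ns > period)
		return false;

	/*
	 * Lower limit: deadline <= period, so bounding the deadline from
	 * below also bounds the period. This guards against values at or
	 * below the cost of programming the hardware.
	 */
	if (p->deadline_ns < DL_PERIOD_MIN_NS)
		return false;

	/*
	 * Upper limit: bounding the period from above also bounds the
	 * deadline. This guards against the multi-day starvation case.
	 */
	if (period > DL_PERIOD_MAX_NS)
		return false;

	return true;
}
```

With these placeholder bounds, a 10ms/30ms/100ms (runtime/deadline/period)
request passes, while a 2us period or a one-day period is refused.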

There might have even been some patches that do some of this, but I've
held off because we have bigger problems and they would've established
an ABI while it wasn't clear it was sufficient or the right form.
