Thread: 19 messages, 6 authors, 2020-02-19

Re: BLKSECDISCARD ioctl and hung tasks

From: Salman Qazi <hidden>
Date: 2020-02-13 01:20:51
Also in: lkml

On Wed, Feb 12, 2020 at 3:07 PM Theodore Y. Ts'o [off-list ref] wrote:
> This is a problem we've been struggling with in other contexts.  For
> example, if you have the hung task timer set to 2 minutes, and the
> system to panic if the hung task timer exceeds that, and an NFS server
> which the client is writing to crashes, and it takes longer for the
> NFS server to come back, that might be a situation where we might want
> to exempt the hung task warning from panic'ing the system.  On the
> other hand, if the process is failing to schedule for other reasons,
> maybe we would still want the hung task timeout to go off.
>
> So I've been meditating over whether the right answer is to just
> globally configure the hung task timer to something like 5 or 10
> minutes (which would require no kernel changes, yay?), or have some
> way of telling the hung task timeout logic that it shouldn't apply, or
> should have a different timeout, when we're waiting for I/O to
> complete.

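For reference, the "no kernel changes" option above comes down to writing a larger value to the hung task sysctl. A minimal sketch in C follows, assuming a kernel built with CONFIG_DETECT_HUNG_TASK so that /proc/sys/kernel/hung_task_timeout_secs exists; the helper names write_sysctl_ulong and read_sysctl_ulong are made up for this illustration and are not an existing API.

```c
#include <stdio.h>

/* Hung task timeout sysctl; only present with CONFIG_DETECT_HUNG_TASK. */
#define HUNG_TASK_TIMEOUT_PATH "/proc/sys/kernel/hung_task_timeout_secs"

/*
 * Hypothetical helper: write a decimal value to a sysctl-style file.
 * Returns 0 on success, -1 on failure (for example, not running as root).
 */
int write_sysctl_ulong(const char *path, unsigned long value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%lu\n", value) > 0;
    /* fclose() flushes, so a rejected write can surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read a decimal value back from the same kind of file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
int read_sysctl_ulong(const char *path, unsigned long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "%lu", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling write_sysctl_ulong(HUNG_TASK_TIMEOUT_PATH, 600) as root would correspond to the 10 minute setting. It applies to every task on the system, which is exactly the coarseness being discussed.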
The problem that I anticipate in our space is that a generous timeout
will make impatient people reboot their chromebooks, losing us
information about hangs.  But this can be worked around by having
multiple different timeouts.  For instance, a thread that is expecting
to do something slow can set a flag to indicate that it wishes to be
held against the more generous criteria.  This is something I am
tempted to do on older kernels where we might not feel comfortable
backporting io_uring.

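To make the two-timeout idea concrete, here is a rough user-space model of the decision only. It is not kernel code and not how the hung task detector is implemented; the struct names, the expects_slow_io flag, and the 120/600 second values used with it are all assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-task state: how long it has been blocked, and whether
 * it declared in advance that it expects to do something slow. */
struct task_wait {
    unsigned long blocked_secs;
    bool expects_slow_io;
};

/* Hypothetical policy with a default and a more generous timeout. */
struct hung_policy {
    unsigned long default_timeout_secs;
    unsigned long generous_timeout_secs;
};

/* Returns true if the task should be reported as hung under the policy. */
bool exceeds_hung_timeout(const struct hung_policy *p,
                          const struct task_wait *t)
{
    unsigned long limit = t->expects_slow_io ? p->generous_timeout_secs
                                             : p->default_timeout_secs;

    return t->blocked_secs > limit;
}
```

With a policy of 120 and 600 seconds, a task blocked for 300 seconds is reported only if it did not set the flag, so ordinary hangs are still caught at the short timeout.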
> It seems to me that perhaps there's a different solution here for your
> specific case, which is what if there is an asynchronous version of
> BLKSECDISCARD, either using io_uring or some other interface?  That
> bypasses the whole issue of how do we modulate the hung task timeout
> when it's a situation where maybe it's OK for a userspace thread to
> block for more than 120 seconds, without having to either completely
> disable the hung task timeout, or changing it globally to some much
> larger value.
>
>                                         - Ted

This is worth evaluating.

Thanks,

Salman