Re: BLKSECDISCARD ioctl and hung tasks
From: Salman Qazi <hidden>
Date: 2020-02-13 01:20:51
Also in: lkml
On Wed, Feb 12, 2020 at 3:07 PM Theodore Y. Ts'o [off-list ref] wrote:
This is a problem we've been struggling with in other contexts. For example, if you have the hung task timer set to 2 minutes, and the system to panic if the hung task timer exceeds that, and an NFS server which the client is writing to crashes, and it takes longer for the NFS server to come back, that might be a situation where we might want to exempt the hung task warning from panic'ing the system. On the other hand, if the process is failing to schedule for other reasons, maybe we would still want the hung task timeout to go off. So I've been meditating over whether the right answer is to just globally configure the hung task timer to something like 5 or 10 minutes (which would require no kernel changes, yay?), or have some way of telling the hung task timeout logic that it shouldn't apply, or should have a different timeout, when we're waiting for I/O to complete.
The problem that I anticipate in our space is that a generous timeout will make impatient people reboot their Chromebooks, losing us information about hangs. But this can be worked around by having multiple different timeouts. For instance, a thread that is expecting to do something slow can set a flag to indicate that it wishes to be held against the more generous criteria. This is something I am tempted to do on older kernels where we might not feel comfortable backporting io_uring.
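To make the multiple-timeout idea concrete, here is a rough userspace model of the check, not kernel code. All names (`task_model`, `expects_slow_io`, `hung_timeout_for`) and the 600-second figure are hypothetical and chosen only for illustration; the 120-second default matches the value discussed in this thread.

```c
#include <stdbool.h>

/* Hypothetical limits: a strict default and a generous opt-in value. */
#define HUNG_TIMEOUT_DEFAULT_SECS 120UL
#define HUNG_TIMEOUT_SLOW_SECS    600UL

/* Simplified stand-in for the per-task state a hung task checker would see. */
struct task_model {
    const char *name;
    bool expects_slow_io;        /* set by the thread before a known-slow operation */
    unsigned long blocked_secs;  /* time spent blocked without being scheduled */
};

/* Pick the timeout that applies to this task based on its opt-in flag. */
static unsigned long hung_timeout_for(const struct task_model *t)
{
    return t->expects_slow_io ? HUNG_TIMEOUT_SLOW_SECS
                              : HUNG_TIMEOUT_DEFAULT_SECS;
}

/* A task counts as hung once it has been blocked longer than its own limit. */
static bool task_is_hung(const struct task_model *t)
{
    return t->blocked_secs > hung_timeout_for(t);
}
```

With this model, a flagged thread that has been blocked for 300 seconds is not reported, while an unflagged thread blocked for the same time is, so ordinary hangs are still caught at the strict limit.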
It seems to me that perhaps there's a different solution here for your specific case, which is what if there is an asynchronous version of BLKSECDISCARD, either using io_uring or some other interface? That bypasses the whole issue of how do we modulate the hung task timeout when it's a situation where maybe it's OK for a userspace thread to block for more than 120 seconds, without having to either completely disable the hung task timeout, or changing it globally to some much larger value.
This is worth evaluating.

Thanks,
Salman
- Ted