Re: BLKSECDISCARD ioctl and hung tasks

From: Ming Lei <hidden>
Date: 2020-02-15 03:47:12
Also in: lkml

On Fri, Feb 14, 2020 at 11:42:32AM -0800, Salman Qazi wrote:

On Fri, Feb 14, 2020 at 1:23 AM Ming Lei [off-list ref] wrote:

On Fri, Feb 14, 2020 at 1:50 PM Bart Van Assche [off-list ref] wrote:

On 2020-02-13 11:21, Salman Qazi wrote:

AFAICT, this is not actually sufficient, because the issuer of the bio
is waiting for the entire bio, regardless of how it is split later.
Also, there isn't a good mapping between the size of the secure
discard and how long it will take.  Given the geometry of a flash
device, it is not hard to construct a scenario where a relatively
small secure discard (a few thousand sectors) will take a very long time
(multiple seconds).
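
For context, the request under discussion is issued from user space
roughly as in the sketch below. BLKSECDISCARD takes a {start, length}
pair in bytes; the helper names and the 512-byte sector size are
assumptions made for illustration, not anything taken from this thread.
The ioctl is destructive and blocks the caller until the device is done,
which is where the hung task warning comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKSECDISCARD */

/* Assumption: 512-byte logical sectors. */
#define SECTOR_SIZE 512ULL

/* Hypothetical helper: fill the {start, length} byte range the ioctl expects. */
static void secdiscard_range(uint64_t start_sector, uint64_t nr_sectors,
			     uint64_t *range)
{
	assert(range != NULL);
	range[0] = start_sector * SECTOR_SIZE;	/* byte offset */
	range[1] = nr_sectors * SECTOR_SIZE;	/* byte length */
}

/*
 * Hypothetical wrapper. DESTRUCTIVE: erases the given range of the block
 * device behind fd. Does not return until the device completes the request.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int secure_discard(int fd, uint64_t start_sector, uint64_t nr_sectors)
{
	uint64_t range[2];

	secdiscard_range(start_sector, nr_sectors, range);
	return ioctl(fd, BLKSECDISCARD, range);
}
```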

Having said that, I don't like neutering the hung task timer either.

Hi Salman,

How about modifying the block layer such that completions of bio
fragments are considered as task activity? I think that bio splitting is
rare enough for such a change not to affect performance of the hot path.

Are you sure that the hung task warning won't be triggered when the bio
is not split?

I demonstrated a few emails ago that it doesn't take a very large
secure discard command to trigger this.  So, I am sceptical that we
will be able to use splitting to solve this.

How about setting max_discard_segments such that a discard always
completes in less than half the hung task timeout? This may make
discards a bit slower for one particular block driver but I think that's
better than hung task complaints.

I am afraid you can't find a golden max_discard_segments setting that works
for every driver. Even if one is found, performance may be affected.

So just wondering why not take the simple approach used in blk_execute_rq()?

My colleague Gwendal pointed out another issue which I had missed:
secure discard is an exclusive command: it monopolizes the device.
Even if we fix this via your approach, it will show up somewhere else,
because other operations to the drive will not make progress for that
length of time.

What are the 'other operations'? Are they block IOs?

If yes, that is why I suggest fixing submit_bio_wait(), which should cover
most sync bio submission.

Anyway, the fix is simple and generic enough; I plan to post a formal
patch if no one comes up with a better workable approach.
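
To illustrate the blk_execute_rq()-style wait mentioned above, here is a
user-space model of the idea, not kernel code: rather than one unbounded
wait, the waiter sleeps in slices of half the hung task timeout, so it is
scheduled at least once per timeout period even when the I/O itself takes
much longer. The struct and function names are hypothetical stand-ins;
in the kernel the slice wait would be wait_for_completion_io_timeout().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a completion: the I/O finishes after some ticks. */
struct fake_completion {
	unsigned long remaining_ticks;
};

/*
 * Model of a timed wait: consume up to 'timeout' ticks and return true
 * if the I/O completed within this slice, false if the slice expired.
 */
static bool fake_wait_timeout(struct fake_completion *c, unsigned long timeout)
{
	if (c->remaining_ticks <= timeout) {
		c->remaining_ticks = 0;
		return true;
	}
	c->remaining_ticks -= timeout;
	return false;
}

/*
 * Wait for completion in slices of half the hung task timeout.
 * Returns the number of slices used, i.e. how many times the waiter
 * got scheduled while the I/O was in flight.
 */
static unsigned long wait_in_slices(struct fake_completion *c,
				    unsigned long hang_timeout_ticks)
{
	unsigned long slice = hang_timeout_ticks / 2;
	unsigned long wakeups = 0;

	assert(slice > 0);	/* a zero timeout would need a plain wait instead */
	do {
		wakeups++;
	} while (!fake_wait_timeout(c, slice));

	return wakeups;
}
```

With a 240-tick timeout and an I/O that takes 700 ticks, the waiter wakes
every 120 ticks and needs 6 slices. No single sleep is longer than half
the timeout, which is the property the hung task check cares about.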

For Chromium OS purposes, if we had a blank slate, this is how I would solve it:

* Under the assumption that the truly sensitive data is not very big:
    * Keep secure data on a separate partition to make sure that those
LBAs have controlled history
    * Treat the files in that partition as immutable (i.e. no
overwriting the contents of the file without first secure erasing the
existing contents).
    * By never letting more than one version of the file accumulate,
we can guarantee that the secure erase will always be fast for
moderate sized files.

But for all the existing machines with keys on them, we will need to
do something else.

The issue you reported is a generic one, not Chromium only.


Thanks,
Ming
