Re: [RFC 1/2] MD: raid5 trim support

From: Shaohua Li <shli@kernel.org>
Date: 2012-05-08 10:16:53

On Wed, Apr 25, 2012 at 11:43:07AM +0800, Shaohua Li wrote:

On Wed, Apr 18, 2012 at 02:34:04PM +0800, Shaohua Li wrote:

On 4/18/12 1:57 PM, NeilBrown wrote:

On Wed, 18 Apr 2012 13:30:45 +0800 Shaohua Li wrote:

On 4/18/12 12:48 PM, NeilBrown wrote:

On Wed, 18 Apr 2012 08:58:14 +0800 Shaohua Li wrote:

On 4/18/12 4:26 AM, NeilBrown wrote:

On Tue, 17 Apr 2012 07:46:03 -0700 Dan Williams wrote:

On Tue, Apr 17, 2012 at 1:35 AM, Shaohua Li wrote:

Discard for raid4/5/6 has a limitation. If a discard request is small, we discard
on only one disk, but we still need to calculate parity and write the parity
disk. To calculate parity correctly, zero_after_discard must be guaranteed.
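To make the zero_after_discard requirement concrete, here is a minimal sketch assuming plain XOR parity, with each "block" shrunk to a few bytes. xor_parity and reconstruct are hypothetical helper names, not md code. The sketch shows that parity recomputed as if the discarded block were zero stays correct only if the device really returns zeros for that block afterwards.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR all blocks together, byte by byte (single-parity RAID4/5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving, parity):
    """Rebuild the one missing block from the surviving blocks plus parity."""
    return xor_parity(list(surviving) + [parity])


d1 = bytes([0x11, 0x22, 0x33, 0x44])
d2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])

# Discard d1: parity is recomputed on the assumption that d1 now reads as zeros.
zeros = bytes(len(d1))
parity_after_discard = xor_parity([zeros, d2])

# If the device really returns zeros for the discarded range, d2 is recoverable.
assert reconstruct([zeros], parity_after_discard) == d2

# If it returns the old (or any non-zero) data instead, the rebuilt d2 is wrong.
assert reconstruct([d1], parity_after_discard) != d2
```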

I'm wondering if we could use the new bad blocks facility to mark
discarded ranges so we don't necessarily need determinate data after
discard.

...but I have not looked into it beyond that.

--
Dan

No.

The bad blocks framework can only store a limited number of bad ranges - 512
in the current implementation.
That would not be an acceptable restriction for discarded ranges.

You would need a bitmap of some sort if you wanted to record discarded
regions.
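A minimal sketch of the kind of bitmap meant here, assuming one bit per fixed-size chunk of a member device. DiscardBitmap and its methods are hypothetical and do not correspond to md's write-intent bitmap code. Its size depends only on the device size and chunk size, not on how many ranges have been discarded, which is what a fixed-size bad-block table cannot offer.

```python
class DiscardBitmap:
    """Hypothetical record of discarded regions: one bit per chunk."""

    def __init__(self, dev_sectors, chunk_sectors):
        self.chunk_sectors = chunk_sectors
        nchunks = -(-dev_sectors // chunk_sectors)  # ceiling division
        self.bits = bytearray(-(-nchunks // 8))

    def _chunks(self, start, nr_sectors, whole_only):
        first = start // self.chunk_sectors
        last = (start + nr_sectors - 1) // self.chunk_sectors
        if whole_only:
            # Only chunks fully covered by the range can be marked discarded.
            if start % self.chunk_sectors:
                first += 1
            if (start + nr_sectors) % self.chunk_sectors:
                last -= 1
        return range(first, last + 1)

    def mark_discarded(self, start, nr_sectors):
        for c in self._chunks(start, nr_sectors, whole_only=True):
            self.bits[c // 8] |= 1 << (c % 8)

    def mark_written(self, start, nr_sectors):
        # Any write, even a partial one, means the chunk holds real data again.
        for c in self._chunks(start, nr_sectors, whole_only=False):
            self.bits[c // 8] &= ~(1 << (c % 8)) & 0xFF

    def is_discarded(self, sector):
        c = sector // self.chunk_sectors
        return bool(self.bits[c // 8] >> (c % 8) & 1)
```

Partially covered chunks are deliberately left unmarked, so the bitmap never claims a chunk is free while part of it may still hold live data.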

http://neil.brown.name/blog/20110216044002#5

This appears to remove the unnecessary resync of discarded ranges after a crash
or a discard error, i.e. it is an enhancement. As I understand it, it can't
remove the limitation I mentioned in the patch. For raid5, we still need to
discard a whole stripe (discarding one disk but writing the parity disk isn't
good).
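Restricting a discard to whole stripes amounts to trimming the request to stripe boundaries. A sketch, assuming the array's data address space is laid out one stripe after another with chunk_sectors of data per data disk in each stripe (parity chunks are not part of that address space); full_stripe_range is a hypothetical name.

```python
def full_stripe_range(start, nr_sectors, chunk_sectors, data_disks):
    """Return (start, nr_sectors) for the part of an array-level discard that
    covers only complete stripes, or None if no complete stripe is covered.

    One stripe holds chunk_sectors of data on each of data_disks members,
    i.e. chunk_sectors * data_disks sectors of array address space.
    """
    stripe_sectors = chunk_sectors * data_disks
    first = -(-start // stripe_sectors) * stripe_sectors            # round start up
    last = (start + nr_sectors) // stripe_sectors * stripe_sectors  # round end down
    if last <= first:
        return None
    return first, last - first
```

With 128-sector chunks and 3 data disks a stripe is 384 sectors, so a discard of sectors 100..1099 shrinks to the single full stripe at 384..767; the unaligned head and tail would simply not be discarded.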

It is certainly not ideal, but is it worse than not discarding at all?
And would updating some sort of bitmap be just as bad as updating the parity
block?

How about treating a DISCARD request as a request to write a block full of
zeros, then at the lower level treat any request to write a block full of
zeros as a DISCARD request.  So when the parity becomes zero, it gets
discarded.
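A sketch of the proposed lower-level rule, assuming a member device that reports whether discarded sectors read back as zeros (modelled here as a discard_zeroes_data flag). lower_level_request and xor_blocks are hypothetical names, and requests are modelled as plain tuples rather than bios.

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def lower_level_request(sector, data, discard_zeroes_data):
    """Turn a write of an all-zero block into a DISCARD, but only when the
    device guarantees that discarded sectors read back as zeros; otherwise
    the zeros must really be written."""
    if discard_zeroes_data and not any(data):
        return ("DISCARD", sector, len(data))
    return ("WRITE", sector, data)


# Both data chunks of a stripe discarded (i.e. written as zeros), so the
# recomputed parity is all zeros and is discarded rather than written.
parity = xor_blocks(bytes(8), bytes(8))
print(lower_level_request(0, parity, discard_zeroes_data=True))  # ('DISCARD', 0, 8)
```

The cost is a scan of every block being written. The benefit is that parity follows the data without special cases: once every data chunk in a stripe has been discarded, the parity chunk is discarded too.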

Certainly it is best if the filesystem would discard whole stripes at a time,
and we should be sure to optimise that.  But maybe there is still room to do
something useful with small discards?

Sure, it would be great if we could do small discards. But I don't see how to
do it with the bitmap approach. Let's take an example: data disk1, data disk2,
parity disk3. Say we discard some sectors of disk1. The suggested approach is
to mark that range bad. Then how do we deal with parity disk3? As I said,
writing parity disk3 isn't good. So do we mark the corresponding range of
parity disk3 bad too? If we did, and disk2 then failed, how could we restore it?
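The failure case in this example follows from the single-parity rule: a missing block can be rebuilt only if every other block in the stripe, parity included, holds valid data. A sketch with hypothetical names, treating a range marked bad as unusable for reconstruction:

```python
def can_rebuild(failed, bad, members):
    """Single-parity (RAID4/5) rule for one stripe: the failed member's block
    can be rebuilt only if every other member's block is valid. `bad` is the
    set of members whose range in this stripe is marked bad."""
    return all(m not in bad for m in members if m != failed)


members = ["disk1", "disk2", "disk3"]  # disk3 holds the parity for this stripe

# disk1's discarded range and the matching parity range are both marked bad.
print(can_rebuild("disk2", {"disk1", "disk3"}, members))  # False
# With nothing marked bad, disk2 is recoverable from disk1 and disk3.
print(can_rebuild("disk2", set(), members))  # True
```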

Why, exactly, is writing the parity disk not good?
Not discarding blocks that we possibly could discard is also not good.
Which is worse?

Writing the parity disk is worse. Discard exists to help the SSD firmware's
garbage collection, and so to improve later write performance. Writes, on the
other hand, are bad for an SSD: extra writes use up its limited write endurance,
and they also increase garbage collection overhead. So the result of a small
discard is that garbage collection improves on the data disk but gets worse on
the parity disk, and the parity disk reaches the end of its life faster, which
doesn't make sense. This is even worse when the parity is distributed.

Neil,
Any comments about the patches?

ping!

Thanks,
Shaohua