Re: [PATCH 5/9] raid5: log recovery

From: NeilBrown <hidden>
Date: 2015-08-12 03:51:18

On Wed, 5 Aug 2015 14:39:09 -0700 Shaohua Li [off-list ref] wrote:

On Wed, Aug 05, 2015 at 02:05:25PM +1000, NeilBrown wrote:


On Wed, 29 Jul 2015 17:38:45 -0700 Shaohua Li [off-list ref] wrote:


This is the log recovery support. The process is quite straightforward.
We scan the log and read all valid meta/data/parity into memory. If a
stripe's data/parity checksum is correct, the stripe will be recovered.
Otherwise, it's discarded and we don't scan the log any further. The
reclaim process guarantees that a stripe which starts to be flushed to
the raid disks has complete data/parity and a correct checksum. To
recover a stripe, we just copy its data/parity to the corresponding raid
disks.
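
The scan rule described above (accept stripes while their checksum matches, stop at the first mismatch) can be sketched roughly as follows. This is only an illustration: `struct log_entry`, `toy_csum()` and `scan_log()` are hypothetical names, the checksum is a stand-in, and none of this reflects the patch's actual on-disk format or functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one logged stripe (not the real format). */
struct log_entry {
	uint64_t seq;           /* sequence number of the meta block */
	uint32_t stored_csum;   /* checksum recorded in the log */
	const uint8_t *payload; /* data/parity read from the log */
	size_t len;             /* payload length in bytes */
};

/* Stand-in checksum; the real log would use its own algorithm. */
static uint32_t toy_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + p[i];
	return sum;
}

/*
 * Return how many leading entries are recoverable. Scanning stops at the
 * first entry whose checksum does not match; later entries are ignored
 * even if they happen to look valid.
 */
static size_t scan_log(const struct log_entry *log, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (toy_csum(log[i].payload, log[i].len) != log[i].stored_csum)
			break;
	}
	return i;
}
```

Entries `0 .. scan_log() - 1` would then be copied to the raid disks; everything after the first bad entry is left alone.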

The tricky thing is the superblock update after recovery. We can't let
the superblock point to the last valid meta block. The log might look
like:
| meta 1 | meta 2 | meta 3 |
meta 1 is valid, meta 2 is invalid, meta 3 could be valid. If the
superblock points to meta 1, we then write a new valid meta 2n. If a
crash happens again, the new recovery will start from meta 1. Since meta
2n is valid, recovery will think meta 3 is valid too, which is wrong. The
solution is to create a new meta in meta 2's place with its seq == meta
1's seq + 2 and let the superblock point to meta 2. Recovery will then
not think meta 3 is a valid meta, because its seq is wrong.

I like the idea of using a slightly larger 'seq' to avoid collisions -
except that I would probably feel safer with a much larger seq. Maybe
add 1024 or something (at least 10).

ok
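
A small model of the seq discussion above, as a sketch only. It assumes recovery starts at the meta block the superblock points to and accepts following blocks only while they are valid and their seq increases by exactly one; `struct meta`, `recover_count()` and `SEQ_GAP` are hypothetical names, and the gap of 1024 is just the value floated in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_GAP 1024 /* assumed gap, per the suggestion in this thread */

/* Hypothetical meta block: only the fields the argument needs. */
struct meta {
	uint64_t seq;
	bool valid; /* checksum of the meta block and its payload is good */
};

/*
 * Count the meta blocks a recovery pass would accept, starting at slot
 * 'start' and expecting sequence number 'start_seq' there. Each following
 * block must be valid and carry the previous seq + 1.
 */
static size_t recover_count(const struct meta *log, size_t n,
			    size_t start, uint64_t start_seq)
{
	size_t i = start;
	uint64_t expect = start_seq;

	while (i < n && log[i].valid && log[i].seq == expect) {
		i++;
		expect++;
	}
	return i - start;
}
```

With meta 1 at seq 10 and a stale meta 3 at seq 12: if meta 2n is written with seq 11 and the superblock still points to meta 1, `recover_count()` accepts all three blocks, including the stale one. If the replacement meta 2 gets seq `10 + SEQ_GAP` and the superblock points to it, the pass accepts that block and stops at meta 3.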


TODO:
-recovery should run the stripe cache state machine in case of disk
breakage.

Why?

When you write to the log, you write all of the blocks that need
updating, whether they are destined for a failed device or not.

When you recover, you then have all the blocks that you might want to
write.  So write all the ones for which you have working devices, and
ignore the rest.

Did I miss something?

Not that I object, but if it works....

I mean the case where a disk is broken. For example, the log has a
stripe with data for disks 1, 2 and 4. During recovery, disk 2 is broken.
Just writing 1 and 4 isn't good. If we run the state machine, we can read
disk 3 and end up with a consistent stripe.

But the log will have data for disks 1, 2 and 4, and also P and Q.
So if disk 2 is broken, we just write 1, 4, P and Q, and the data is
safe.
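
As a rough sketch of that replay rule (write every logged block whose target device works, skip the rest), assuming a hypothetical per-stripe bitmap of logged blocks. `struct logged_stripe`, `replay_stripe()` and `MAX_DEVS` are made-up names for illustration, not anything from the patch:

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_DEVS = 8 }; /* arbitrary limit for the sketch */

/* Hypothetical record of which devices a logged stripe has blocks for. */
struct logged_stripe {
	bool has_block[MAX_DEVS]; /* data disks plus P and Q */
};

/*
 * Decide which logged blocks get written back during recovery: a block is
 * written only if it is in the log and its target device is working.
 * Fills 'written' for the first ndevs devices and returns the number of
 * blocks written.
 */
static size_t replay_stripe(const struct logged_stripe *s,
			    const bool *dev_working, size_t ndevs,
			    bool *written)
{
	size_t count = 0;

	if (ndevs > MAX_DEVS)
		ndevs = MAX_DEVS;

	for (size_t i = 0; i < ndevs; i++) {
		written[i] = s->has_block[i] && dev_working[i];
		if (written[i])
			count++;
	}
	return count;
}
```

For the example in this thread (blocks logged for disks 1, 2, 4, P and Q, with disk 2 failed), this writes 1, 4, P and Q and never touches disk 3.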

NeilBrown
