Re: [PATCH v5 7/8] md/r5cache: r5c recovery

From: Song Liu <hidden>
Date: 2016-10-21 21:12:58

On Oct 19, 2016, at 8:03 PM, NeilBrown [off-list ref] wrote:

On Thu, Oct 13 2016, Song Liu wrote:

quoted

This is the recovery part of raid5-cache.

With cache feature, there are 2 different scenarios of recovery:
1. Data-Parity stripe: a stripe with complete parity in journal.
2. Data-Only stripe: a stripe with only data in journal (or partial
  parity).

The code differentiate Data-Parity stripe from Data-Only stripe with
flag (STRIPE_R5C_WRITTEN).

For Data-Parity stripes, we use the same procedure as raid5 journal,
where all the data and parity are replayed to the RAID devices.

For Data-Only strips, we need to finish complete calculate parity and
finish the full reconstruct write or RMW write. For simplicity, in
the recovery, we load the stripe to stripe cache. Once the array is
started, the stripe cache state machine will handle these stripes
through normal write path.

r5c_recovery_flush_log contains the main procedure of recovery. The
recovery code first scans through the journal and loads data to
stripe cache. The code keeps tracks of all these stripes in a list
(use sh->lru and ctx->cached_list), stripes in the list are
organized in the order of its first appearance on the journal.
During the scan, the recovery code assesses each stripe as
Data-Parity or Data-Only.

During scan, the array may run out of stripe cache. In these cases,
the recovery code will also call raid5_set_cache_size to increase
stripe cache size.

At the end of scan, the recovery code replays all Data-Parity
stripes, and sets proper states for Data-Only stripes. The recovery
code also increases seq number by 10 and rewrites all Data-Only
stripes to journal. This is to avoid confusion after repeated
crashes. More details is explained in raid5-cache.c before
r5c_recovery_rewrite_data_only_stripes().

Signed-off-by: Song Liu <redacted>


This patch seems to do a number of different things.
I think it re-factors the journal reading code.
It adds code to write a new "empty" journal metadata block
and it adds support for recovery of cached data.

All this together makes it hard to review.  I'd rather more smaller
patches.

I will try to split the patch into meaningful smaller patches.

quoted

+	/* stripes only have parity are already flushed to RAID */
+	if (data_count == 0)
+		goto out;

Can you explain why that is?  When were they flushed to the RAID, and
how was the parity determined?

It happens like this: say two stripes on journal: 100 and 200. The data (D)
and parity (P) pages are store in journal as:

      --->  D100   D200   P100  P200 ---->  newer data

Before we flush D100, journal_start points as D100. Then we flush D100, 
and new journal_start points as D200. Now the system fails, so next 
recovery starts from D200. Recovery code will find stripe 100 only has
parity. This means, stripe 100 is already flushed to raid. so we can ignore it.

quoted

+/* returns 0 for match; 1 for mismtach */

No, please don't do that.
You can return an negative error if you like, and call it as
  function_name() < 0
or
  function_name() == 0

or give the function a name that describes what it tests
i.e.
  r5l_data_checksum_is_correct()
or
  r5l_data_checksum_does_not_match()

and then return 0 if the test fails and 1 if it succeeds.

quoted

+static int
+r5l_recovery_verify_data_checksum(struct r5l_log *log, struct page *page,
+				  sector_t log_offset, __le32 log_checksum)

I will fix this. 

Thanks,
Song

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help