Re: [md PATCH 2/5] md: Enable reshape for external metadata

From: Neil Brown <hidden>
Date: 2010-06-17 10:35:19

On Thu, 17 Jun 2010 10:40:36 +0100
"Trela, Maciej" [off-list ref] wrote:

Another thing is waiting during reshape for a metadata update on the
MD_CHANGE_DEVS flag.

To roll reshape I've added the following code (instead of calling
md_update_sb()):

Yes, there is a real issue there...

I don't think we ever need the kernel to wait for an external metadata handler
to respond to device changes (apart from failure which is handled separately).
So maybe the best thing is to guard all settings of MD_CHANGE_DEVS with
if (mddev->persistent)

I think that would be best, but I've made a note to review that later.
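
A minimal userspace sketch of that guard, for illustration only. The struct
and the helper name mark_devs_changed() are hypothetical stand-ins, not the
kernel's definitions; only the 'persistent' test and the MD_CHANGE_DEVS name
come from the discussion above.

```c
#include <stdbool.h>

/* Simplified stand-ins for illustration; not the kernel's definitions. */
enum { MD_CHANGE_DEVS = 0 };            /* bit number, as used with set_bit() */

struct mddev_model {
	unsigned long flags;            /* pending-update bits, like mddev->flags */
	bool persistent;                /* true when md itself writes the metadata */
};

/* Hypothetical helper: request a superblock write only for native metadata. */
static void mark_devs_changed(struct mddev_model *mddev)
{
	if (mddev->persistent)
		mddev->flags |= 1UL << MD_CHANGE_DEVS;
}
```

With external metadata (persistent == false) the flag is never set, so nothing
in the kernel ends up waiting on it.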

Neil,
from what I see in raid5.c/md.c, the "native" code uses MD_CHANGE_DEVS
during the reshape when it reaches special points where a metadata
write is really needed to update the reshape checkpoint.
In reshape_request():
	/* Cannot proceed until we've updated the superblock */
	..
	set_bit(MD_CHANGE_DEVS, &mddev->flags);

In md_check_recovery() we have:
	if (mddev->flags) 
		md_update_sb()

Couldn't we follow this logic with MD_CHANGE_DEVS for external metadata?
If not, how to detect the need for migration checkpoint update?

Good question.
The first question to ask is
  How does mdmon know when a metadata update is required, and how does
  it tell md that the metadata update is complete?

OK, 2 first questions...

For the first I suspect it should watch 'md/reshape_position' (for which we
need to use sysfs_notify).
For the second .... I don't know.
- Maybe sync_action could change to 'paused' and mdmon writes 'continue'....
  but that is possibly overloading that file too much.
- We could have a new sysfs file which just shows paused/active ??
- We could require that mdmon sets 'sync_max' appropriately so that reshape
  will stop at the right place, and then when mdmon has updated the metadata,
  it sets a new sync_max value.
- As above, but if sync_max is set too high, it is automatically reduced
  to the place where raid5 finds that it has to stop.

I think the last one is probably best.
Before updating ->reshape_position, raid5 checks ->resync_max and if it is
too high for safety it sets it lower, to a safer value.
Then it changes ->reshape_position and calls sysfs_notify.
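
Roughly, the kernel side of that could be modelled as below. This is a
userspace sketch under assumptions, not raid5 code: struct reshape_model,
advance_checkpoint() and the safe_limit parameter are hypothetical names, and
the counter merely stands in for the sysfs_notify call.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Illustrative model only; field names echo the mddev fields discussed. */
struct reshape_model {
	sector_t reshape_position;  /* checkpoint the metadata handler records */
	sector_t resync_max;        /* upper limit the reshape may run to */
	unsigned notifications;     /* stand-in for sysfs_notify() calls */
};

/*
 * safe_limit (hypothetical): the furthest point the reshape may reach
 * before the new checkpoint has been recorded in the metadata.
 */
static void advance_checkpoint(struct reshape_model *m, sector_t new_pos,
			       sector_t safe_limit)
{
	/* Step 1: lower the limit if it is too high to be safe. */
	if (m->resync_max > safe_limit)
		m->resync_max = safe_limit;

	/* Step 2: publish the new checkpoint, then wake any watcher. */
	m->reshape_position = new_pos;
	m->notifications++;
}
```

The limit is only ever lowered here; raising it again is left to the metadata
handler once it has written the checkpoint.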

mdmon watches for 'reshape_position' to change.  When it does, it updates the
metadata and then writes a larger value to ->resync_max.
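
The monitor side might look something like the sketch below. It is not mdmon
code: struct monitor_model and both function names are hypothetical, the
metadata write is reduced to storing a number, and the "none" text for an
idle array is an assumption about what the sysfs file shows. A real handler
would presumably poll() the sysfs file (sysfs_notify wakes pollers with
POLLPRI|POLLERR) and re-read it before calling something like this.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative monitor state; not mdmon's real data structures. */
struct monitor_model {
	unsigned long long recorded_checkpoint; /* last value put in metadata */
	unsigned long long sync_max;            /* limit last written to md */
};

/* Parse sysfs-style text such as "1024\n". Returns 0 on success, -1 for
 * "none" (assumed to mean no reshape is active) or unparsable input. */
static int parse_reshape_position(const char *text, unsigned long long *pos)
{
	char *end;

	if (strncmp(text, "none", 4) == 0)
		return -1;
	errno = 0;
	*pos = strtoull(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	return 0;
}

/* Handle one change event. Returns 1 if a checkpoint was recorded, else 0.
 * next_sync_max is chosen by the caller; how to pick it is out of scope. */
static int on_reshape_position_change(struct monitor_model *mon,
				      const char *sysfs_text,
				      unsigned long long next_sync_max)
{
	unsigned long long pos;

	if (parse_reshape_position(sysfs_text, &pos) != 0)
		return 0;

	/* Metadata first (stand-in for the real write) ... */
	mon->recorded_checkpoint = pos;

	/* ... and only then let the reshape proceed further. */
	if (next_sync_max > mon->sync_max)
		mon->sync_max = next_sync_max;
	return 1;
}
```

The ordering is the point: the limit is raised only after the checkpoint has
been recorded.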

Things can get a little confusing when reshaping to fewer devices as
reshape_position decreases, but sync_completed always increases and sync_max
is still an 'upper' limit.
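
To make the direction issue concrete, a watcher that wants to sanity-check a
new checkpoint has to know which way the reshape runs. A tiny sketch, with a
hypothetical helper name and a 'shrinking' flag assumed to be known to the
caller:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/*
 * Hypothetical check: did the checkpoint move in the expected direction?
 * When reshaping to fewer devices reshape_position decreases, otherwise
 * it increases.
 */
static bool checkpoint_advanced(bool shrinking, sector_t old_pos,
				sector_t new_pos)
{
	return shrinking ? new_pos < old_pos : new_pos > old_pos;
}
```

sync_max needs no such special case, since it stays an upper limit either way.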

But it should work OK.

Does that seem reasonable?

NeilBrown
