Re: Suspicious test failure - mdmon misses recovery events on loop devices
From: NeilBrown <hidden>
Date: 2013-07-30 00:42:06
On Mon, 29 Jul 2013 22:42:25 +0200 Martin Wilck [off-list ref] wrote:
> My current idea to solve this is yet another separate thread just for
> monitoring kernel state changes. Don't have it ready yet, though.
>
> Another idea would be in manage_member, after queueing the metadata
> update and waking up the monitor, to wait for the metadata to finish
> processing before actually starting the recovery (writing "recover" to
> sync_action).
>
> Martin
I hope an extra thread won't be necessary :-)

I think that manage_member is the place to fix this. However it might be
even simpler than you suggest.

We currently have

    replace_array(container, a, newa);
    sysfs_set_str(&a->info, NULL, "sync_action", "recover");

monitor subsequently takes that 'newa', looks at 'sync_action', sees that
it is 'idle' and assumes that the recovery never happened.

Suppose we change it to:

    if (sysfs_set_str(&a->info, NULL, "sync_action", "recover") == 0)
        newa->prev_action = newa->curr_action = recovery;
    replace_array(container, a, newa);

Then it wouldn't matter if monitor never saw the 'recovery' state, as
manager explicitly told it that recovery had started.

Could you try that?

Thanks,
NeilBrown
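
To illustrate why pre-seeding the action fields should help, here is a minimal stand-alone model of the race. It is only a sketch, not mdadm code: the struct, the enum values and the functions monitor_pass, manage_member_current and manage_member_proposed are hypothetical stand-ins, and it assumes that the monitor detects the end of a recovery by comparing the previously seen action with the one it has just read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the real mdadm types. */
enum sync_action { ACTION_IDLE, ACTION_RECOVER };

struct array_state {
    enum sync_action prev_action;
    enum sync_action curr_action;
};

/*
 * Model of one monitor pass. 'kernel_action' stands for the value the
 * monitor would read from sync_action in sysfs. Returns true when the
 * pass observes a recover -> idle transition, i.e. a finished recovery.
 */
static bool monitor_pass(struct array_state *a, enum sync_action kernel_action)
{
    assert(a != NULL);
    a->prev_action = a->curr_action;
    a->curr_action = kernel_action;
    return a->prev_action == ACTION_RECOVER &&
           a->curr_action == ACTION_IDLE;
}

/* Current ordering: the array is handed over with no record of the request. */
static void manage_member_current(struct array_state *newa)
{
    assert(newa != NULL);
    newa->prev_action = newa->curr_action = ACTION_IDLE;
}

/*
 * Proposed ordering: 'write_rc' models the return value of the sysfs
 * write. On success the manager records that recovery has started
 * before the monitor ever sees the array.
 */
static void manage_member_proposed(struct array_state *newa, int write_rc)
{
    assert(newa != NULL);
    newa->prev_action = newa->curr_action = ACTION_IDLE;
    if (write_rc == 0)
        newa->prev_action = newa->curr_action = ACTION_RECOVER;
}
```

In this model, a recovery that completes before the first monitor pass (as can happen on small loop devices) is missed with the current ordering, because the monitor only ever reads 'idle'. With the proposed ordering the same pass sees a recover -> idle transition, since the manager has already recorded the start.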