Thread: 22 messages, 2 authors, 2013-07-31

Re: Suspicious test failure - mdmon misses recovery events on loop devices

From: NeilBrown <hidden>
Date: 2013-07-30 00:42:06

On Mon, 29 Jul 2013 22:42:25 +0200 Martin Wilck [off-list ref] wrote:
> My current idea to solve this is yet another separate thread just for
> monitoring kernel state changes. Don't have it ready yet, though.
> Another idea would be in manage_member, after queueing the metadata
> update and waking up the monitor, to wait for the metadata to finish
> processing before actually starting the recovery (writing "recover" to
> sync_action).
>
> Martin

I hope an extra thread won't be necessary :-)

I think that manage_member is the place to fix this.  However it might be
even simpler than you suggest.

We currently have

		replace_array(container, a, newa);
		sysfs_set_str(&a->info, NULL, "sync_action", "recover");

The monitor subsequently takes that 'newa', looks at 'sync_action', sees that
it is 'idle', and assumes that the recovery never happened.
Suppose we change it to:

		if (sysfs_set_str(&a->info, NULL, "sync_action", "recover") == 0)
			newa->prev_action = newa->curr_action = recover;
		replace_array(container, a, newa);

Then it wouldn't matter if monitor never saw the 'recovery' state as manager
explicitly told it that recovery had started.

Could you try that?

Thanks,
NeilBrown
