Thread (22 messages) 22 messages, 2 authors, 2013-07-31

Re: Suspicious test failure - mdmon misses recovery events on loop devices

From: Martin Wilck <hidden>
Date: 2013-07-30 21:16:43

On 07/30/2013 02:42 AM, NeilBrown wrote:
On Mon, 29 Jul 2013 22:42:25 +0200 Martin Wilck [off-list ref] wrote:
quoted
quoted
My current idea to solve this is yet another separate thread just for
monitoring kernel state changes. Don't have it ready yet, though.
Another idea would be in manage_member, after queueing the metadata
update and waking up the monitor, to wait for the metadata to finish
processing before actually starting the recovery (writing "recover" to
sync_action).

Martin
I hope an extra thread won't be necessary :-)
I think the general problem that mdmon may be busy writing to disk while
something changes in the kernel is real. But introducing an extra thread
would make things even more complex as they are now, so it might be
something to avoid.
I think that manage_member is the place to fix this.  However it might be
even simpler than you suggest.

We currently have

		replace_array(container, a, newa);
		sysfs_set_str(&a->info, NULL, "sync_action", "recover");

monitor subsequently takes that 'newa', looks at 'sync_action', see that it
is 'idle' and assume that the recover never happened.
Suppose we change it to:

		if (sysfs_set_str(&a->info, NULL, "sync_action", "recover") == 0)
		        newa->prev_action = newa->curr_action = recovery;
		replace_array(container, a, newa);

Then it wouldn't matter if monitor never saw the 'recovery' state as manager
explicitly told it that recovery had started.

Could you try that?
Ingenious idea :-) Unfortunately it isn't sufficient. However this PLUS
waiting for the metadata upate to finish makes my test succeed reliably
(10/10, in the previous failure scenario). I'll send in the current
status of patches in a minute.

Martin
Thanks,
NeilBrown
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help