Thread (14 messages) 14 messages, 4 authors, 2019-08-20

Re: Bisected: Kernel 4.14 + has 3 times higher write IO latency than Kernel 4.4 with raid1

From: NeilBrown <hidden>
Date: 2019-08-05 23:46:41
Also in: lkml

On Mon, Aug 05 2019, Jinpu Wang wrote:
Hi Neil,

For the md higher write IO latency problem, I bisected it to these commits:

4ad23a97 MD: use per-cpu counter for writes_pending
210f7cd percpu-refcount: support synchronous switch to atomic mode.

Do you maybe have an idea? How can we fix it?
Hmmm.... not sure.

My guess is that the set_in_sync() call from md_check_recovery()
is taking a long time, and is being called too often.

Could you try two experiments please.

1/ set  /sys/block/md0/md/safe_mode_delay 
   to 20 or more.  It defaults to about 0.2.

2/ comment out the call the set_in_sync() in md_check_recovery().

Then run the least separately after each of these changes.

I the second one makes a difference, I'd like to know how often it gets
called - and why.  The test
	if ( ! (
		(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
		test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
		(mddev->external == 0 && mddev->safemode == 1) ||
		(mddev->safemode == 2
		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
		))
		return;

should normally return when doing lots of IO - I'd like to know
which condition causes it to not return.

Thanks,
NeilBrown

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help