Thread (12 messages) 12 messages, 3 authors, 2017-03-07

Re: [PATCH 3/3] md/raid5: sort bios

From: Shaohua Li <shli@kernel.org>
Date: 2017-03-03 17:59:09

On Fri, Mar 03, 2017 at 02:43:49PM +1100, Neil Brown wrote:
On Fri, Feb 17 2017, Shaohua Li wrote:
quoted
Previous patch (raid5: only dispatch IO from raid5d for harddisk raid)
defers IO dispatching. The goal is to create better IO pattern. At that
time, we don't sort the deffered IO and hope the block layer can do IO
merge and sort. Now the raid5-cache writeback could create large amount
of bios. And if we enable muti-thread for stripe handling, we can't
control when to dispatch IO to raid disks. In a lot of time, we are
dispatching IO which block layer can't do merge effectively.

This patch moves further for the IO dispatching defer. We accumulate
bios, but we don't dispatch all the bios after a threshold is met. This
'dispatch partial portion of bios' stragety allows bios coming in a
large time window are sent to disks together. At the dispatching time,
there is large chance the block layer can merge the bios. To make this
more effective, we dispatch IO in ascending order. This increases
request merge chance and reduces disk seek.
I can see the benefit of batching and sorting requests.

I wonder if the extra complexity of grouping together 512 requests, then
submitting the "first" 128 is really worth it.  Have you measured the
value of that?
I'm pretty sure I tried. The whole point of dispatching the first 128 is we
don't have a better pipeline. Grouping 512 and then dispatching them together
definitely improve the IO patter, but the request accumulation takes time, we
will have no IO running in the window.
If you just submitted every time you got 512 requests, you could use
list_sort() on the bio list and wouldn't need an array.

If an array really is best, it would be really nice if "sort" could pass
a 'void*' down to the cmp function,
and it could sort all bios that are
*after* last_bio_pos first, and then the others.  That would make the
code much simpler.  I guess sort() could be changed (list_sort() already
has a 'priv' argument like this).
Ok, I'll change this to a list. And add extra pointer to record the last sorted
entry. I didn't see the sort uses much time in my profile, but the merge sort
looks better. Will do the change.
If we cannot change sort(), then maybe use lib/bsearch.c for the binary
search.  Performing two comparisons in the loop of a binary search
should get a *fail* in any algorithms class!!
The "pending_data" array that you have added to the r5conf structure
adds 4096 bytes.  This means it is larger than a page, which is best
avoided (though it is unlikely to cause problems).  I would allocate it
separately.
Yep, already fixed internally.
So there is a lot that I don't really like, but it seems like a good
idea in principle.
ok, thanks for your time!

Thanks,
Shaohua
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help