Re: Question about RAID1 plug/unplug code

From: NeilBrown <hidden>
Date: 2014-09-09 09:45:32

On Tue, 9 Sep 2014 11:33:13 +0300 Alexander Lyakas [off-list ref]
wrote:

Hi Neil,


On Tue, Sep 9, 2014 at 4:45 AM, NeilBrown [off-list ref] wrote:

quoted

On Mon, 8 Sep 2014 16:55:52 +0300 Alexander Lyakas [off-list ref]
wrote:

quoted

Hi Neil,
We have been seeing high latency on the md/raid1 block device, due to
the fact that all WRITEs are handed off to raid1d thread. This thread
also calls bitmap_unplug(), which writes the bitmap synchronously.
While it waits for the bitmap, it cannot trigger other WRITEs waiting
in its pending_bio_list. This is especially seen with SSDs: MD's
latency is much higher that SSD latency (I have been stoned by Peter
Grandi when I brought up this issue previously for raid5).

Then I have noticed the commit:

commit f54a9d0e59c4bea3db733921ca9147612a6f292c
Author: NeilBrown [off-list ref]
Date:   Thu Aug 2 08:33:20 2012 +1000

    md/raid1: submit IO from originating thread instead of md thread.

Looking at the code, I learned that to avoid switching into raid1d,
the caller has to use blk_start_plug/blk_finish_plug. So I added these
calls in our kernel module, which submits bios to MD. Results were
awesome, MD latency got down significantly.

That's good to hear.

quoted

So I have several questions about this plug/unplug thing.

1/ Originally this infrastructure was supposed to help IO schedulers
in merging requests. It is useful when one has a bunch of requests to
submit in one shot.

That is exactly the whole point of plugging:  allow the device to handle a
batch of requests together instead of one at a time.

quoted

But in MD case, thus infrastructure is used for a different purpose:
not to merge requests (which may help bandwidth, but probably not
latency), but to avoid making raid1d a bottleneck, to be able to
submit requests from multiple threads in parallel, which brings down
latency significantly in our case. Indeed "struct blk_plug" has a
special "cb_list", which is used only by MD.

I don't think the way md uses plugging is conceptually different from any
other use: it is always about gathering a batch together.
"cb_list" is handled by blk_check_plugged() which is also used by
block/umem.c and btrfs.

The base plugging code assumes that it is only gathering a batch of requests
for a single device - if the target device changes then the batch is flushed.
It also assumed that it was "struct request" that was batched.
Devices like md that want to queue 'struct bio', something else was needed.
Also with layered devices it can be useful to gather multiple batches for
multiple layers.
So I created "cb_list" etc and a more generic interface.

quoted

In my case I have only individual bios (not a bunch of bios), and I
after wrap them with plug/unplug, MD latency gets better. So we are
using the plug infrastructure for a different purpose.
Is my understanding correct? Was this your intention?

I don't really understand what you are doing.  There is no point in using
plugging for individual bios.  The  main point for raid1 writes is to gather
a lot of writes together so that all multiple bitmap bits can be set all at
once.
It should be possible to submit individual bios directly from make_request
without passing them to raid1d and without using plugging.

Can you pls explain how it is possible?
You have this code for WRITEs:
        cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
        if (cb)
            plug = container_of(cb, struct raid1_plug_cb, cb);
        else
            plug = NULL;
        spin_lock_irqsave(&conf->device_lock, flags);
        if (plug) {
            bio_list_add(&plug->pending, mbio);
            plug->pending_cnt++;
        } else {
            bio_list_add(&conf->pending_bio_list, mbio);
            conf->pending_count++;
        }
        spin_unlock_irqrestore(&conf->device_lock, flags);

If the thread blk_check_plugged returns NULL, then you always hand the
WRITE to raid1d. So the only option to avoid handoff to raid1d is for
the caller to plug. Otherwise, all WRITEs are handed off to raid1d and
latency becomes terrible.
So in my case, I use plug/unplug for individual bios only to avoid the
handoff to raid1d.
What am I missing in this analysis?

if blk_check_plugged succeeds then it has arranged for raid1_unplug to be
called a little later by that same process.
So there is nothing to stop you calling raid1_unplug immediately.

raid1_unplug essentially does:
  bitmap_unplug()
  generic_make_request()

so you can very nearly just do that, without any plugging.

There is a bit of extra subtlety but I can't really know how relevant that
might be to you without actually seeing you code.

NeilBrown

Attachments

signature.asc [application/pgp-signature] 828 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help