Thread: 4 messages, 2 authors, 2015-09-27

Re: [RFC PATCH 0/2] virtio nvme

From: Ming Lin <mlin@kernel.org>
Date: 2015-09-23 22:58:17
Also in: linux-nvme


On Fri, 2015-09-18 at 14:09 -0700, Nicholas A. Bellinger wrote:
On Fri, 2015-09-18 at 11:12 -0700, Ming Lin wrote:
On Thu, 2015-09-17 at 17:55 -0700, Nicholas A. Bellinger wrote:
On Thu, 2015-09-17 at 16:31 -0700, Ming Lin wrote:
On Wed, 2015-09-16 at 23:10 -0700, Nicholas A. Bellinger wrote:
Hi Ming & Co,
<SNIP>
I think the future "LIO NVMe target" only speaks NVMe protocol.

Nick(CCed), could you correct me if I'm wrong?

For SCSI stack, we have:
virtio-scsi(guest)
tcm_vhost(or vhost_scsi, host)
LIO-scsi-target

For NVMe stack, we'll have similar components:
virtio-nvme(guest)
vhost_nvme(host)
LIO-NVMe-target
I think it's more interesting to consider a 'vhost style' driver that
can be used with unmodified nvme host OS drivers.

Dr. Hannes (CC'ed) had done something like this for megasas a few years
back using specialized QEMU emulation + eventfd based LIO fabric driver,
and got it working with Linux + MSFT guests.

Doing something similar for nvme would (potentially) be on par with
current virtio-scsi+vhost-scsi small-block performance for scsi-mq
guests, without the extra burden of a new command set specific virtio
driver.
I'm trying to understand this.
Is it like the diagram below?

  .------------------------.   MMIO   .---------------------------------------.
  | Guest                  |--------> | Qemu                                  |
  | Unmodified NVMe driver |<-------- | NVMe device simulation(eventfd based) |
  '------------------------'          '---------------------------------------'
                                                  |          ^
                                      write NVMe  |          |  notify command
                                      command     |          |  completion
                                      to eventfd  |          |  to eventfd
                                                  v          |
                                      .--------------------------------------.
                                      | Host:                                |
                                      | eventfd based LIO NVMe fabric driver |
                                      '--------------------------------------'
                                                        |
                                                        | nvme_queue_rq()
                                                        v
                                       .--------------------------------------.
                                       | NVMe driver                          |
                                       '--------------------------------------'
                                                        |
                                                        |
                                                        v
                                       .-------------------------------------.
                                       | NVMe device                         |
                                       '-------------------------------------'
Correct.  The LIO driver on KVM host would be handling some amount of
NVMe host interface emulation in kernel code, and would be able to
decode nvme Read/Write/Flush operations and translate -> submit to
existing backend drivers.
Let me call the "eventfd based LIO NVMe fabric driver"
"tcm_eventfd_nvme".

Currently, LIO frontend drivers (iscsi, fc, vhost-scsi, etc.) talk to
LIO backend drivers (fileio, iblock, etc.) using SCSI commands.

Did you mean that the "tcm_eventfd_nvme" driver needs to translate NVMe
commands to SCSI commands and then submit them to the backend driver?
IBLOCK + FILEIO + RD_MCP don't speak SCSI, they simply process I/Os with
LBA + length based on SGL memory or pass along a FLUSH with LBA +
length.

So once the 'tcm_eventfd_nvme' driver on KVM host receives a nvme host
hardware frame via eventfd, it would decode the frame and send along the
Read/Write/Flush when exposing existing (non nvme native) backend
drivers.
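
As a rough illustration of the decode step described above, here is a
minimal userspace sketch in C. It assumes the standard 64-byte NVMe
submission queue entry layout (opcode in byte 0, NSID in bytes 4-7,
starting LBA in CDW10/CDW11, zero-based block count in the low 16 bits
of CDW12). The names "struct io_req", "enum io_kind" and
"decode_nvme_sqe" are hypothetical and are not taken from LIO or from
any posted patch; a real fabric driver would also have to handle the
data pointers, admin commands and error reporting.

```c
#include <stdint.h>

/* I/O opcodes from the NVMe NVM command set. */
enum { NVME_OP_FLUSH = 0x00, NVME_OP_WRITE = 0x01, NVME_OP_READ = 0x02 };

/* Hypothetical protocol-neutral request handed to a backend. */
enum io_kind { IO_UNSUPPORTED, IO_FLUSH, IO_WRITE, IO_READ };

struct io_req {
    enum io_kind kind;
    uint32_t nsid;       /* namespace ID */
    uint64_t lba;        /* starting logical block */
    uint32_t nr_blocks;  /* number of logical blocks (1-based) */
};

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode one 64-byte submission queue entry into (kind, LBA, length). */
struct io_req decode_nvme_sqe(const uint8_t sqe[64])
{
    struct io_req req = { IO_UNSUPPORTED, 0, 0, 0 };

    req.nsid = get_le32(sqe + 4);

    switch (sqe[0]) {
    case NVME_OP_FLUSH:
        req.kind = IO_FLUSH;
        break;
    case NVME_OP_READ:
    case NVME_OP_WRITE:
        req.kind = (sqe[0] == NVME_OP_READ) ? IO_READ : IO_WRITE;
        /* CDW10 holds the low half of the starting LBA, CDW11 the high half. */
        req.lba = (uint64_t)get_le32(sqe + 40) |
                  (uint64_t)get_le32(sqe + 44) << 32;
        /* CDW12 bits 15:0 hold a zero-based block count. */
        req.nr_blocks = (get_le32(sqe + 48) & 0xffff) + 1;
        break;
    default:
        break; /* left as IO_UNSUPPORTED */
    }
    return req;
}
```

The resulting (lba, nr_blocks) pair is the kind of information that
backends such as IBLOCK or FILEIO work with, which is why no SCSI
translation is needed in between.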
I read up on the vhost architecture:
http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html

The nice thing is it is not tied to KVM in any way.

For SCSI, there is "virtio-scsi" in the guest kernel and "vhost-scsi" in
the host kernel.

For NVMe, there is no "virtio-nvme" in the guest kernel (just the
unmodified NVMe driver), but I'll do a similar thing in Qemu with the
vhost infrastructure, and there will be a "vhost_nvme" in the host
kernel.

For the "virtqueue" implementation in qemu-nvme, I'll possibly just
use/copy drivers/virtio/virtio_ring.c, same as what
linux/tools/virtio/virtio_test.c does.

A more detailed diagram is below. What do you think?

.-----------------------------------------.           .------------------------.
| Guest(Linux, Windows, FreeBSD, Solaris) |  NVMe     | qemu                   |
| unmodified NVMe driver                  |  command  | NVMe device emulation  |
|                                         | ------->  | vhost + virtqueue      |
'-----------------------------------------'           '------------------------'
                                                          |           |      ^
                                            passthrough   |         kick/notify
                                            NVMe command  |         via eventfd
userspace                                   via virtqueue |           |      |
                                                          v           v      |
----------------------------------------------------------------------------------
       .-----------------------------------------------------------------------.
kernel | LIO frontend driver                                                   |
       | - vhost_nvme                                                          |
       '-----------------------------------------------------------------------'
                                  |  translate       ^
                                  |  (NVMe command)  |
                                  |  to              |
                                  v  (LBA, length)   |
       .----------------------------------------------------------------------.
       | LIO backend driver                                                   |
       | - fileio (/mnt/xxx.file)                                             |
       | - iblock (/dev/sda1, /dev/nvme0n1, ...)                              |
       '----------------------------------------------------------------------'
                                  |                 ^
                                  |  submit_bio()   |
                                  v                 |
       .----------------------------------------------------------------------.
       | block layer                                                          |
       |                                                                      |
       '----------------------------------------------------------------------'
                                  |                 ^
                                  |                 |
                                  v                 |
       .----------------------------------------------------------------------.
       | block device driver                                                  |
       |                                                                      |
       '----------------------------------------------------------------------'
              |                |                  |                 |
              |                |                  |                 |
              v                v                  v                 v
       .------------.    .-----------.     .------------.   .---------------.
       | SATA       |    | SCSI      |     | NVMe       |   | ....          |
       '------------'    '-----------'     '------------'   '---------------'
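
The "kick/notify via eventfd" arrows in the diagram rely on the Linux
eventfd(2) primitive, which vhost uses for signalling in both
directions. Below is a minimal userspace sketch of that counter
semantics only; it is not vhost code, and the helper names "efd_kick"
and "efd_drain" are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the peer once: adds 1 to the eventfd counter. 0 on success. */
int efd_kick(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume pending signals: returns the number of kicks accumulated since
 * the last read and resets the counter to zero. Returns -1 on error,
 * including EAGAIN when the fd is non-blocking and nothing is pending.
 */
int64_t efd_drain(int efd)
{
    uint64_t count;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

One side would create the descriptor with eventfd(0, EFD_NONBLOCK) and
hand it to the other. Several kicks issued before the consumer wakes up
collapse into a single read, so the consumer still has to scan the ring
for all new entries rather than assume one entry per wakeup.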