Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication
From: Cornelia Huck <cohuck@redhat.com>
Date: 2020-08-28 10:34:49
Also in:
kvm, linux-doc, linux-pci, linux-remoteproc, lkml, virtualization
On Thu, 9 Jul 2020 14:26:53 +0800 Jason Wang wrote:

[Let me note right at the beginning that I first noticed this while listening to Kishon's talk at LPC on Wednesday. I might be very confused about the background here, so let me apologize beforehand for any confusion I might spread.]
On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
Hi Jason, On 7/8/2020 4:52 PM, Jason Wang wrote:
On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
Hi Jason, On 7/7/2020 3:17 PM, Jason Wang wrote:
On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
Hi Jason, On 7/3/2020 12:46 PM, Jason Wang wrote:
On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
Hi Jason, On 7/2/2020 3:40 PM, Jason Wang wrote:
On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
This series enhances Linux Vhost support to enable SoC-to-SoC communication over MMIO. This series enables rpmsg communication between two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2

1) Modify vhost to use standard Linux driver model
2) Add support in vring to access virtqueue over MMIO
3) Add vhost client driver for rpmsg
4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for rpmsg communication between two SoCs connected to each other
5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication between two SoCs connected via NTB
6) Add configfs to configure the components

UseCase1 :

 VHOST RPMSG                    VIRTIO RPMSG
      +                               +
      |                               |
      |                               |
      |                               |
      |                               |
+-----v------+                 +------v-------+
|   Linux    |                 |     Linux    |
|  Endpoint  |                 | Root Complex |
|            <----------------->              |
|            |                 |              |
|    SOC1    |                 |     SOC2     |
+------------+                 +--------------+

UseCase 2:

     VHOST RPMSG                                      VIRTIO RPMSG
          +                                                 +
          |                                                 |
          |                                                 |
          |                                                 |
          |                                                 |
   +------v------+                                   +------v------+
   |             |                                   |             |
   |    HOST1    |                                   |    HOST2    |
   |             |                                   |             |
   +------^------+                                   +------^------+
          |                                                 |
          |                                                 |
+---------------------------------------------------------------------+
|  +------v------+                                   +------v------+  |
|  |             |                                   |             |  |
|  |     EP      |                                   |     EP      |  |
|  | CONTROLLER1 |                                   | CONTROLLER2 |  |
|  |             <----------------------------------->             |  |
|  |             |                                   |             |  |
|  |             |                                   |             |  |
|  |             |  SoC With Multiple EP Instances   |             |  |
|  |             |  (Configured using NTB Function)  |             |  |
|  +-------------+                                   +-------------+  |
+---------------------------------------------------------------------+
First of all, to clarify the terminology: Is "vhost rpmsg" acting as what the virtio standard calls the 'device', and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just virtqueues + the existing vhost interfaces?
Software Layering:

The high-level SW layering should look something like below. This series adds support only for RPMSG VHOST, however something similar should be done for net and scsi. With that any vhost device (PCI, NTB, Platform device, user) can use any of the vhost client driver.

    +----------------+  +-----------+  +------------+  +----------+
    |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
    +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
            |                 |              |              |
            |                 |              |              |
            |                 |              |              |
+-----------v-----------------v--------------v--------------v----------+
|                              VHOST CORE                              |
+--------^---------------^--------------------^------------------^-----+
         |               |                    |                  |
         |               |                    |                  |
         |               |                    |                  |
+--------v-------+  +----v------+  +----------v----------+  +----v-----+
| PCI EPF VHOST  |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
+----------------+  +-----------+  +---------------------+  +----------+
So, the upper half is basically various functionality types, e.g. a net device. What is the lower half, a hardware interface? Would it be equivalent to e.g. a normal PCI device?
This was initially proposed here [1]

[1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com

I find this very interesting. A huge patchset so will take a bit to review, but I certainly plan to do that. Thanks!

Yes, it would be better if there's a git branch for us to have a look.

I've pushed the branch https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc

Thanks
Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is duplicated with vDPA (e.g. the epf transport or vhost bus).

This is about connecting two different HW systems both running Linux and doesn't necessarily involve virtualization.

Right, this is something similar to VOP (Documentation/misc-devices/mic/mic_overview.rst). The difference is the hardware, I guess, and that VOP uses a userspace application to implement the device.

I'd also like to point out, this series tries to have communication between two SoCs in a vendor-agnostic way. Since this series solves for 2 use cases (PCIe RC<->EP and NTB), for the NTB case it directly plugs into the NTB framework and any of the HW in NTB below should be able to use virtio-vhost communication

# ls drivers/ntb/hw/
amd  epf  idt  intel  mscc

And similarly for the PCIe RC<->EP communication, this adds a generic endpoint function driver and hence any SoC that supports a configurable PCIe endpoint can use virtio-vhost communication

# ls drivers/pci/controller/dwc/*ep*
drivers/pci/controller/dwc/pcie-designware-ep.c
drivers/pci/controller/dwc/pcie-uniphier-ep.c
drivers/pci/controller/dwc/pci-layerscape-ep.c

Thanks for the background.
So there is no guest or host as in virtualization but two entirely different systems connected via PCIe cable, one acting as guest and one as host. So one system will provide virtio functionality reserving memory for virtqueues and the other provides vhost functionality providing a way to access the virtqueues in virtio memory. One is source and the other is sink and there is no intermediate entity. (vhost was probably intermediate entity in virtualization?)

(Not a native English speaker) but "vhost" could introduce some confusion for me since it was used for implementing the virtio backend for userspace drivers. I guess "vringh" could be better.

Initially I had named this vringh but later decided to choose vhost instead of vringh. vhost is still a virtio backend (not necessarily userspace) though it now resides in an entirely different system. Whatever virtio is for a frontend system, vhost can be that for a backend system. vring can be for accessing virtqueue and can be used either in frontend or backend.
I guess that clears up at least some of my questions from above...
Ok.
Have you considered implementing these through vDPA?

IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg driver or vhost net driver is not provided.

The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg (usecase2 above),

I see.

all the boards run Linux. The middle board provides NTB functionality and board on either side provides virtio/vhost functionality and transfer data using rpmsg.
This setup looks really interesting (sometimes, it's really hard to imagine this in the abstract.)
So I wonder whether it's worthwhile for a new bus. Can we use the existing virtio-bus/drivers? It might work as, except for the epf transport, we can introduce an epf "vhost" transport driver.

IMHO we'll need two buses, one for frontend and the other for backend, because the two components can then co-operate/interact with each other to provide a functionality. Though both will seemingly provide similar callbacks, they provide symmetrical or complementary functionality and need not be the same or identical. Having the same bus can also create sequencing issues. If you look at virtio_dev_probe() of virtio_bus

device_features = dev->config->get_features(dev);

Now if we use the same bus for both front-end and back-end, both will try to get_features when there has been no set_features. Ideally the vhost device should be initialized first with the set of features it supports. Vhost and virtio should use "status" and "features" complementarily and not identically.

Yes, but there's no need for doing status/features passthrough in epf vhost drivers.
virtio device (or frontend) cannot be initialized before vhost device (or backend) gets initialized with data such as features. Similarly vhost (backend) cannot access virtqueues or buffers before virtio (frontend) sets VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as the physical memory for virtqueues is created by virtio (frontend).

epf vhost drivers need to implement two devices: a vhost(vringh) device and a virtio device (which is a mediated device). The vhost(vringh) device is doing feature negotiation with the virtio device via RC/EP or NTB. The virtio device is doing feature negotiation with local virtio drivers. If there's a feature mismatch, epf vhost drivers can do mediation between them.

Here epf vhost should be initialized with a set of features for it to negotiate either as vhost device or virtio device, no? Where should the initial feature set for epf vhost come from?

I think it can work as:
1) Have an initial feature set X (hard coded in the code) in epf vhost
2) Use this X for both the virtio device and the vhost(vringh) device
3) The local virtio driver will negotiate with the virtio device with feature set Y
4) The remote virtio driver will negotiate with the vringh device with feature set Z
5) Mediate between feature Y and feature Z, since both Y and Z are a subset of X

okay. I'm also thinking if we could have configfs for configuring this. Anyways we could find different approaches of configuring this.

Yes, and I think some management API is needed even in the design of your "Software Layering". In that figure, rpmsg vhost needs some pre-set or hard-coded features.
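The ordering constraints raised in this exchange (the backend has to publish its features before the frontend reads them, and the backend may only touch the virtqueues once the frontend has set VIRTIO_CONFIG_S_DRIVER_OK) can be sketched in plain userspace C. This is only an illustration, not code from the series: `struct ex_link` and all `ex_*` names are hypothetical, and the only value taken from the virtio definitions is the DRIVER_OK status bit (4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_STATUS_DRIVER_OK 4 /* same value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical state shared between backend (vhost) and frontend (virtio). */
struct ex_link {
	bool features_set;  /* backend has published its feature set */
	uint64_t features;  /* features offered by the backend */
	uint8_t status;     /* status bits written by the frontend */
};

/* Backend side: publish the supported features first. */
static void ex_backend_set_features(struct ex_link *l, uint64_t features)
{
	l->features = features;
	l->features_set = true;
}

/* Frontend side: reading features only succeeds once they exist. */
static bool ex_frontend_get_features(const struct ex_link *l, uint64_t *out)
{
	if (!l->features_set)
		return false;
	*out = l->features;
	return true;
}

/* Frontend side: signal that the virtqueues are set up. */
static void ex_frontend_driver_ok(struct ex_link *l)
{
	/* Assumption of this sketch: DRIVER_OK only after features exist. */
	assert(l->features_set);
	l->status |= EX_STATUS_DRIVER_OK;
}

/* Backend side: virtqueue access is allowed only after DRIVER_OK. */
static bool ex_backend_may_access_vq(const struct ex_link *l)
{
	return (l->status & EX_STATUS_DRIVER_OK) != 0;
}
```

With a zero-initialized `struct ex_link`, `ex_frontend_get_features()` fails until `ex_backend_set_features()` has run, and `ex_backend_may_access_vq()` stays false until `ex_frontend_driver_ok()` has been called.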
When I saw the plumbers talk, my first idea was "this needs to be a new transport". You have some hard-coded or pre-configured features, and then features are negotiated via a transport-specific means in the usual way. There's basically an extra/extended layer for this (and status, and whatever). Does that make any sense?
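As a rough illustration of the X/Y/Z scheme discussed above: each driver can accept only what was offered from the initial set X, so Y and Z are subsets of X. The sketch below is plain userspace C with hypothetical names (`ex_negotiate`, `ex_mediate`, the `EX_FEAT_*` bits). Using the intersection of Y and Z as the mediated result is an assumption of this sketch; the thread does not specify how mediation would be done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define EX_FEAT_A (1ULL << 0)
#define EX_FEAT_B (1ULL << 1)
#define EX_FEAT_C (1ULL << 2)

/* A driver can only accept features that the device offered. */
static uint64_t ex_negotiate(uint64_t offered, uint64_t wanted)
{
	uint64_t accepted = offered & wanted;

	/* The accepted set is always a subset of the offered set. */
	assert((accepted & ~offered) == 0);
	return accepted;
}

/*
 * Assumed mediation policy: keep only the features that both the local
 * side (Y) and the remote side (Z) accepted.
 */
static uint64_t ex_mediate(uint64_t y, uint64_t z)
{
	return y & z;
}
```

For example, with X = A|B|C, a local driver wanting A|B gets Y = A|B, a remote driver wanting B|C gets Z = B|C, and this policy leaves B as the feature usable end to end.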
It will have virtqueues but only used for the communication between itself and the upper virtio driver. And it will have vringh queues which will be probed by virtio epf transport drivers. And it needs to do data copy between virtqueue and vringh queues. It works like:

virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh queue/epf>

The advantage is that there's no need for writing new buses and drivers.

I think this will work, however there is an additional copy between vringh queue and virtqueue,

I think not? E.g. in use case 1), if we stick to virtio bus, we will have:

virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)

IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?

Yes.
And virtio ring(2) is created by virtio pci (RC).

Yes.
What the epf vhost driver does is read the buffer len and addr from virtio ring(1) and then DMA to Linux (RC)?

okay, I made some optimization here where vhost-rpmsg, using a helper, writes a buffer from rpmsg's upper layer directly to remote Linux (RC), whereas here it has to be first written to virtio ring(1). Thinking how this would look for NTB

virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2) <- virtio ring(2) -> virtio-rpmsg (HOST2)

Here the NTB(HOST1) will access the virtio ring(2) using vringh?

Yes, I think so; it needs to use vring to access virtio ring(1) as well.

NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't have to use vring. virtio ring(1) is created by the virtio device that NTB(HOST1) creates.

Right.
Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?

Yes.

okay, I haven't looked at this but the backend of virtio_blk should access an actual storage device, no?

Good point, for non-peer devices like storage. There's probably no need for it to be registered on the virtio bus and it might be better to behave as you proposed.
I might be missing something; but if you expose something as a block device, it should have something it can access with block reads/writes, shouldn't it? Of course, that can be a variety of things.
Just to make sure I understand the design, how is VHOST SCSI expected to work in your proposal, does it have a device for file as a backend?
I'd like to get clarity on two things in the approach you suggested, one is features (since epf vhost should ideally be transparent to any virtio driver)

We can have an array of pre-defined features indexed by virtio device id in the code.

and the other is how certain inputs to the virtio device, such as the number of buffers, should be determined.

We can start with a hard-coded value like 256, or introduce some API for the user to change the value.

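A minimal sketch of the two suggestions above (a pre-defined feature table indexed by virtio device ID, and a hard-coded default of 256 buffers that a user-supplied value can override), written as standalone userspace C. The `ex_*` names and the choice of which features to pre-set are hypothetical. The device IDs (net = 1, rpmsg = 7) and bit positions (VIRTIO_NET_F_MAC = 5, VIRTIO_RPMSG_F_NS = 0) mirror the existing virtio definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Device IDs, same values as VIRTIO_ID_NET and VIRTIO_ID_RPMSG. */
#define EX_VIRTIO_ID_NET   1
#define EX_VIRTIO_ID_RPMSG 7
#define EX_VIRTIO_ID_MAX   8 /* table size for this sketch only */

/* Hard-coded default, as suggested in the thread. */
#define EX_DEFAULT_NUM_BUFS 256

static_assert(EX_VIRTIO_ID_RPMSG < EX_VIRTIO_ID_MAX, "table too small");

/* Hypothetical pre-set features, indexed by virtio device ID. */
static const uint64_t ex_features[EX_VIRTIO_ID_MAX] = {
	[EX_VIRTIO_ID_NET]   = 1ULL << 5, /* bit of VIRTIO_NET_F_MAC */
	[EX_VIRTIO_ID_RPMSG] = 1ULL << 0, /* bit of VIRTIO_RPMSG_F_NS */
};

/* Look up the initial feature set; unknown IDs offer nothing. */
static uint64_t ex_initial_features(unsigned int device_id)
{
	return device_id < EX_VIRTIO_ID_MAX ? ex_features[device_id] : 0;
}

/* Use the configured buffer count, or the default when none was set (0). */
static unsigned int ex_num_bufs(unsigned int configured)
{
	return configured ? configured : EX_DEFAULT_NUM_BUFS;
}
```

The `configured` argument stands in for whatever management interface ends up being chosen (configfs was mentioned earlier in the thread as one option).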
Thanks again for your suggestions!

You're welcome. Note that I just want to check whether or not we can reuse the virtio bus/driver. It's something similar to what you proposed in Software Layering but we just replace "vhost core" with "virtio bus" and move the vhost core below the epf/ntb/platform transport.

Got it. My initial design was based on my understanding of your comments [1].

Yes, but that's just for a networking device. If we want something more generic, it may require more thought (bus etc).
I believe that we indeed need something bus-like to be able to support a variety of devices.
I'll try to create something based on your proposed design here.

Sure, but for coding, we'd better wait for others' opinions here.
Please tell me if my thoughts above make any sense... I have just started looking at that, so I might be completely off.