Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: 2018-06-11 03:36:19
Also in:
lkml, virtualization
On Mon, 2018-06-11 at 06:28 +0300, Michael S. Tsirkin wrote:
> > However, if the administrator ignores/forgets/deliberately-decides/is-constrained to NOT enable the flag, virtio will not be able to pass control to the DMA ops associated with the virtio devices. Which means we have no opportunity to share the I/O buffers with the hypervisor/qemu. How do you suggest we handle this case?
> As step 1, ignore it as a user error.

Ugh ... not again. Ram, don't bring that subject back, we ALREADY addressed it, and requiring the *user* to do special things is just utterly and completely wrong.

The *user* has no bloody idea what that stuff is, will never know to set whatever magic qemu flag etc... The user will just start a VM normally and expect things to work. Requiring the *user* to know things like that iommu virtio flag is complete nonsense.

If by "user" you mean libvirt, then you are now requesting about 4 or 5 different projects to be patched to add special cases for something they know nothing about and is completely irrelevant, while it can be entirely addressed with a 1-liner in virtio kernel side to allow the arch to plumb alternate DMA ops.

So for some reason you seem to be dead set on a path that leads to a mountain of user pain, changes to many different projects and overall havoc, while there is a much, much simpler and more elegant solution at hand, which I described (again) in the response to Ram I sent about 5 minutes ago.
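
To make the "1-liner" idea above concrete, here is a minimal user-space sketch. It is not the patch under discussion and not kernel code. The names `toy_virtio_device`, `toy_use_dma_api`, `arch_force_dma_api_fn` and `toy_secure_guest_hook` are hypothetical and invented for illustration. Only the feature bit number (VIRTIO_F_IOMMU_PLATFORM is bit 33 in the virtio spec) comes from the real interface. The sketch assumes the decision "use the DMA API or bypass it" is made in one place, so that an arch-provided hook can override it without any flag being set on the qemu side.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit number of VIRTIO_F_IOMMU_PLATFORM in the virtio spec. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Toy stand-in for a virtio device: only the negotiated feature bits. */
struct toy_virtio_device {
	uint64_t features;
};

/* HYPOTHETICAL arch hook: returns true if the platform wants this
 * device to go through the DMA ops regardless of the feature bit. */
typedef bool (*arch_force_dma_api_fn)(const struct toy_virtio_device *vdev);

static bool toy_has_feature(const struct toy_virtio_device *vdev,
			    unsigned int bit)
{
	return (vdev->features >> bit) & 1;
}

/* HYPOTHETICAL example hook, e.g. for a guest that must share its
 * I/O buffers explicitly with an untrusted hypervisor. */
bool toy_secure_guest_hook(const struct toy_virtio_device *vdev)
{
	(void)vdev;	/* the decision here does not depend on the device */
	return true;
}

/* Decide whether the toy virtio core should use the DMA API. */
bool toy_use_dma_api(const struct toy_virtio_device *vdev,
		     arch_force_dma_api_fn arch_hook)
{
	/* The device side asked for platform DMA handling. */
	if (toy_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return true;

	/* The "one line": let the arch override the default bypass. */
	if (arch_hook && arch_hook(vdev))
		return true;

	/* Legacy default: bypass the DMA API, use guest physical addresses. */
	return false;
}
```

With no hook and no feature bit the sketch keeps the legacy bypass. With the hook installed, the guest kernel alone makes the decision, which is the property argued for above.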

> Further, you can, for example, add per-device quirks in virtio so it can be switched to the DMA API. Make extra decisions in platform code then.
> > > > Both in the flag naming and the implementation there is an implication of DMA API == IOMMU, which is fundamentally wrong.
> > > Maybe we need to extend the meaning of PLATFORM_IOMMU or rename it. It's possible that some setups will benefit from a more fine-grained approach where some aspects of the DMA API are bypassed, others aren't. This seems to be what was being asked for in this thread, with comments claiming IOMMU flag adds too much overhead.
> > > > The DMA API does a few different things:
> > > >
> > > > a) address translation
> > > >
> > > > This does include IOMMUs. But it also includes random offsets between PCI bars and system memory that we see on various platforms.
> > > I don't think you mean bars. That's unrelated to DMA.
> > > > Worse, some of these offsets might be based on banks, e.g. on the broadcom bmips platform. It also deals with bitmasks in physical addresses related to memory encryption like AMD SEV. I'd be really curious how, for example, the Intel virtio based NIC is going to work on any of those platforms.
> > > SEV guys report that they just set the iommu flag and then it all works.
> > This is one of the fundamental differences between the SEV architecture and the ultravisor architecture. In SEV, qemu is aware of SEV. In the ultravisor architecture, only the VM that runs within qemu is aware of the ultravisor; hypervisor/qemu/administrator are untrusted entities.
> So one option is to teach qemu that it's on a platform with an ultravisor, this might have more advantages.
> > I hope we can make the virtio subsystem flexible enough to support various security paradigms.
> So if you are worried about qemu attacking guests, I see more problems than just passing an incorrect iommu flag.
> > Apart from the above reason, Christoph and Ben point to so many other reasons to make it flexible. So why not make it happen?
> I don't see a flexibility argument. I just don't think new platforms should use workarounds that we put in place for old ones.
> > > I guess if there's translation we can think of this as a kind of iommu. Maybe we should rename PLATFORM_IOMMU to PLATFORM_TRANSLATION?
> > >
> > > And apparently some people complain that just setting that flag makes qemu check translation on each access with an unacceptable performance overhead. Forcing the same behaviour for everyone on general principles even without the flag is unlikely to make them happy.
> > > > b) coherency
> > > >
> > > > On many architectures DMA is not cache coherent, and we need to invalidate and/or write back cache lines before doing DMA. Again, I wonder how this is ever going to work with hardware based virtio implementations.
> > > You mean dma_Xmb and friends? There's a new feature VIRTIO_F_IO_BARRIER that's being proposed for that.
> > > > Even worse, I think this is actually broken at least for VIVT even for virtualized implementations. E.g. a KVM guest is going to access memory using different virtual addresses than qemu, vhost might throw in another different address space.
> > > I don't really know what VIVT is. Could you help me please?
> > > > c) bounce buffering
> > > >
> > > > Many DMA implementations cannot address all physical memory due to addressing limitations. In such cases we copy the DMA memory into a known addressable bounce buffer and DMA from there.
> > > Don't do it then?
> > > > d) flushing write combining buffers or similar
> > > >
> > > > On some hardware platforms we need workarounds to e.g. read from a certain mmio address to make sure DMA can actually see memory written by the host.
> > > I guess it isn't an issue as long as WC isn't actually used. It will become an issue when virtio spec adds some WC capability - I suspect we can ignore this for now.
> > > > All of this is bypassed by virtio by default despite generally being platform issues, not particular to a given device.
> > > It's both a device and a platform issue. A PV device is often more like another CPU than like a PCI device.
> > >
> > > -- MST
> >
> > -- Ram Pai
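
As an illustration of points a) and c) in the quoted list, here is a minimal user-space sketch of what a DMA mapping layer does when a device sees memory at an offset and cannot reach every address. It is a toy model under stated assumptions, not the kernel DMA API. `toy_dma_dev`, `toy_dma_map` and all field names are hypothetical. Cache maintenance (b) and write-combining flushes (d) are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BOUNCE_SIZE 256

/* HYPOTHETICAL per-device DMA description. */
struct toy_dma_dev {
	uint64_t bus_offset;	/* (a) offset between CPU physical and bus addresses */
	uint64_t dma_mask;	/* (c) highest bus address the device can reach */
	uint64_t bounce_phys;	/* pretend physical address of the bounce buffer */
	uint8_t bounce[TOY_BOUNCE_SIZE];
	bool bounced;		/* set when the last mapping used the bounce buffer */
};

/* True if the whole range [bus, bus + len) is reachable by the device. */
static bool toy_addressable(const struct toy_dma_dev *dev,
			    uint64_t bus, size_t len)
{
	return bus + len - 1 <= dev->dma_mask;
}

/* Map a buffer for device reads and return the bus address to program.
 * phys is the buffer's pretend physical address, cpu_buf its contents. */
uint64_t toy_dma_map(struct toy_dma_dev *dev, uint64_t phys,
		     const void *cpu_buf, size_t len)
{
	uint64_t bus = phys + dev->bus_offset;	/* (a) address translation */

	assert(len > 0);
	dev->bounced = false;
	if (toy_addressable(dev, bus, len))
		return bus;

	/* (c) out of reach: copy into a buffer the device can address. */
	assert(len <= TOY_BOUNCE_SIZE);
	memcpy(dev->bounce, cpu_buf, len);
	dev->bounced = true;
	return dev->bounce_phys + dev->bus_offset;
}
```

A driver that bypasses such a layer and hands the device raw physical addresses gets neither the offset nor the bounce copy, which is the concern raised in the quoted list.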