Thread: 9 messages, 3 authors, 2017-02-13

Re: [PATCH] virtio: Try to untangle DMA coherency

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2017-02-09 18:49:41
Also in: linux-arm-kernel, linux-devicetree

On Thu, Feb 09, 2017 at 06:31:18PM +0000, Will Deacon wrote:
On Thu, Feb 09, 2017 at 08:17:16PM +0200, Michael S. Tsirkin wrote:
On Thu, Feb 02, 2017 at 04:40:49PM +0000, Will Deacon wrote:
On Thu, Feb 02, 2017 at 06:30:28PM +0200, Michael S. Tsirkin wrote:
I am inclined to say, for 4.10 let's revert
c7070619f3408d9a0dffbed9149e6f00479cf43b since what it fixes is not a
regression in 4.10.
No complaints there, as long as we can keep working to fix this for 4.11
and onwards. You'll also need to cc stable on the revert.
So I think we can defer the fix to 4.11.
I think we still want f7f6634d23830ff74335734fbdb28ea109c1f349
for hosts with virtio 1 support.

All this will hopefully push hosts to just implement virtio 1.
For mmio the changes are very small: several new registers,
that's all. You want this for proper 64 bit dma mask anyway.
As I've said, virtio 1 will have exactly the same issue unless we start
requiring firmware to advertise dma-coherent/_CCA for virtio-mmio
devices correctly.
OK I read up on _CCA in ACPI spec. It says:
The _CCA object returns whether or not a bus-master device supports
hardware managed cache coherency. Expected values are 0 to indicate it
is not supported, and 1 to indicate that it is supported.

So if host is cache coherent, and guest thinks it isn't, we incur
unnecessary overhead by wasting coherent memory.
I get that but you said it actually breaks - why does it?
It breaks because QEMU doesn't set _CCA for virtio-mmio devices, and that
only becomes a problem when we use the DMA API, because that results in the
guest taking out a non-cacheable mapping. On ARM (and other archs such as
Power), having a mismatch between a cacheable and a non-cacheable mapping
can result in a loss of coherency between the two (for example, if the
non-cacheable guest accesses bypass the cache, but the cacheable host
accesses allocate in the cache).

Will
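
The failure mode described above can be sketched as a small model in C. This is only an illustration under stated assumptions, not kernel or QEMU code: the names `map_attr`, `guest_attr`, `host_attr` and `coherency_at_risk` are hypothetical, and the model assumes the host always accesses the shared memory through a cacheable mapping, as the explanation above implies.

```c
#include <stdbool.h>

/* Memory attribute used for a mapping of the shared virtqueue memory. */
enum map_attr { MAP_CACHEABLE, MAP_NONCACHEABLE };

/*
 * Hypothetical model of the guest side. When the driver goes through the
 * DMA API, the mapping attribute follows what firmware advertises
 * (_CCA in ACPI, dma-coherent in DT). Without the DMA API the guest keeps
 * its ordinary cacheable mapping.
 */
static enum map_attr guest_attr(bool uses_dma_api, bool fw_says_coherent)
{
    if (uses_dma_api && !fw_says_coherent)
        return MAP_NONCACHEABLE;
    return MAP_CACHEABLE;
}

/* Assumption: the host touches the same memory via a cacheable mapping. */
static enum map_attr host_attr(void)
{
    return MAP_CACHEABLE;
}

/* Coherency can be lost when guest and host disagree on the attribute. */
static bool coherency_at_risk(bool uses_dma_api, bool fw_says_coherent)
{
    return guest_attr(uses_dma_api, fw_says_coherent) != host_attr();
}
```

In this model, `coherency_at_risk(true, false)` is the case discussed in the thread: the DMA API is in use and firmware does not advertise coherency, so the guest ends up non-cacheable while the host stays cacheable. Either advertising coherency or not using the DMA API makes the two sides agree again.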
I see. And I guess using a cacheable mapping is significantly faster.
I would say we want to typically use cacheable for virtio then,
whether we bypass the IOMMU or not. I guess this is why we always set
_CCA/DT correctly, right?

-- 
MST