Thread (93 messages) 93 messages, 7 authors, 2013-05-30

Re: [RFC 7/11] virtio_pci: new, capability-aware driver.

From: Rusty Russell <hidden>
Date: 2012-01-13 03:22:28

On Thu, 12 Jan 2012 08:00:10 +0200, "Michael S. Tsirkin" [off-list ref] wrote:
On Thu, Jan 12, 2012 at 12:31:09PM +1030, Rusty Russell wrote:
quoted
If we use a 32-bit counter, we also get this though, right?

If counter has changed, it's a config interrupt...
But we need an exit to read the counter. We can put the counter
in memory but this looks suspiciously like a simplified VQ -
so why not use a VQ then?
Because now a driver first gets the data from config space.  But from
then on, they have to get it from the vq, and ignore the config space.
That's a bit weird.
quoted
quoted
If we do require config VQ anyway, why not use it to notify
guest of config changes? Guest could pre-post an in buffer
and host uses that.
We could, but it's an additional burden on each device.  vqs are cheap,
but not free.  And the config area is so damn convenient...
Not if you start playing with counters, checking it twice,
reinvent all kind of barriers ...
None of that appears inside the driver, though.  And let's be honest,
it's not *that* bad (very approx code):

static u32 vp_get_gen(struct virtio_pci_device *vp_dev)
{
        u32 gen;
        do {
                gen = ioread32(vp_dev->ioaddr + VIRTIO_PCI_CONFIG_GEN);
        } while (unlikely((gen & 1) == 1));

        virtio_rmb();
        return gen;
}

static bool vp_check_gen(struct virtio_pci_device *vp_dev, u32 gen)
{
        virtio_rmb();
        return ioread32(vp_dev->ioaddr + VIRTIO_PCI_CONFIG_GEN) == gen;
}

static void vp_get32(struct virtio_device *vdev, unsigned offset, u32 *buf)
{
	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
        u32 gen;

        do {
                gen = vp_get_gen(vdev);
                *buf = ioread32(vp_dev->ioaddr + VIRTIO_PCI_CONFIG(vp_dev) + offset);
        } while (unlikely(!vp_check_gen(vp_dev, gen)));
}
...
quoted
It was suggested by others, but I think TCP Acks are the classic one.
12 + 14 + 20 + 40 = 86 bytes with virtio_net_hdr_mrg_rxbuf at the front.
That's only the simplest IPv4, right?
Anyway, this spans multiple descriptors so this complicates allocation
significantly.
Yes, I think any general-but-useful inline will need to span multiple
descriptors.  That's part of the fun!

Let's get totally crazy and implement our ring in stripes, like:

00 04 08 12 01 05 09 13 02 06 10 14 03 07 11 15

That way consecutive (32-byte) descriptors don't share a cacheline!
(Not serious... quiet.)
quoted
Yes, I'm thinking #define VIRTIO_F_VIRTIO2 (-1).  For PCI, this gets
mapped into a "are we using the new config layout?".  For others, it
gets mapped into a transport-specific feature.

(I'm sure you get it, but for the others) This is because I want to be
draw a clear line between all the legacy stuff at the same time, not
have to support part of it later because someone might not flip the
feature bit.
So my point is, config stuff and ring are completely separate,
they are different layers.
Absolutely, and we should analyze them separately as well as together.

*But* for maintenance is far easier if we only have to test new
config+new ring and old config+old ring.  They do interact, because
remember, the allocation of the ring changes with new config, too...

Cheers,
Rusty.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help