Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Rusty Russell <hidden>
Date: 2012-01-13 03:22:28
On Thu, 12 Jan 2012 08:00:10 +0200, "Michael S. Tsirkin" [off-list ref] wrote:
On Thu, Jan 12, 2012 at 12:31:09PM +1030, Rusty Russell wrote:quoted
If we use a 32-bit counter, we also get this though, right? If counter has changed, it's a config interrupt...But we need an exit to read the counter. We can put the counter in memory but this looks suspiciously like a simplified VQ - so why not use a VQ then?
Because now a driver first gets the data from config space. But from then on, they have to get it from the vq, and ignore the config space. That's a bit weird.
quoted
quoted
If we do require config VQ anyway, why not use it to notify guest of config changes? Guest could pre-post an in buffer and host uses that.We could, but it's an additional burden on each device. vqs are cheap, but not free. And the config area is so damn convenient...Not if you start playing with counters, checking it twice, reinvent all kind of barriers ...
None of that appears inside the driver, though. And let's be honest,
it's not *that* bad (very approx code):
static u32 vp_get_gen(struct virtio_pci_device *vp_dev)
{
u32 gen;
do {
gen = ioread32(vp_dev->ioaddr + VIRTIO_PCI_CONFIG_GEN);
} while (unlikely((gen & 1) == 1));
virtio_rmb();
return gen;
}
static bool vp_check_gen(struct virtio_pci_device *vp_dev, u32 gen)
{
virtio_rmb();
return ioread32(vp_dev->ioaddr + VIRTIO_PCI_CONFIG_GEN) == gen;
}
static void vp_get32(struct virtio_device *vdev, unsigned offset, u32 *buf)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
u32 gen;
do {
gen = vp_get_gen(vdev);
*buf = ioread32(vp_dev->ioaddr + VIRTIO_PCI_CONFIG(vp_dev) + offset);
} while (unlikely(!vp_check_gen(vp_dev, gen)));
}
...
quoted
It was suggested by others, but I think TCP Acks are the classic one. 12 + 14 + 20 + 40 = 86 bytes with virtio_net_hdr_mrg_rxbuf at the front.That's only the simplest IPv4, right? Anyway, this spans multiple descriptors so this complicates allocation significantly.
Yes, I think any general-but-useful inline will need to span multiple descriptors. That's part of the fun! Let's get totally crazy and implement our ring in stripes, like: 00 04 08 12 01 05 09 13 02 06 10 14 03 07 11 15 That way consecutive (32-byte) descriptors don't share a cacheline! (Not serious... quiet.)
quoted
Yes, I'm thinking #define VIRTIO_F_VIRTIO2 (-1). For PCI, this gets mapped into a "are we using the new config layout?". For others, it gets mapped into a transport-specific feature. (I'm sure you get it, but for the others) This is because I want to be draw a clear line between all the legacy stuff at the same time, not have to support part of it later because someone might not flip the feature bit.So my point is, config stuff and ring are completely separate, they are different layers.
Absolutely, and we should analyze them separately as well as together. *But* for maintenance is far easier if we only have to test new config+new ring and old config+old ring. They do interact, because remember, the allocation of the ring changes with new config, too... Cheers, Rusty.