Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net
From: Krishna Kumar2 <hidden>
Date: 2011-03-01 16:02:56
Also in:
kvm
"Michael S. Tsirkin" [off-list ref] wrote on 02/28/2011 03:13:20 PM: Thank you once again for your feedback on both these patches. I will send the qemu patch tomorrow. I will also send the next version incorporating these suggestions once we finalize some minor points.
Overall looks good. The numtxqs meaning the number of rx queues needs some cleanup. init/cleanup routines need more symmetry. Error handling on setup also seems slightly buggy or at least
asymmetrical.
Finally, this will use up a large number of MSI vectors, while TX interrupts mostly stay unused. Some comments below.quoted
+/* Maximum number of individual RX/TX queues supported */ +#define VIRTIO_MAX_TXQS 16 +This also does not seem to belong in the header.
Both virtio-net and vhost need some check to make sure very high values are not passed by userspace. Is this not required?
quoted
+#define VIRTIO_NET_F_NUMTXQS 21 /* Device supports multiple
TX queue */
VIRTIO_NET_F_MULTIQUEUE ?
Yes, that's a better name.
quoted
@@ -34,6 +38,8 @@ struct virtio_net_config { __u8 mac[6]; /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */ __u16 status; + /* number of RX/TX queues */ + __u16 numtxqs;The interface here is a bit ugly: - this is really both # of tx and rx queues but called numtxqs - there's a hardcoded max value - 0 is assumed to be same as 1 - assumptions above are undocumented. One way to address this could be num_queue_pairs, and something like /* The actual number of TX and RX queues is num_queue_pairs +
1 each. */
__u16 num_queue_pairs; (and tweak code to match). Alternatively, have separate registers for the number of tx and rx
queues. OK, so virtio_net_config has num_queue_pairs, and this gets converted to numtxqs in virtnet_info?
quoted
+struct virtnet_info { + struct send_queue **sq; + struct receive_queue **rq; + + /* read-mostly variables */ + int numtxqs ____cacheline_aligned_in_smp;Why do you think this alignment is a win?
Actually this code was from the earlier patchset (MQ TX only) where
the layout was different. Now rq and sq are allocated as follows:
vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
for (i = 0; i < numtxqs; i++) {
vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
Since the two pointers becomes read-only during use, there is no cache
line dirty'ing. I will remove this directive.
quoted
+/* + * Note for 'qnum' below: + * first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX. + */Another option to consider is to have them RX,TX,RX,TX: this way vq->queue_index / 2 gives you the queue pair number, no need to read numtxqs. On the other hand, it makes
the
#RX==#TX assumption even more entrenched.
OK. I was following how many drivers were allocating RX and TX's together - eg ixgbe_adapter has tx_ring and rx_ring arrays; bnx2 has rx_buf_ring and tx_buf_ring arrays, etc. Also, vhost has some code that processes tx first before rx (e.g. vhost_net_stop/flush), so this approach seemed helpful. I am OK either way, what do you suggest?
quoted
+ err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs,
callbacks,
quoted
+ (const char
**)names);
quoted
+ if (err) + goto free_params; +This would use up quite a lot of vectors. However, tx interrupt is, in fact, slow path. So, assuming we don't have enough vectors to use per vq, I think it's a good idea to support reducing MSI vector usage by mapping all TX VQs to the same
vector
and separate vectors for RX. The hypervisor actually allows this, but we don't have an API at the
virtio
level to pass that info to virtio pci ATM. Any idea what a good API to use would be?
Yes, it is a waste to have these vectors for tx ints. I initially thought of adding a flag to virtio_device to pass to vp_find_vqs, but it won't work, so a new API is needed. I can work with you on this in the background if you like.
quoted
+ for (i = 0; i < numtxqs; i++) { + vi->rq[i]->rvq = vqs[i]; + vi->sq[i]->svq = vqs[i + numtxqs];This logic is spread all over. We need some kind of macro to get queue number of vq number and back.
Will add this.
quoted
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) { + vi->cvq = vqs[i + numtxqs]; + + if (virtio_has_feature(vi->vdev,
VIRTIO_NET_F_CTRL_VLAN))
quoted
+ vi->dev->features |=
NETIF_F_HW_VLAN_FILTER;
This bit does not seem to belong in initialize_vqs.
I will move it back to probe.
quoted
+ err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS, + offsetof(struct
virtio_net_config, numtxqs),
quoted
+ &numtxqs); + + /* We need atleast one txq */ + if (err || !numtxqs) + numtxqs = 1;err is okay, but should we just fail on illegal values? Or change the semantics: n = 0; err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS, offsetof(struct virtio_net_config, numtxqs), &n); numtxq = n + 1;
Will this be better: int num_queue_pairs = 2; int numtxqs; err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE, offsetof(struct virtio_net_config, num_queue_pairs), &num_queue_pairs); <ignore error, if any> numtxqs = num_queue_pairs / 2;
quoted
+ if (numtxqs > VIRTIO_MAX_TXQS) + return -EINVAL;Do we strictly need this? I think we should just use whatever hardware has, or alternatively somehow ignore the unused queues (easy for tx, not sure about rx).
vq's are matched between qemu, virtio-net and vhost. Isn't some check required that userspace has not passed a bad value?
quoted
+ if (vi->rq[i]->num == 0) { + err = -ENOMEM; + goto free_recv_bufs; + } }If this fails for vq > 0, you have to detach bufs.
Right, will fix this.
quoted
free_vqs: + for (i = 0; i < numtxqs; i++) + cancel_delayed_work_sync(&vi->rq[i]->refill); vdev->config->del_vqs(vdev); -free: + free_rq_sq(vi);If we have a wrapper to init all vqs, pls add a wrapper to clean up all vqs as well.
Will add that.
quoted
+ for (i = 0; i < vi->numtxqs; i++) { + struct virtqueue *rvq = vi->rq[i]->rvq; + + while (1) { + buf = virtqueue_detach_unused_buf
(rvq);
quoted
+ if (!buf) + break; + if (vi->mergeable_rx_bufs || vi->
big_packets)
quoted
+ give_pages(vi->rq[i],
buf);
quoted
+ else + dev_kfree_skb(buf); + --vi->rq[i]->num; + } + BUG_ON(vi->rq[i]->num != 0); } - BUG_ON(vi->num != 0); + + free_rq_sq(vi);This looks wrong here. This function should detach and free all bufs, not internal malloc stuff.
That is being done by free_receive_buf after free_unused_bufs() returns. I hope this addresses your point.
I think we should have free_unused_bufs that handles a single queue, and call it in a loop.
OK, so define free_unused_bufs() as:
static void free_unused_bufs(struct virtnet_info *vi, struct virtqueue
*svq,
struct virtqueue *rvq)
{
/* Use svq and rvq with the remaining code unchanged */
}
Thanks,
- KK