Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

[net-next RFC PATCH 0/5] Series short description · Jason Wang <jasowang@redhat.com> · 2011-12-05
[net-next RFC PATCH 1/5] virtio_net: passing rxhash through vnet_hdr · Jason Wang <jasowang@redhat.com> · 2011-12-05
[net-next RFC PATCH 2/5] tuntap: simple flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-05
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support · Stefan Hajnoczi <hidden> · 2011-12-05
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support · Ben Hutchings <hidden> · 2011-12-05
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-06
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support · Ben Hutchings <hidden> · 2011-12-06
[net-next RFC PATCH 3/5] macvtap: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-05
Re: [net-next RFC PATCH 3/5] macvtap: flow director support · Ben Hutchings <hidden> · 2011-12-05
[net-next RFC PATCH 4/5] virtio: introduce a method to get the irq of a specific virtqueue · Jason Wang <jasowang@redhat.com> · 2011-12-05
[net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-05
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Stefan Hajnoczi <hidden> · 2011-12-05
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Stefan Hajnoczi <hidden> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Stefan Hajnoczi <hidden> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Sridhar Samudrala <hidden> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · "Michael S. Tsirkin" <mst@redhat.com> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Sridhar Samudrala <hidden> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-07
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-07
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Sridhar Samudrala <hidden> · 2011-12-09
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-07
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Stefan Hajnoczi <hidden> · 2011-12-07
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-07
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Stefan Hajnoczi <hidden> · 2011-12-07
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Ben Hutchings <hidden> · 2011-12-05
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Jason Wang <jasowang@redhat.com> · 2011-12-06
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support · Ben Hutchings <hidden> · 2011-12-06
Re: [net-next RFC PATCH 0/5] Series short description · Rusty Russell <hidden> · 2011-12-07
Re: [net-next RFC PATCH 0/5] Series short description · Jason Wang <jasowang@redhat.com> · 2011-12-07
Re: [net-next RFC PATCH 0/5] Series short description · Ben Hutchings <hidden> · 2011-12-07
Re: [net-next RFC PATCH 0/5] Series short description · Jason Wang <jasowang@redhat.com> · 2011-12-08
Re: [net-next RFC PATCH 0/5] Series short description · Rusty Russell <hidden> · 2011-12-09
Re: [net-next RFC PATCH 0/5] Series short description · Ben Hutchings <hidden> · 2011-12-15
Re: [net-next RFC PATCH 0/5] Series short description · Rusty Russell <hidden> · 2011-12-15

From: Jason Wang <jasowang@redhat.com>
Date: 2011-12-06 07:25:16
Also in: kvm, virtualization

On 12/06/2011 04:42 AM, Ben Hutchings wrote:

On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:

quoted

In order to let the packets of a flow to be passed to the desired
guest cpu, we can co-operate with devices through programming the flow
director which was just a hash to queue table.

This kinds of co-operation is done through the accelerate RFS support,
a device specific flow sterring method virtnet_fd() is used to modify
the flow director based on rfs mapping. The desired queue were
calculated through reverse mapping of the irq affinity table. In order
to parallelize the ingress path, irq affinity of rx queue were also
provides by the driver.

In addition to accelerate RFS, we can also use the guest scheduler to
balance the load of TX and reduce the lock contention on egress path,
so the processor_id() were used to tx queue selection.

[...]

quoted

+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;

Why only IPv4?

Oops, IPv6 should work also.

quoted

+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash&  TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.

Thanks for the explanation, how about document this briefly in scaling.txt?

quoted

+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+					       smp_processor_id();
+
+	/* As we make use of the accelerate rfs which let the scheduler to
+	 * balance the load, it make sense to choose the tx queue also based on
+	 * theprocessor id?
+	 */
+	while (unlikely(txq>= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}

[...]

Don't do this, let XPS handle it.

Ben.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help