Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket
From: Tom Herbert <hidden>
Date: 2016-09-29 14:08:16
On Thu, Sep 29, 2016 at 9:18 AM, Eric Dumazet [off-list ref] wrote:
On Thu, 2016-09-29 at 08:53 -0400, Tom Herbert wrote:quoted
On Thu, Sep 29, 2016 at 12:54 AM, Eric Dumazet [off-list ref] wrote:quoted
On Wed, 2016-09-28 at 20:54 -0700, Tom Herbert wrote:quoted
xps_flows maintains a per device flow table that is indexed by the skbuff hash. The table is only consulted when there is no queue saved in a transmit socket for an skbuff. Each entry in the flow table contains a queue index and a queue pointer. The queue pointer is set when a queue is chosen using a flow table entry. This pointer is set to the head pointer in the transmit queue (which is maintained by BQL). The new function get_xps_flows_index that looks up flows in the xps_flows table. The entry returned gives the last queue a matching flow used. The returned queue is compared against the normal XPS queue. If they are different, then we only switch if the tail pointer in the TX queue has advanced past the pointer saved in the entry. In this way OOO should be avoided when XPS wants to use a different queue.There is something I do not understand with this. If this OOO avoidance is tied to BQL, it means that packets sitting in a qdisc wont be part of the detection. So packets of flow X could possibly be queued on multiple qdiscs.Hi Eric, I'm not sure I understand your concern. If packets for flow X can be queued on multiple qdiscs wouldn't that mean that flow will be subject to ooo transmission regardless of this patch? In other words here we're trying to keep packets for the flow in order as they are emitted from the qdiscs which we assume maintains ordering of packets in a flow.Well, then what this patch series is solving ?
It addresses the issue that Rick Jones pointed out was happening with XPS. When packets are sent for a flow that has no socket and XPS is enabled then each packet uses the XPS queue based on the running CPU. Since the thread sending on a flow can be rescheduled on different CPUs this is creating ooo packets. In this case the ooo is being caused by interaction with XPS.
You have a producer of packets running on 8 vcpus in a VM. Packets are exiting the VM and need to be queued on a mq NIC in the hypervisor. Flow X can be scheduled on any of these 8 vcpus, so XPS is currently selecting different TXQ. Your patch aims to detect this and steer packets to one TXQ, but not using a static hash() % num_real_queues (as RSS would have done on a NIC at RX), but trying to please XPS (as : selecting a queue close to the CPU producing the packet. VM with one vcpu, and cpu bounded, would really be happy to not spread packets all over the queues)
You're not describing this patch, you're describing how XPS works in the first place. This patch is done to make socketless packets work well with XPS. If all you want is a static hash() % num_real_queues then just disable XPS to get that.
Your mechanism relies on counters updated at the time packets are given to the NIC, not at the time packets are given to the txq (and eventually sit for a while in the qdisc right before BQL layer) dev_queue_xmit() is not talking to the device directly... All I am saying is that the producer/consumer counters you added are not at the right place. You tried hard to not change the drivers, by adding something to existing BQL. But packets can sit in a qdisc for a while...
But again, this patch is to prevent ooo being caused by an interaction with XPS. It does not address ooo being caused by qdiscs or VMs, but then I don't see that as being the issue reported by Rick.
So you added 2 potential cache lines misses per outgoing packet, but with no guarantee that OOO are really avoided.