Thread (14 messages) 14 messages, 3 authors, 2016-09-29

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

From: Tom Herbert <hidden>
Date: 2016-09-29 14:08:16

On Thu, Sep 29, 2016 at 9:18 AM, Eric Dumazet [off-list ref] wrote:
On Thu, 2016-09-29 at 08:53 -0400, Tom Herbert wrote:
quoted
On Thu, Sep 29, 2016 at 12:54 AM, Eric Dumazet [off-list ref] wrote:
quoted
On Wed, 2016-09-28 at 20:54 -0700, Tom Herbert wrote:
quoted
xps_flows maintains a per device flow table that is indexed by the
skbuff hash. The table is only consulted when there is no queue saved in
a transmit socket for an skbuff.

Each entry in the flow table contains a queue index and a queue
pointer. The queue pointer is set when a queue is chosen using a
flow table entry. This pointer is set to the head pointer in the
transmit queue (which is maintained by BQL).

The new function get_xps_flows_index that looks up flows in the
xps_flows table. The entry returned gives the last queue a matching flow
used. The returned queue is compared against the normal XPS queue. If
they are different, then we only switch if the tail pointer in the TX
queue has advanced past the pointer saved in the entry. In this
way OOO should be avoided when XPS wants to use a different queue.
There is something I do not understand with this.

If this OOO avoidance is tied to BQL, it means that packets sitting in a
qdisc wont be part of the detection.

So packets of flow X could possibly be queued on multiple qdiscs.
Hi Eric,

I'm not sure I understand your concern. If packets for flow X can be
queued on multiple qdiscs wouldn't that mean that flow will be subject
to ooo transmission regardless of this patch? In other words here
we're trying to keep packets for the flow in order as they are emitted
from the qdiscs which we assume maintains ordering of packets in a
flow.
Well, then what this patch series is solving ?
It addresses  the issue that Rick Jones pointed out was happening with
XPS. When packets are sent for a flow that has no socket and XPS is
enabled then each packet uses the XPS queue based on the running CPU.
Since the thread sending on a flow can be rescheduled on different
CPUs this is creating ooo packets. In this case the ooo is being
caused by interaction with XPS.
You have a producer of packets running on 8 vcpus in a VM.

Packets are exiting the VM and need to be queued on a mq NIC in the
hypervisor.

Flow X can be scheduled on any of these 8 vcpus, so XPS is currently
selecting different TXQ.

Your patch aims to detect this and steer packets to one TXQ, but not
using a static hash() % num_real_queues (as RSS would have done on a NIC
at RX), but trying to please XPS (as : selecting a queue close to the
CPU producing the packet. VM with one vcpu, and cpu bounded, would
really be happy to not spread packets all over the queues)
You're not describing this patch, you're describing how XPS works in
the first place. This patch is done to make socketless packets work
well with XPS. If all you want is a static hash() % num_real_queues
then just disable XPS to get that.
Your mechanism relies on counters updated at the time packets are given
to the NIC, not at the time packets are given to the txq (and eventually
sit for a while in the qdisc right before BQL layer)

dev_queue_xmit() is not talking to the device directly...


All I am saying is that the producer/consumer counters you added are not
at the right place.

You tried hard to not change the drivers, by adding something to
existing BQL. But packets can sit in a qdisc for a while...
But again, this patch is to prevent ooo being caused by an interaction
with XPS. It does not address ooo being caused by qdiscs or VMs, but
then I don't see that as being the issue reported by Rick.
So you added 2 potential cache lines misses per outgoing packet, but
with no guarantee that OOO are really avoided.

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help