Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support
From: Björn Töpel <hidden>
Date: 2017-11-14 05:34:00
2017-11-14 0:50 GMT+01:00 Alexei Starovoitov [off-list ref]:
On 11/13/17 9:07 PM, Björn Töpel wrote:
2017-10-31 13:41 GMT+01:00 Björn Töpel [off-list ref]:
From: Björn Töpel <redacted>
[...]
We'll do a presentation on AF_PACKET V4 at NetDev 2.2 [1] in Seoul, Korea, and our paper with complete benchmarks will be released shortly on the NetDev 2.2 site.

We're back in the saddle after an excellent netdevconf week. Kudos to the organizers; we had a blast! Thanks for all the constructive feedback. Below, I'll summarize the major points that we'll address in the next RFC.

* Instead of extending AF_PACKET with yet another version, introduce a new address/packet family. As for naming, we had some suggestions: AF_CAPTURE, AF_CHANNEL, AF_XDP and AF_ZEROCOPY. We'll go for AF_ZEROCOPY, unless there are strong opinions against it.

* No explicit zerocopy enablement. Use the zerocopy path if supported; if not, fall back to the skb path for netdevs that don't support the required ndos. Further, we'll have the zerocopy behavior for the skb path as well, meaning that an AF_ZEROCOPY socket will consume the skb, and we'll honor skb->queue_mapping, meaning that we only consume the packets for the enabled queue.

* Limit the scope of the first patchset to Rx only, and introduce Tx in a separate patchset.

All sounds good to me except the above bit. I don't remember people suggesting to split it this way. What's the value of it without Tx?
We definitely need Tx for our use-cases! Let me rephrase: the idea was to make the initial patch set without Tx *driver*-specific code, e.g. use ndo_xdp_xmit/flush at a later point. So AF_ZEROCOPY, the socket parts, would have Tx support. @John Did I recall that correctly?
* Minimize the size of the i40e zerocopy patches, by moving the driver specific code to separate patches.

* Do not introduce a new XDP action XDP_PASS_TO_KERNEL, instead use XDP redirect map call with ingress flag.

* Extend the XDP redirect to support explicit allocator/destructor functions. Right now, XDP redirect assumes that the page allocator was used, and the XDP redirect cleanup path is decreasing the page count of the XDP buffer. This assumption breaks for the zerocopy case.

Björn
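To make the last point above concrete, here is a minimal user-space sketch in C of what an explicit destructor could look like. It is not kernel code and not taken from the patch set: every name in it (model_page, zc_pool, model_xdp_buff, release_page, release_zc, redirect_cleanup, demo) is hypothetical. The only idea it tries to show is that the cleanup path calls a per-buffer release function instead of assuming that the buffer came from the page allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; none of these are kernel structures. */
struct model_page {
	int refcount;            /* stands in for a page reference count */
};

struct zc_pool {
	int free_count;          /* buffers handed back to a user-registered area */
};

struct model_xdp_buff {
	void *mem;                                 /* owner-specific handle */
	void (*release)(struct model_xdp_buff *);  /* explicit destructor */
};

/* Destructor for buffers that came from the page allocator. */
void release_page(struct model_xdp_buff *buf)
{
	struct model_page *page = buf->mem;

	page->refcount--;
}

/* Destructor for zerocopy buffers: return the buffer to its pool. */
void release_zc(struct model_xdp_buff *buf)
{
	struct zc_pool *pool = buf->mem;

	pool->free_count++;
}

/* Cleanup path that makes no assumption about the allocator used. */
void redirect_cleanup(struct model_xdp_buff *buf)
{
	assert(buf->release != NULL);
	buf->release(buf);
}

/* Runs both kinds of buffer through the same cleanup path.
 * Returns 1 if each buffer was released by its own destructor. */
int demo(void)
{
	struct model_page page = { .refcount = 2 };
	struct zc_pool pool = { .free_count = 0 };
	struct model_xdp_buff a = { .mem = &page, .release = release_page };
	struct model_xdp_buff b = { .mem = &pool, .release = release_zc };

	redirect_cleanup(&a);
	redirect_cleanup(&b);

	return page.refcount == 1 && pool.free_count == 1;
}
```

Calling demo() from a small main() is enough to exercise it. In this model, a cleanup path that unconditionally decremented a page count would have nothing sensible to do for buffer b, which is the breakage the bullet describes for the zerocopy case.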
We based this patch set on net-next commit e1ea2f9856b7 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").

Please focus your review on:

* The V4 user space interface
* PACKET_ZEROCOPY and its semantics
* Packet array interface
* XDP semantics when executing in zero-copy mode (user space passed buffers)
* XDP_PASS_TO_KERNEL semantics

To do:

* Investigate the user-space ring structure’s performance problems
* Continue the XDP integration into packet arrays
* Optimize performance
* SKB <-> V4 conversions in tp4a_populate & tp4a_flush
* Packet buffer is unnecessarily pinned for virtual devices
* Support shared packet buffers
* Unify V4 and SKB receive path in I40E driver
* Support for packets spanning multiple frames
* Disassociate the packet array implementation from the V4 queue structure

We would really like to thank the reviewers of the limited distribution RFC for all their comments that have helped improve the interfaces and the code significantly: Alexei Starovoitov, Alexander Duyck, Jesper Dangaard Brouer, and John Fastabend. The internal team at Intel that has been helping out reviewing code, writing tests, and sanity checking our ideas: Rami Rosen, Jeff Shaw, Ferruh Yigit, and Qi Zhang, your participation has really helped.
Thanks: Björn and Magnus

[1] https://www.netdevconf.org/2.2/

Björn Töpel (7):
  packet: introduce AF_PACKET V4 userspace API
  packet: implement PACKET_MEMREG setsockopt
  packet: enable AF_PACKET V4 rings
  packet: wire up zerocopy for AF_PACKET V4
  i40e: AF_PACKET V4 ndo_tp4_zerocopy Rx support
  i40e: AF_PACKET V4 ndo_tp4_zerocopy Tx support
  samples/tpacket4: added tpbench

Magnus Karlsson (7):
  packet: enable Rx for AF_PACKET V4
  packet: enable Tx support for AF_PACKET V4
  netdevice: add AF_PACKET V4 zerocopy ops
  veth: added support for PACKET_ZEROCOPY
  samples/tpacket4: added veth support
  i40e: added XDP support for TP4 enabled queue pairs
  xdp: introducing XDP_PASS_TO_KERNEL for PACKET_ZEROCOPY use

 drivers/net/ethernet/intel/i40e/i40e.h         |    3 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |    9 +
 drivers/net/ethernet/intel/i40e/i40e_main.c    |  837 ++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  582 ++++++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    |   38 +
 drivers/net/veth.c                             |  174 +++
 include/linux/netdevice.h                      |   16 +
 include/linux/tpacket4.h                       | 1502 ++++++++++++++++++++++++
 include/uapi/linux/bpf.h                       |    1 +
 include/uapi/linux/if_packet.h                 |   65 +-
 net/packet/af_packet.c                         | 1252 +++++++++++++++++---
 net/packet/internal.h                          |    9 +
 samples/tpacket4/Makefile                      |   12 +
 samples/tpacket4/bench_all.sh                  |   28 +
 samples/tpacket4/tpbench.c                     | 1390 ++++++++++++++++++++++
 15 files changed, 5674 insertions(+), 244 deletions(-)
 create mode 100644 include/linux/tpacket4.h
 create mode 100644 samples/tpacket4/Makefile
 create mode 100755 samples/tpacket4/bench_all.sh
 create mode 100644 samples/tpacket4/tpbench.c

-- 
2.11.0