Re: epoll_wait() performance
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2019-12-02 16:47:45
Also in: lkml
On Mon, Dec 2, 2019 at 7:24 AM David Laight [off-list ref] wrote:
> From: Jakub Sitnicki <jakub@cloudflare.com>
> Sent: 30 November 2019 13:30
> > On Sat, Nov 30, 2019 at 02:07 AM CET, Eric Dumazet wrote:
> > > On 11/28/19 2:17 AM, David Laight wrote:
> > > ...
> > > > How can you do that when all the UDP flows have different
> > > > destination port numbers?
> > > > These are message flows not idempotent requests.
> > > > I don't really want to collect the packets before they've been
> > > > processed by IP.
> > > >
> > > > I could write a driver that uses kernel udp sockets to generate
> > > > a single message queue that can be efficiently processed from
> > > > userspace - but it is a faff compiling it for the system's
> > > > kernel version.
> > >
> > > Well if destination ports are not under your control, you also
> > > could use AF_PACKET sockets, no need for 'UDP sockets' to receive
> > > UDP traffic, especially if the rate is small.
> >
> > Alternatively, you could steer UDP flows coming to a certain port
> > range to one UDP socket using TPROXY [0, 1].
>
> I don't think that can work, we don't really know the list of valid
> UDP port numbers ahead of time.

How about -j REDIRECT? That does not require all ports to be known ahead of time.

> > TPROXY has the same downside as AF_PACKET, meaning that it requires
> > at least CAP_NET_RAW to create/set up the socket.
>
> CAP_NET_RAW wouldn't be a problem - we already send from a 'raw' socket.

One other issue when comparing UDP and packet sockets is IP defragmentation. That is critical code that is not at all trivial to duplicate in userspace. Even when choosing packet sockets, which normally do not defragment, there is a trick: a packet socket that joins a fanout group with the PACKET_FANOUT_FLAG_DEFRAG flag set will have packets defragmented before fanout.
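
For illustration, a minimal sketch of joining a fanout group with defragmentation enabled. This is an untested outline, not code from this thread: the helper names fanout_arg() and open_defrag_fanout_socket() and the choice of PACKET_FANOUT_HASH as the mode are assumptions made for the example. The option value layout (group id in the low 16 bits, mode and flags in the high 16 bits) follows packet(7). Creating the socket needs CAP_NET_RAW.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>        /* htons */
#include <sys/socket.h>       /* socket, setsockopt, SOL_PACKET */
#include <linux/if_ether.h>   /* ETH_P_IP */
#include <linux/if_packet.h>  /* PACKET_FANOUT, PACKET_FANOUT_* */

/* Hypothetical helper: pack the PACKET_FANOUT option value.
 * Low 16 bits carry the group id, high 16 bits carry mode | flags. */
uint32_t fanout_arg(uint16_t group_id, uint16_t mode_and_flags)
{
	return (uint32_t)group_id | ((uint32_t)mode_and_flags << 16);
}

/* Hypothetical helper: open a packet socket for IPv4 and join fanout
 * group `group_id`, asking the kernel to reassemble IP fragments
 * before the fanout decision. Returns the fd, or -1 with errno set. */
int open_defrag_fanout_socket(uint16_t group_id)
{
	uint32_t arg;
	int fd, saved_errno;

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
	if (fd < 0)
		return -1;

	arg = fanout_arg(group_id,
			 PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG);
	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0) {
		saved_errno = errno;	/* close() may overwrite errno */
		close(fd);
		errno = saved_errno;
		return -1;
	}
	return fd;
}
```

A caller would use something like open_defrag_fanout_socket(42) and then read from the returned fd with recvfrom(). The group id only has to be unique among fanout groups in the same network namespace, and a group with a single socket is enough to get the defragmentation.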