Thread: 35 messages, 4 authors, 2019-12-12

Re: [PATCH bpf-next 0/8] Extend SOCKMAP to store listening sockets

From: Jakub Sitnicki <jakub@cloudflare.com>
Date: 2019-11-25 09:23:01
Also in: bpf

On Sun, Nov 24, 2019 at 07:10 AM CET, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> This patch set makes SOCKMAP more flexible by allowing it to hold TCP
>> sockets that are either in established or listening state. With it SOCKMAP
>> can act as a drop-in replacement for REUSEPORT_SOCKARRAY which reuseport
>> BPF programs use. Granted, it is limited to only TCP sockets.
>>
>> The idea started out at LPC '19 as feedback from John Fastabend to our
>> troubles with repurposing REUSEPORT_SOCKARRAY as a collection of listening
>> sockets accessed by a BPF program run on socket lookup [1]. Without going
>> into details, REUSEPORT_SOCKARRAY proved to be tightly coupled with
>> reuseport logic. Talk from LPC (see slides [2] or video [3]) highlights
>> what problems we ran into when trying to make REUSEPORT_SOCKARRAY work for
>> our use-case.
>>
>> Patches have evolved quite a bit since the RFC series from a month ago
>> [4]. To recap the RFC feedback, John pointed out that BPF redirect helpers
>> for SOCKMAP need sane semantics when used with listening sockets [5], and
>> that SOCKMAP lookup from BPF would be useful [6], while Martin asked for
>> UDP support [7].
>
> Curious if you've started looking into UDP support. I had hoped to do
> it but haven't got there yet.

No, not yet. I only made sure the newly added tests were easy to modify
to cover UDP by not hard-coding the socket type.

I expect to break ground with UDP work soon, though. Right after I push
out another iteration of programmable socket lookup [1] patches adapted for
SOCKMAP, which we've been testing internally.

>> As it happens, patches needed more work to get SOCKMAP to actually behave
>> correctly with listening sockets. It turns out flexibility has its
>> price. Change log below outlines them all.
>
> But it looks pretty clean to me; the only major change here is to add an
> extra hook to remove psock from the child socket. And that looks fine to
> me and cleaner than any other solution I had in mind.
>
> Changes +/- look good as well; most of the updates are in selftests, to
> update tests and add some new ones. +1

Thanks for taking a look at the patches so quickly. I appreciate it.

-Jakub

[1] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/