Re: [RFC] Socket termination for policy enforcement and load-balancing
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date: 2022-09-04 21:25:22
On Sun, 4 Sept 2022 at 20:55, Aditi Ghag [off-list ref] wrote:
> On Wed, Aug 31, 2022 at 4:02 PM Martin KaFai Lau [off-list ref] wrote:
> > On Wed, Aug 31, 2022 at 09:37:41AM -0700, Aditi Ghag wrote:
> > > - Use BPF (sockets) iterator to identify sockets connected to a deleted backend. The BPF (sockets) iterator is network namespace aware so we'll either need to enter every possible container network namespace to identify the affected connections, or adapt the iterator to be without netns checks [3]. This was discussed with my colleague Daniel Borkmann based on the feedback he shared from the LSFMMBPF conference discussions.
> >
> > Being able to iterate all sockets across different netns will be useful.
> >
> > It should be doable to ignore the netns check. For udp, a quick thought is to have another iter target. eg. "udp_all_netns". From the sk, the bpf prog should be able to learn the netns and the bpf prog can filter the netns by itself.
> >
> > The TCP side is going to have an 'optional' per netns ehash table [0] soon, not lhash2 (listening hash) though. Ideally, the same bpf all-netns iter interface should work similarly for both udp and tcp case. Thus, both should be considered and work at the same time.
> >
> > For udp, something more useful than plain udp_abort() could potentially be done. eg. directly connect to another backend (by bpf kfunc?). There may be some details in socket locking...etc but should be doable and the bpf-iter program could be sleepable also.
>
> This won't be effective for connected udp though, will it? Interesting thought around using bpf kfunc!
>
> > fwiw, we are iterating the tcp socket to retire some older bpf-tcp-cc (congestion control) on the long-lived connections by bpf_setsockopt(TCP_CONGESTION).
> >
> > Also, potentially, instead of iterating all, a more selective case can be done by bpf_prog_test_run()+bpf_sk_lookup_*()+udp_abort().
>
> Can you elaborate more on the more selective iterator approach?
>
> On a similar note, are there better ways as alternatives to the sockets iterator approach. Since we have BPF programs executed on cgroup BPF hooks (e.g., connect), we already know what client sockets are connected to a backend. Can we somehow store these socket pointers in a regular BPF map, and when a backend is deleted, use a regular map iterator to invoke sock_destroy() for these sockets? Does anyone have experience using the "typed pointer support in BPF maps" APIs [0]?

I am not very familiar with how socket lifetime is managed; it may not be possible if lifetime is managed by RCU only, or due to other limitations. Martin will probably be able to comment more on that.

Apart from that, on the BPF side, referenced kptr won't work out of the box; you will need to add support for each type you want to support. But the approach you're describing should work well. Ideally you would take a reference and move it into the map from the hook program, then xchg the sk out of the map value during iteration and pass it to the sock_destroy helper to release it (instead of sk_release).

The first task for this will be teaching kptr_xchg to work with non-PTR_TO_BTF_ID arguments. You can use the same process as the translation to in-kernel PTR_TO_BTF_ID done by reg2btf_ids in kernel/bpf/btf.c for socket types for kfuncs. Usually socket types will be PTR_TO_SOCKET, PTR_TO_TCP_SOCK, etc.; they can be mapped using that table to the btf_id of the in-kernel type they shadow. From there, it will be about writing the right dtor for the socket type, one that works in all contexts from which map implementations call the dtor, and registering it, and probably also restricting kptr_xchg for sockets to certain known contexts to make life easier.
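
To make the ownership flow above a bit more concrete, here is a rough userspace model of it in plain C11. This is only a sketch, not kernel or BPF code: every model_* name is hypothetical, an atomic pointer exchange stands in for kptr_xchg, a flag stands in for what sock_destroy would do, and it says nothing about the real socket locking or RCU rules.

```c
/* Userspace model only: none of the model_* names are kernel or BPF APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct sock: a refcount plus a "destroyed" flag. */
struct model_sock {
    atomic_int refcnt;
    bool destroyed;
};

/* Stand-in for a map value holding one referenced pointer (a "kptr"). */
struct model_map_value {
    _Atomic(struct model_sock *) sk;
};

/* Take a reference before the pointer is published anywhere. */
static struct model_sock *model_sock_hold(struct model_sock *sk)
{
    atomic_fetch_add(&sk->refcnt, 1);
    return sk;
}

/* Drop a reference; models a plain release or the map's dtor. */
static void model_sock_put(struct model_sock *sk)
{
    int old = atomic_fetch_sub(&sk->refcnt, 1);
    assert(old > 0); /* catching an unbalanced put in this model */
}

/* Models kptr_xchg: atomically swap the stored pointer and return the
 * old one. Whoever gets a non-NULL return value owns that reference. */
static struct model_sock *model_kptr_xchg(struct model_map_value *v,
                                          struct model_sock *new_sk)
{
    return atomic_exchange(&v->sk, new_sk);
}

/* Hook side (e.g. connect): take a ref and move it into the map. */
static void model_hook_store(struct model_map_value *v, struct model_sock *sk)
{
    struct model_sock *old = model_kptr_xchg(v, model_sock_hold(sk));

    if (old)
        model_sock_put(old); /* slot was occupied: release the displaced ref */
}

/* Iteration side: xchg the pointer out, destroy, and consume the ref. */
static bool model_iter_destroy(struct model_map_value *v)
{
    struct model_sock *sk = model_kptr_xchg(v, NULL);

    if (!sk)
        return false;      /* nothing stored, or someone else took it */
    sk->destroyed = true;  /* models the effect of sock_destroy */
    model_sock_put(sk);    /* the destroy step consumes the reference */
    return true;
}
```

The point of the exchange is that only one party can ever take the pointer out of the map value, so the hook program, the iterator, and the map's dtor cannot all release the same reference. In this model a second model_iter_destroy() on the same value finds NULL and does nothing.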
[0] https://lwn.net/ml/bpf/20220424214901.2743946-1-memxor@gmail.com/