Re: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP


From: Alexei Starovoitov <hidden>
Date: 2019-03-23 15:41:32

On Sat, Mar 23, 2019 at 02:12:39AM -0700, Eric Dumazet wrote:


> On 03/23/2019 01:05 AM, brakmo wrote:
> > This patchset adds support for propagating congestion notifications (cn)
> > to TCP from cgroup inet skb egress BPF programs.
> >
> > Current cgroup skb BPF programs cannot trigger TCP congestion window
> > reductions, even when they drop a packet. This patch-set adds support
> > for cgroup skb BPF programs to send congestion notifications in the
> > return value when the packets are TCP packets. Rather than the
> > current 1 for keeping the packet and 0 for dropping it, they can
> > now return:
> >     NET_XMIT_SUCCESS    (0)    - continue with packet output
> >     NET_XMIT_DROP       (1)    - drop packet and do cn
> >     NET_XMIT_CN         (2)    - continue with packet output and do cn
> >     -EPERM                     - drop packet
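
For readers following along, the four return values listed in the cover letter can be read as a small decision table. The following is only a userspace sketch of that table, not code from the patches: the struct egress_action type and the egress_action_from_verdict() helper are hypothetical names, the NET_XMIT_* values are redefined locally to mirror include/linux/netdevice.h, and treating unknown values as a silent drop is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the values in include/linux/netdevice.h; redefined here so the
 * sketch builds in userspace. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Hypothetical type: what the output path should do with one packet. */
struct egress_action {
	bool transmit;   /* continue with packet output */
	bool notify_cn;  /* report a congestion notification to TCP */
};

/* Hypothetical helper: decode a cgroup skb egress return value as
 * described in the cover letter. */
static struct egress_action egress_action_from_verdict(int verdict)
{
	struct egress_action act = { .transmit = false, .notify_cn = false };

	switch (verdict) {
	case NET_XMIT_SUCCESS:          /* continue with packet output */
		act.transmit = true;
		break;
	case NET_XMIT_DROP:             /* drop packet and do cn */
		act.notify_cn = true;
		break;
	case NET_XMIT_CN:               /* continue with output and do cn */
		act.transmit = true;
		act.notify_cn = true;
		break;
	case -EPERM:                    /* drop packet, no cn */
	default:                        /* assumption: unknown values drop */
		break;
	}
	return act;
}
```

The only case that did not exist before is the pair of values that set notify_cn; -EPERM keeps the old meaning of a drop that TCP does not hear about.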

> I believe I already mentioned this model is broken, if you have any virtual
> device before the cgroup BPF program.
>
> Please think about offloading the pacing/throttling in the NIC,
> there is no way we will report back to tcp stack instant notifications.

I don't think 'offload to Google's proprietary NIC' is a suggestion
that folks can practically follow.
Very few NICs can offload pacing to hardware, and there are plenty of limitations.
This patch set is a pure software solution that works and scales to millions of flows.

> This patch series is going way too far for my taste.

I would really appreciate it if you could do a technical review of the patches.
Our previous approach didn't quite work due to complexity around locked/non-locked sockets.
This is a cleaner approach.
Either we go with this one or we will add a bpf hook into __tcp_transmit_skb.
This approach is better since it works for other protocols and can be
used by qdiscs w/o any bpf.
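
To illustrate why a return-value approach needs no TCP-specific hook: as far as I recall the existing code, __tcp_transmit_skb() already treats any positive return from queue_xmit as a congestion signal, calls tcp_enter_cwr(), and then maps NET_XMIT_CN back to success with net_xmit_eval(). The sketch below models only that step in userspace. struct toy_sock and toy_after_queue_xmit() are hypothetical stand-ins, and the NET_XMIT_* values and the net_xmit_eval() macro are copied from my reading of include/linux/netdevice.h.

```c
#include <stdbool.h>

/* Redefined locally to mirror include/linux/netdevice.h. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* NET_XMIT_CN means the packet was still sent, so callers see success. */
#define net_xmit_eval(e) ((e) == NET_XMIT_CN ? 0 : (e))

/* Hypothetical stand-in for the congestion state kept in struct sock. */
struct toy_sock {
	bool in_cwr;  /* true once the sender has entered cwnd reduction */
};

/* Hypothetical model of what the sender does with the xmit return value. */
static int toy_after_queue_xmit(struct toy_sock *sk, int err)
{
	if (err > 0) {
		sk->in_cwr = true;         /* stands in for tcp_enter_cwr(sk) */
		err = net_xmit_eval(err);  /* CN becomes 0, DROP stays 1 */
	}
	return err;  /* negative errors pass through without a cn */
}
```

Under this model anything on the egress path that can return NET_XMIT_DROP or NET_XMIT_CN, whether a cgroup BPF program or a qdisc, reaches the sender the same way.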

> This idea is not new, you were at Google when it was experimented by Nandita and
> others, and we know it is not worth the pain.

Google's networking needs are different from those of the rest of the world.

Thank you.
