Thread: 73 messages, 13 authors, 2021-12-01

Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics

From: Lorenz Bauer <hidden>
Date: 2021-11-24 16:34:24
Also in: bpf, linux-doc, lkml, netdev

Daniel asked me to share my opinion, as Cloudflare has an XDP load
balancer as well.

On Wed, 24 Nov 2021 at 00:53, Daniel Borkmann [off-list ref] wrote:
> I'm just taking our XDP L4LB in Cilium as an example: there we already count errors and
> export them via per-cpu map that eventually lead to XDP_DROP cases including the /reason/
> which caused the XDP_DROP (e.g. Prometheus can then scrape these insights from all the
> nodes in the cluster). Given the different action codes are very often application specific,
> there's not much debugging that you can do when /only/ looking at `ip link xdpstats` to
> gather insight on *why* some of these actions were triggered (e.g. fib lookup failure, etc).
Agreed. For our purpose we often want to know whether a specific
program has been invoked. Per-channel or per-device stats don't help
us much since we have a chain of programs (not using libxdp though).
My colleague Arthur has written xdpcap [1], which gives per-action,
per-program counters. This way we can correlate an action with a
packet and a program.
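
To illustrate what per-action, per-program counters buy you over per-device
stats, here is a minimal user-space sketch in Python. It is not how xdpcap is
implemented; it only models the bookkeeping. The class ChainCounters and the
program names "ddos_filter" and "l4lb" are hypothetical. The action numbering
follows the kernel's enum xdp_action.

```python
from collections import Counter
from enum import IntEnum


class XdpAction(IntEnum):
    """XDP return codes, numbered as in the kernel's enum xdp_action."""
    ABORTED = 0
    DROP = 1
    PASS = 2
    TX = 3
    REDIRECT = 4


class ChainCounters:
    """Hypothetical per-program, per-action counters for a chain of XDP programs."""

    def __init__(self):
        # Keyed by (program name, action) so each verdict is tied to a program.
        self._counts = Counter()

    def record(self, program, action):
        """Count one invocation of `program` that returned `action`."""
        self._counts[(program, XdpAction(action))] += 1

    def invoked(self, program):
        """Total number of times `program` ran, across all actions."""
        return sum(n for (prog, _), n in self._counts.items() if prog == program)

    def by_action(self, program):
        """Per-action breakdown for one program, keyed by action name."""
        return {act.name: n for (prog, act), n in self._counts.items() if prog == program}


counters = ChainCounters()
# Hypothetical chain: a filter program followed by a load balancer.
counters.record("ddos_filter", XdpAction.DROP)
counters.record("ddos_filter", XdpAction.PASS)
counters.record("l4lb", XdpAction.TX)

print(counters.invoked("ddos_filter"))
print(counters.by_action("l4lb"))
```

A per-device counter would report one XDP_DROP and one XDP_TX here without
saying which program in the chain produced them, or whether the second
program ran at all.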
> If really of interest, then maybe libxdp could have such per-action counters as opt-in in
> its call chain..
We could also make it part of BPF_ENABLE_STATS, though that is fairly
coarse-grained.
> In the case of ice_run_xdp() today, we already bump total_rx_bytes/total_rx_pkts under
> XDP and update ice_update_rx_ring_stats(). I do see the case for XDP_TX and XDP_REDIRECT
> where we run into driver-specific errors that are /outside of the reach/ of the BPF prog.
> For example, we've been running into errors from XDP_TX in ice_xmit_xdp_ring() in the
> past during testing, and were able to pinpoint the location as xdp_ring->tx_stats.tx_busy
> was increasing. These things are useful and would make sense to standardize for XDP context.
I'd like to see more tracepoints like trace_xdp_exception, personally.
We can use things like bpftrace for exploration and ebpf_exporter [2]
to generate alerts much more easily than something wired into
iproute2.

Best
Lorenz

[1] https://github.com/cloudflare/xdpcap
[2] https://github.com/cloudflare/ebpf_exporter

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com