Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
From: Jesper Dangaard Brouer <hidden>
Date: 2021-11-29 14:02:03
Also in:
bpf, linux-doc, lkml, netdev
On 27/11/2021 00.01, Daniel Borkmann wrote:
On 11/26/21 11:27 PM, Daniel Borkmann wrote:quoted
On 11/26/21 7:06 PM, Jakub Kicinski wrote:[...]quoted
quoted
The information required by the admin is higher level. As you say the primary concern there is "how many packets did XDP eat".Agree. Above said, for XDP_DROP I would see one use case where you compare different drivers or bond vs no bond as we did in the past in [0] when testing against a packet generator (although I don't see bond driver covered in this series here yet where it aggregates the XDP stats from all bond slave devs). On a higher-level wrt "how many packets did XDP eat", it would make sense to have the stats for successful XDP_{TX,REDIRECT} given these are out of reach from a BPF prog PoV - we can only count there how many times we returned with XDP_TX but not whether the pkt /successfully made it/.
Exactly.
quoted
In terms of error cases, could we just standardize all drivers on the behavior of e.g. mlx5e_xdp_handle(), meaning, a failure from XDP_{TX,REDIRECT} will hit the trace_xdp_exception() and then fallthrough to bump a drop counter (same as we bump in XDP_DROP then). So the drop counter will account for program drops but also driver-related drops.
Hmm... I don't agree here. IMHO the BPF-program's *choice* to drop (via XDP_DROP) should NOT share the counter with the driver-related drops. The driver-related drops must be accounted separate. For the record, I think mlx5e_xdp_handle() does the wrong thing, of accounting everything as XDP_DROP in (rq->stats->xdp_drop++). Current mlx5 driver stats are highly problematic actually. Please don't model stats behavior after this driver. E.g. if BPF-prog takes the *choice* XDP_TX or XDP_REDIRECT or XDP_DROP, then the packet is invisible to "ifconfig" stats. It is like the driver never received these packets (which is wrong IMHO). (The stats are only avail via ethtool -S).
quoted
At some later point the trace_xdp_exception() could be extended with an error code that the driver would propagate (given some of them look quite similar across drivers, fwiw), and then whoever wants to do further processing with them can do so via bpftrace or other tooling.
I do like trace_xdp_exception() is invoked in mlx5e_xdp_handle(), but do notice that xdp_do_redirect() also have a tracepoint that can be used for troubleshooting. (I usually use xdp_monitor for troubleshooting which catch both). I like the stats XDP handling better in mvneta_run_xdp().
Just thinking out loud, one straight forward example we could start out with that is also related to Paolo's series [1] ... enum xdp_error { XDP_UNKNOWN, XDP_ACTION_INVALID, XDP_ACTION_UNSUPPORTED, }; ... and then bpf_warn_invalid_xdp_action() returns one of the latter two which we pass to trace_xdp_exception(). Later there could be XDP_DRIVER_* cases e.g. propagated from XDP_TX error exceptions. [...] default: err = bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: xdp_abort: trace_xdp_exception(rq->netdev, prog, act, err); fallthrough; case XDP_DROP: lrstats->xdp_drop++; break; } [...] [1] https://lore.kernel.org/netdev/cover.1637924200.git.pabeni@redhat.com/ (local)quoted
So overall wrt this series: from the lrstats we'd be /dropping/ the pass, tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/ bytes & packets counters that XDP sees, (driver-)successful tx & redirect counters as well as drop counter. Also, XDP bytes & packets counters should not be counted twice wrt ethtool stats. [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e2ee5c7e7c35d195e2aa0692a7241d47a433d1e
Concrete example with mlx5: For most other hardware (than mlx5) I experience that XDP_TX creates a push-back on NIC RX-handing speed. Thus, the XDP_TX stats recorded by BPF-prog is usually correct. With mlx5 hardware (tested on ConnectX-5 Ex MT28800) the RX packets-per-sec (pps) stats can easily be faster than actually XDP_TX transmitted frames. $ sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_TX [...] Running XDP on dev:mlx5p1 (ifindex:10) action:XDP_TX options:swapmac XDP stats CPU pps issue-pps XDP-RX CPU 1 13,922,430 0 XDP-RX CPU total 13,922,430 RXQ stats RXQ:CPU pps issue-pps rx_queue_index 1:1 13,922,430 0 rx_queue_index 1:sum 13,922,430 The real xmit speed is (from below ethtool_stats.pl) is 9,391,314 pps <= rx1_xdp_tx_xmit /sec The dropped packets are double accounted as: 4,552,033 <= rx_xdp_drop /sec 4,552,033 <= rx_xdp_tx_full /sec Show adapter(s) (mlx5p1) statistics (ONLY that changed!) Ethtool(mlx5p1 ) stat: 217865 ( 217,865) <= ch1_poll /sec Ethtool(mlx5p1 ) stat: 217864 ( 217,864) <= ch_poll /sec Ethtool(mlx5p1 ) stat: 13943371 ( 13,943,371) <= rx1_cache_reuse /sec Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx1_xdp_drop /sec Ethtool(mlx5p1 ) stat: 146740 ( 146,740) <= rx1_xdp_tx_cqes /sec Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx1_xdp_tx_full /sec Ethtool(mlx5p1 ) stat: 9391314 ( 9,391,314) <= rx1_xdp_tx_inlnw /sec Ethtool(mlx5p1 ) stat: 880436 ( 880,436) <= rx1_xdp_tx_mpwqe /sec Ethtool(mlx5p1 ) stat: 997833 ( 997,833) <= rx1_xdp_tx_nops /sec Ethtool(mlx5p1 ) stat: 9391314 ( 9,391,314) <= rx1_xdp_tx_xmit /sec Ethtool(mlx5p1 ) stat: 45095173 ( 45,095,173) <= rx_64_bytes_phy /sec Ethtool(mlx5p1 ) stat: 2886090490 ( 2,886,090,490) <= rx_bytes_phy /sec Ethtool(mlx5p1 ) stat: 13943293 ( 13,943,293) <= rx_cache_reuse /sec Ethtool(mlx5p1 ) stat: 31151957 ( 31,151,957) <= rx_out_of_buffer /sec Ethtool(mlx5p1 ) stat: 45095158 ( 45,095,158) <= rx_packets_phy /sec Ethtool(mlx5p1 ) stat: 2886072350 ( 2,886,072,350) <= rx_prio0_bytes /sec Ethtool(mlx5p1 ) stat: 45094878 ( 45,094,878) <= rx_prio0_packets /sec Ethtool(mlx5p1 ) stat: 2705707938 ( 2,705,707,938) <= rx_vport_unicast_bytes /sec Ethtool(mlx5p1 ) stat: 45095129 ( 45,095,129) <= rx_vport_unicast_packets /sec Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx_xdp_drop /sec Ethtool(mlx5p1 ) stat: 146739 ( 146,739) <= rx_xdp_tx_cqe /sec Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx_xdp_tx_full /sec Ethtool(mlx5p1 ) stat: 9391319 ( 9,391,319) <= rx_xdp_tx_inlnw /sec Ethtool(mlx5p1 ) stat: 880436 ( 880,436) <= rx_xdp_tx_mpwqe /sec Ethtool(mlx5p1 ) stat: 997831 ( 997,831) <= rx_xdp_tx_nops /sec Ethtool(mlx5p1 ) stat: 9391319 ( 9,391,319) <= rx_xdp_tx_xmit /sec Ethtool(mlx5p1 ) stat: 601044221 ( 601,044,221) <= tx_bytes_phy /sec Ethtool(mlx5p1 ) stat: 9391316 ( 9,391,316) <= tx_packets_phy /sec Ethtool(mlx5p1 ) stat: 601040871 ( 601,040,871) <= tx_prio0_bytes /sec Ethtool(mlx5p1 ) stat: 9391264 ( 9,391,264) <= tx_prio0_packets /sec Ethtool(mlx5p1 ) stat: 563478483 ( 563,478,483) <= tx_vport_unicast_bytes /sec Ethtool(mlx5p1 ) stat: 9391316 ( 9,391,316) <= tx_vport_unicast_packets /sec [1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl The net_devices stats says the NIC is processing zero packets: $ sar -n DEV 2 1000 [...] Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil [...] Average: mlx5p1 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 Average: mlx5p2 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00