Thread (45 messages) 45 messages, 8 authors, 2021-11-08

Re: [PATCH net-next 03/21] ethtool, stats: introduce standard XDP statistics

From: David Ahern <hidden>
Date: 2021-08-04 16:18:06
Also in: linux-doc, lkml, netdev, virtualization

On 8/4/21 6:36 AM, Jakub Kicinski wrote:
quoted
XDP is going to always be eBPF based ! why not just report such stats
to a special BPF_MAP ? BPF stack can collect the stats from the driver
and report them to this special MAP upon user request.
Do you mean replacing the ethtool-netlink / rtnetlink etc. with
a new BPF_MAP? I don't think adding another category of uAPI thru 
which netdevice stats are exposed would do much good :( Plus it 
doesn't address the "yet another cacheline" concern.

To my understanding the need for stats recognizes the fact that (in
large organizations) fleet monitoring is done by different teams than
XDP development. So XDP team may have all the stats they need, but the
team doing fleet monitoring has no idea how to get to them.

To bridge the two worlds we need a way for the infra team to ask the
XDP for well-defined stats. Maybe we should take a page from the BPF
iterators book and create a program type for bridging the two worlds?
Called by networking core when duping stats to extract from the
existing BPF maps all the relevant stats and render them into a well
known struct? Users' XDP design can still use a single per-cpu map with
all the stats if they so choose, but there's a way to implement more
optimal designs and still expose well-defined stats.

Maybe that's too complex, IDK.
I was just explaining to someone internally how to get stats at all of
the different points in the stack to track down reasons for dropped packets:

ethtool -S for h/w and driver
tc -s for drops by the qdisc
/proc/net/softnet_stat for drops at the backlog layer
netstat -s for network and transport layer

yet another command and API just adds to the nightmare of explaining and
understanding these stats.

There is real value in continuing to use ethtool API for XDP stats. Not
saying this reorg of the XDP stats is the right thing to do, only that
the existing API has real user benefits.

Does anyone have data that shows bumping a properly implemented counter
causes a noticeable performance degradation and if so by how much? You
mention 'yet another cacheline' but collecting stats on stack and
incrementing the driver structs at the end of the napi loop should not
have a huge impact versus the value the stats provide.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help