Thread (71 messages) 71 messages, 13 authors, 2024-06-18

Re: [PATCH net-next v10 05/14] netdev: netdevice devmem allocator

From: Steven Rostedt <rostedt@goodmis.org>
Date: 2024-06-05 00:27:44
Also in: bpf, dri-devel, linux-alpha, linux-arch, linux-doc, linux-kselftest, linux-media, linux-mips, linux-trace-kernel, lkml, sparclinux

On Wed, 5 Jun 2024 01:44:37 +0200
Andrew Lunn [off-list ref] wrote:
quoted
Interesting, as I sped up the ftrace ring buffer by a substantial amount by
adding strategic __always_inline, noinline, likely() and unlikely()
throughout the code. It had to do with what was considered the fast path
and slow path, and not actually the size of the function. gcc got it
horribly wrong.  
And what did the compiler people say when you reported gcc was getting
it wrong?

Our assumption is, the compiler is better than a human at deciding
this. Or at least, a human who does not spend a long time profiling
and tuning. If this assumption is not true, we probably should be
trying to figure out why, and improving the compiler when
possible. That will benefit everybody.
How is the compiler going to know which path is going to be taken the most?
There's two main paths in the ring buffer logic. One when an event stays on
the sub-buffer, the other when the event crosses over to a new sub buffer.
As there's 100s of events that happen on the same sub-buffer for every one
time there's a cross over, I optimized the paths that stayed on the
sub-buffer, which caused the time for those events to go from 250ns down to
150 ns!. That's a 40% speed up.

I added the unlikely/likely and 'always_inline' and 'noinline' paths to
make sure the "staying on the buffer" path was always the hot path, and
keeping it tight in cache.

How is a compiler going to know that?

-- Steve
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help