Thread: 16 messages, 4 authors, 2025-08-22

Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.

From: Stefano Brivio <hidden>
Date: 2025-08-20 16:15:43
Also in: netfilter-devel

On Wed, 20 Aug 2025 18:01:14 +0200
Sebastian Andrzej Siewior [off-list ref] wrote:
> On 2025-08-20 17:44:01 [+0200], Stefano Brivio wrote:
> > On Wed, 20 Aug 2025 16:47:37 +0200
> > Florian Westphal [off-list ref] wrote:
> >
> > > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > >
> > > The struct nft_pipapo_scratch is allocated, then aligned to the required
> > > alignment, and the difference (in bytes) is then saved in align_off. The
> > > aligned pointer is used later.
> > > While this works, it gets complicated with all the extra checks if
> > > all members before map are larger than the required alignment.
> > >
> > > Instead of saving the aligned pointer, just save the returned pointer
> > > and align the map pointer in nft_pipapo_lookup() before using it. The
> > > alignment later on shouldn't be that expensive.
> >
> > The cost of doing the alignment later was the very reason why I added
> > this whole dance in the first place though. Did you check packet
> > matching rates before and after this?
>
> How? There was something under selftest which I used to ensure it still
> works.

tools/testing/selftests/net/netfilter/nft_concat_range.sh: you should add
"performance" to $TESTS (or just do TESTS=performance). Those tests are
normally skipped because they take a while.

> On x86 it should be two additional opcodes (and + lea) and that might be
> interleaved.

I think so too, but I wonder if that has a much bigger effect on
subsequent cache loads rather than just those two instructions.
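
For readers following along, the "store the real pointer, adjust later" idea can be sketched in user-space C roughly as below. This is only an illustration under assumptions: `struct scratch`, `MAP_ALIGN`, `map_align()`, `scratch_alloc()` and `scratch_map()` are hypothetical names, the 32-byte alignment is assumed to match an AVX2 register, and the layout is not the kernel's struct nft_pipapo_scratch. Rounding the pointer up compiles to roughly an add (often emitted as lea) plus an and, which is the pair of opcodes mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_ALIGN 32u	/* assumption: AVX2 register width in bytes */

/* Hypothetical stand-in for the scratch area; not the kernel layout. */
struct scratch {
	unsigned char map_index;
	unsigned long map[];	/* raw, possibly unaligned start of the maps */
};

/* Round a pointer up to the next MAP_ALIGN boundary: one add, one and. */
static inline unsigned long *map_align(unsigned long *p)
{
	uintptr_t v = (uintptr_t)p;

	return (unsigned long *)((v + MAP_ALIGN - 1) &
				 ~(uintptr_t)(MAP_ALIGN - 1));
}

/* Over-allocate by MAP_ALIGN - 1 bytes so the aligned map always fits. */
static struct scratch *scratch_alloc(size_t map_bytes)
{
	return calloc(1, sizeof(struct scratch) + map_bytes + MAP_ALIGN - 1);
}

/*
 * "Adjust later": the pointer returned by the allocator is stored as is
 * (so it can be handed straight back to free()), and the aligned map is
 * derived on every lookup instead of being computed once in advance.
 */
static inline unsigned long *scratch_map(struct scratch *s)
{
	return map_align(s->map);
}
```

A lookup path would call `scratch_map(s)` each time it needs the maps. The arithmetic itself is cheap; whether the extra dependency before the first load matters for matching rates is exactly what the performance tests would have to show.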

> Do you remember a rough figure for your improvement?

I added this right away with the initial implementation of the
vectorised version, so I didn't really check the difference or record
it anywhere, but I vaguely remember having something similar to the
version with your current change in an earlier draft and it was
something like 20 cycles difference with the 'net,port' test with 1000
entries... maybe, I'm really not sure anymore.

I'm especially not sure if my old draft was equivalent to this change.
I reported the original figures (with the alignment done in advance) in
the commit message of 7400b063969b ("nft_set_pipapo: Introduce
AVX2-based lookup implementation").

> As far as I remember the alignment code expects that the "hole" at the
> beginning does not exceed a certain size and the lock there exceeds it.

I think you're right. But again, the alignment itself should be fast,
that's not what I'm concerned about.

-- 
Stefano