Thread (17 messages) 17 messages, 3 authors, 2023-01-31

Re: [PATCH] selftests: net: udpgso_bench_tx: Introduce exponential back-off retries

From: Andrei Gherzan <hidden>
Date: 2023-01-30 16:23:20
Also in: linux-kselftest, lkml

On 23/01/30 04:15PM, Andrei Gherzan wrote:
On 23/01/30 11:03AM, Willem de Bruijn wrote:
quoted
On Mon, Jan 30, 2023 at 9:28 AM Andrei Gherzan
[off-list ref] wrote:
quoted
On 23/01/30 08:35AM, Willem de Bruijn wrote:
quoted
On Mon, Jan 30, 2023 at 7:51 AM Andrei Gherzan
[off-list ref] wrote:
quoted
On 23/01/30 09:26AM, Paolo Abeni wrote:
quoted
On Fri, 2023-01-27 at 17:03 -0500, Willem de Bruijn wrote:
quoted
On Fri, Jan 27, 2023 at 1:16 PM Andrei Gherzan
[off-list ref] wrote:
quoted
The tx and rx test programs are used in a couple of test scripts including
"udpgro_bench.sh". Taking this as an example, when the rx/tx programs
are invoked subsequently, there is a chance that the rx one is not ready to
accept socket connections. This racing bug could fail the test with at
least one of the following:

./udpgso_bench_tx: connect: Connection refused
./udpgso_bench_tx: sendmsg: Connection refused
./udpgso_bench_tx: write: Connection refused

This change addresses this by adding routines that retry the socket
operations with an exponential back off algorithm from 100ms to 2s.

Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Signed-off-by: Andrei Gherzan <redacted>
Synchronizing the two processes is indeed tricky.

Perhaps more robust is opening an initial TCP connection, with
SO_RCVTIMEO to bound the waiting time. That covers all tests in one
go.
Another option would be waiting for the listener(tcp)/receiver(udp)
socket to show up in 'ss' output before firing-up the client - quite
alike what mptcp self-tests are doing.
I like this idea. I have tested it and it works as expected with the
exeception of:

./udpgso_bench_tx: sendmsg: No buffer space available

Any ideas on how to handle this? I could retry and that works.
This happens (also) without the zerocopy flag, right? That

It might mean reaching the sndbuf limit, which can be adjusted with
SO_SNDBUF (or SO_SNDBUFFORCE if CAP_NET_ADMIN). Though I would not
expect this test to bump up against that limit.

A few zerocopy specific reasons are captured in
https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html#transmission.
I have dug a bit more into this, and it does look like your hint was in
the right direction. The fails I'm seeing are only with the zerocopy
flag.

From the reasons (doc) above I can only assume optmem limit as I've
reproduced it with unlimited locked pages and the fails are transient.
That leaves optmem limit. Bumping the value I have by default (20480) to
(2048000) made the sendmsg succeed as expected. On the other hand, the
tests started to fail with something like:

./udpgso_bench_tx: Unexpected number of Zerocopy completions:    774783
expected    773707 received
More zerocopy completions than number of sends. I have not seen this before.

The completions are ranges of IDs, one per send call for datagram sockets.

Even with segmentation offload, the counter increases per call, not per segment.

Do you experience this without any other changes to udpgso_bench_tx.c.
Or are there perhaps additional sendmsg calls somewhere (during
initial sync) that are not accounted to num_sends?
Indeed, that looks off. No, I have run into this without any changes in
the tests (besides the retry routine in the shell script that waits for
rx to come up). Also, as a data point.
Actually wait. I don't think that is the case here. "expected" is the
number of sends. In this case we sent 1076 more messages than
completions. Am I missing something obvious?
As an additional data point, this was only seen on the IPv6 tests. I've
never been able to replicate it on the IPv4 run.
I was also fast to send this but it is not correct. I managed to
reproduce it on both IPv4 and IPv6.

-- 
Andrei Gherzan
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help