RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load

From: "Tobias Waldekranz" <tobias@waldekranz.com>
Date: 2020-06-30 07:45:35

On Tue Jun 30, 2020 at 8:27 AM CEST, Andy Duan wrote:

From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30,
2020 12:29 AM

On Sun Jun 28, 2020 at 8:23 AM CEST, Andy Duan wrote:

I have never seen a bandwidth test trip the netdev watchdog.
Can you describe the steps to reproduce it on the commit, so that we can
reproduce it locally? Thanks.

My setup uses an i.MX8M Nano EVK connected to an ethernet switch, but I can
get the same results with a direct connection to a PC.

On the iMX, configure two VLANs on top of the FEC and enable IPv4
forwarding.

On the PC, configure two VLANs and put them in different namespaces. From
one namespace, use trafgen to generate a flow that the iMX will route from
the first VLAN to the second and then back towards the second namespace on
the PC.

Something like:

    {
        eth(sa=PC_MAC, da=IMX_MAC),
        ipv4(saddr=10.0.2.2, daddr=10.0.3.2, ttl=2)
        udp(sp=1, dp=2),
        "Hello world"
    }

Wait a couple of seconds and then you'll see the output from fec_dump.
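
For anyone trying to reproduce this, the setup described above could be
scripted roughly as follows. This is only a sketch under assumptions: the
interface name (eth0 on both sides), the VLAN IDs (2 and 3), the namespace
names and the iMX-side addresses 10.0.2.1 and 10.0.3.1 are hypothetical.
Only 10.0.2.2 and 10.0.3.2 come from the trafgen config. By default it just
prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the reproduction setup. Assumed (not from the original mail):
# interface name eth0, VLAN IDs 2 and 3, namespace names ns2/ns3,
# and the iMX-side addresses 10.0.2.1 and 10.0.3.1.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# iMX side: two VLANs on top of the FEC, IPv4 forwarding enabled.
imx_setup() {
    dev=${1:-eth0}
    run ip link set "$dev" up
    for vid in 2 3; do
        run ip link add link "$dev" name "$dev.$vid" type vlan id "$vid"
        run ip addr add "10.0.$vid.1/24" dev "$dev.$vid"
        run ip link set "$dev.$vid" up
    done
    run sysctl -w net.ipv4.ip_forward=1
}

# PC side: one VLAN per namespace, default route via the iMX.
pc_setup() {
    dev=${1:-eth0}
    run ip link set "$dev" up
    for vid in 2 3; do
        run ip netns add "ns$vid"
        run ip link add link "$dev" name "$dev.$vid" type vlan id "$vid"
        run ip link set "$dev.$vid" netns "ns$vid"
        run ip -n "ns$vid" addr add "10.0.$vid.2/24" dev "$dev.$vid"
        run ip -n "ns$vid" link set "$dev.$vid" up
        run ip -n "ns$vid" route add default via "10.0.$vid.1"
    done
}
```

Calling imx_setup on the board and pc_setup on the PC with DRY_RUN=0 (as
root) would apply the configuration; the trafgen flow is then started from
inside the first namespace.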

In the same setup I also see a weird issue when running a TCP flow using
iperf3. Most of the time (~70%) when I start the iperf3 client I'll see
~450Mbps of throughput. In the other case (~30%) I'll see ~790Mbps. The
system is "stably bi-modal", i.e. whichever rate is reached in the beginning is
then sustained for as long as the session is kept alive.

I've inserted some tracepoints in the driver to try to understand what's going
on:
https://svgshare.com/i/MVp.svg

What I can't figure out is why the Tx buffers seem to be collected at a much
slower rate in the slow case (top in the picture). If we fall behind in one NAPI
poll, we should catch up at the next call (which we can see in the fast case).
But in the slow case we keep falling further and further behind until we freeze
the queue. Is this something you've ever observed? Any ideas?

Previously, our test cases did not reproduce the issue: the CPU had more
bandwidth than the Ethernet uDMA, so there was a chance to complete the
current NAPI poll. On the next poll work_tx got updated, so we never hit
the issue.

It appears it has nothing to do with routing back out through the same
interface.

I get the same bi-modal behavior if I just run the iperf3 server on the
iMX and then have it be the transmitting side, i.e. on the PC I run:

    iperf3 -c $IMX_IP -R

I would be very interested to see what numbers you see in this
scenario.
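
To put numbers on the bi-modal behavior, the reverse-mode test could be
repeated and each run classified. A rough sketch, with assumptions: the
600 Mbps threshold is hypothetical (roughly midway between the ~450 and
~790 Mbps modes), the run length and count are arbitrary, and the JSON
field path is what I understand iperf3 -J to emit for a TCP test. It
needs jq in addition to iperf3.

```shell
#!/bin/sh
# Assumed threshold separating the two modes (~450 vs ~790 Mbps).
THRESHOLD_MBPS=${THRESHOLD_MBPS:-600}

# Read one throughput value (Mbps) per line on stdin and print a
# summary of the form "slow=N fast=M".
classify() {
    awk -v t="$THRESHOLD_MBPS" '
        NF { if ($1 + 0 < t + 0) slow++; else fast++ }
        END { printf "slow=%d fast=%d\n", slow, fast }'
}

# Run one reverse-mode session against $1 and print the received Mbps.
# Requires iperf3 and jq; the JSON path is an assumption about -J output.
one_run() {
    iperf3 -c "$1" -R -t 10 -J |
        jq '.end.sum_received.bits_per_second / 1e6'
}

# Repeat the test $2 times (default 20) against $1 and summarize.
survey() {
    i=0
    while [ "$i" -lt "${2:-20}" ]; do
        one_run "$1"
        i=$((i + 1))
    done | classify
}
```

Something like `survey $IMX_IP 20` on the PC would then report how many
sessions landed in each mode.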
