Re: [PATCH net-next v3 4/4] net: ocelot: add FDMA support
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: 2021-12-03 11:23:58
Also in:
lkml, netdev
On Wed, Dec 01, 2021 at 10:29:26AM +0100, Clément Léger wrote:
Le Mon, 29 Nov 2021 17:40:39 +0000, Vladimir Oltean [off-list ref] a écrit :quoted
On Mon, Nov 29, 2021 at 09:19:02AM +0100, Clément Léger wrote:quoted
quoted
I'm not sure why you're letting the hardware grind to a halt first, before refilling? I think since the CPU is the bottleneck anyway, you can stop the extraction channel at any time you want to refill. A constant stream of less data might be better than a bursty one. Or maybe I'm misunderstanding some of the details of the hardware.Indeed, I can stop the extraction channel but that does not seems a good idea to stop the channel in a steady state. At least that's what I thought since it will make the receive "window" non predictable. Not sure how well it will play with various protocol but I will try implementing the refill we talked previously (ie when there an available threshold is reached).(...)quoted
quoted
I don't understand why you restart the injection channel from the TX confirmation interrupt. It raised the interrupt to tell you that it hit a NULL LLP because there's nothing left to send. If you restart it now and no other transmission has happened in the meantime, won't it stop again?Actually, it is only restarted if there is some pending packets to send. With this hardware, packets can't be added while the FDMA is running and it must be stopped everytime we want to add a packet to the list. To avoid that, in the TX path, if the FDMA is stopped, we set the llp of the packet to NULL and start the chan. However, if the FDMA TX channel is running, we don't stop it, we simply add the next packets to the ring. However, the FDMA will stop on the previous NULL LLP. So when we hit a LLP, we might not be at the end of the list. This is why the next check verifies if we hit a NULL LLP and if there is still some packet to send.Oh, is that so? That would be pretty odd if the hardware is so dumb that it doesn't detect changes made to an LLP on the go. The manual has this to say, and I'm not sure how to interpret it: | It is possible to update an active channels LLP pointer and pointers in | the DCB chains. Before changing pointers software must schedule the | channel for disabling (by writing FDMA_CH_DISABLE.CH_DISABLE[ch]) and | then wait for the channel to set FDMA_CH_SAFE.CH_SAFE[ch]. When the | pointer update is complete, soft must re-activate the channel by setting | FDMA_CH_ACTIVATE.CH_ACTIVATE[ch]. Setting activate will cancel the | deactivate-request, or if the channel has disabled itself in the | meantime, it will re activate the channel. So it is possible to update an active channel's LLP pointer, but not while it's active? Thank you very much!In the manual, this is also stated that: | The FDMA does not reload the current DCB when re- activated, | so if the LLP-field of the current DCB is modified, then software must | also modify FDMA_DCB_LLP[ch]. The FDMA present on the next generation (sparx5) is *almost* the same but a new RELOAD register has been added and allows adding a DCB at the end of the linked list without stopping the FDMA, and then simply hit the RELOAD register to restart it if needed. Unfortunately, this is not the case for the ocelot one.quoted
If true, this will severely limit the termination performance one is able to obtain with this switch, even with a faster CPU and PCIe.
Sadly I don't have the time or hardware to dig deeper into this, so I'll have to trust you, even if it sounds like a severe limitation.
quoted
quoted
quoted
quoted
+void ocelot_fdma_netdev_init(struct ocelot_fdma *fdma, struct net_device *dev) +{ + dev->needed_headroom = OCELOT_TAG_LEN; + dev->needed_tailroom = ETH_FCS_LEN;The needed_headroom is in no way specific to FDMA, right? Why aren't you doing it for manual register-based injection too? (in a separate patch ofc)Actually, If I switch to page based ring, This won't be useful anymore because the header will be written directly in the page and not anymore directly in the skb header.I don't understand this comment. You set up the needed headroom and tailroom netdev variables to avoid reallocation on TX, not for RX. And you use half page buffers for RX, not for TX.Ok, so indeed, I don't think it is needed for the register-based injection since the IFH is computed on the stack and pushed word by word into the fifo separately from the skb data. In the case of the FDMA, it is read from the start of the DCB DATAL adress so this is why this is needed. I could also put the IFH in a separate DCB and then split the data in a next DCB using SOF/EOF flags but I'm not sure it will be beneficial from a performance point of view. I could try that since the CPU is slow, it might be better in some case to let the FDMA handle this instead of usign the CPU to increase the SKB size and linearize it.quoted
quoted
quoted
I can't help but think how painful it is that with a CPU as slow as yours, insult over injury, you also need to check for each packet whether the device tree had defined the "fdma" region or not, because you practically keep two traffic I/O implementations due to that sole reason. I think for the ocelot switchdev driver, which is strictly for MIPS CPUs embedded within the device, it should be fine to introduce a static key here (search for static_branch_likely in the kernel).I thinked about it *but* did not wanted to add a key since it would be global. However, we could consider that there is always only one instance of the driver and indeed a static key is an option. Unfortunately, I'm not sure this will yield any noticeable performance improvement.What is the concern with a static key in this driver, exactly?Only that the static key will be global but this driver does not have anything global. If you have no concern about that, I'm ok to add one.
I don't see a downside to this, do you? Even if we get support for a PCIe ocelot driver later on, we don't know how that is going to look, if it's going to reuse the exact same xmit function, etc.