Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
From: Leizhen (ThunderTown) <hidden>
Date: 2021-01-30 09:51:46
Also in:
linux-devicetree, lkml
On 2021/1/29 21:54, Leizhen (ThunderTown) wrote:
On 2021/1/29 18:26, Arnd Bergmann wrote:quoted
On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann [off-list ref] wrote:quoted
On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown) [off-list ref] wrote:quoted
On 2021/1/28 22:24, Arnd Bergmann wrote:quoted
On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei [off-list ref] wrote:quoted
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile + +static void l3cache_maint_common(u32 range, u32 op_type) +{ + u32 reg; + + reg = readl(l3_ctrl_base + L3_MAINT_CTRL); + reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK); + reg |= range | op_type; + reg |= L3_MAINT_STATUS_START; + writel(reg, l3_ctrl_base + L3_MAINT_CTRL);Are there contents of L3_MAINT_CTRL that need to be preserved across calls and can not be inferred? A 'readl()' is often expensive, so it might be more efficient if you can avoid that.Right, this readl() can be replaced with readl_relaxed(). Thanks. I'll check and correct the readl() and writel() in other places.What I meant is that if you want to replace them, you should provide performance numbers that show how much difference this makes and add comments in the source code explaining how you proved that the _relaxed() version is actually correct.Another clarification, as there are actually two independent points here: * if you can completely remove the readl() above and just write a hardcoded value into the register, or perhaps read the original value once at boot time, that is probably a win because it avoids one of the barriers in the beginning. The datasheet should tell you if there are any bits in the register that have to be preservedCode coupling will become very strong.quoted
* Regarding the _relaxed() accessors, it's a lot harder to know whether that is safe, as you first have to show, in particular in case any of the accesses stop being guarded by the spinlock in that case, and whether there may be a case where you have to serialize the memory access against accesses that are still in the store queue or prefetched. Whether this matters at all depends mostly on the type of devices you are driving on your SoC. If you have any high-speed network interfaces that are unable to do cache coherent DMA, any extra instruction here may impact the number of packets you can transfer, but if all your high-speed devices are connected to a coherent interconnect, I would just go with the obvious approach and use the safe MMIO accessors everywhere.In fact, this driver has been running on an earlier version for several years and has not received any feedback about the performance issue. So I didn't try to optimize it when I first sent these patches. I had to reconsider it until you noticed it. How about keeping it unchanged for the moment? It'll take a lot of time and energy to retest.
In the spirit of code excellence, it's still necessary to optimize it. Yesterday, my family urged me to go back, I wrote it in a hurry.
quoted
Arnd _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel_______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel .
_______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel