Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller | linux-arm-kernel

quoted

On 2021/1/29 21:54, Leizhen (ThunderTown) wrote:

On 2021/1/29 18:26, Arnd Bergmann wrote:
On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann [off-list ref] wrote:
On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
[off-list ref] wrote:
On 2021/1/28 22:24, Arnd Bergmann wrote:
On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei [off-list ref] wrote:
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
+
+static void l3cache_maint_common(u32 range, u32 op_type)
+{
+       u32 reg;
+
+       reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
+       reg |= range | op_type;
+       reg |= L3_MAINT_STATUS_START;
+       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
Are there contents of L3_MAINT_CTRL that need to be preserved
across calls and can not be inferred? A 'readl()' is often expensive,
so it might be more efficient if you can avoid that.
Right, this readl() can be replaced with readl_relaxed(). Thanks.

I'll check and correct the readl() and writel() in other places.
What I meant is that if you want to replace them, you should provide
performance numbers that show how much difference this makes
and add comments in the source code explaining how you proved that
the _relaxed() version is actually correct.
Another clarification, as there are actually two independent
points here:

* if you can completely remove the readl() above and just write a
  hardcoded value into the register, or perhaps read the original
  value once at boot time, that is probably a win because it
  avoids one of the barriers in the beginning. The datasheet should
  tell you if there are any bits in the register that have to be
  preserved
Code coupling will become very strong.

* Regarding the _relaxed() accessors, it's a lot harder to know
  whether that is safe, as you first have to show, in particular in case
  any of the accesses stop being guarded by the spinlock in that
  case, and whether there may be a case where you have to
  serialize the memory access against accesses that are still in the
  store queue or prefetched.

Whether this matters at all depends mostly on the type of devices
you are driving on your SoC. If you have any high-speed network
interfaces that are unable to do cache coherent DMA, any extra
instruction here may impact the number of packets you can transfer,
but if all your high-speed devices are connected to a coherent
interconnect, I would just go with the obvious approach and use
the safe MMIO accessors everywhere.
In fact, this driver has been running on an earlier version for several years
and has not received any feedback about the performance issue. So I didn't
try to optimize it when I first sent these patches. I had to reconsider it
until you noticed it.

How about keeping it unchanged for the moment? It'll take a lot of time and
energy to retest.
In the spirit of code excellence, it's still necessary to optimize it.
Yesterday, my family urged me to go back, I wrote it in a hurry.

       Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help