Thread (14 messages), 4 authors, 2023-08-18

Re: ARM board lockups/hangs triggered by locks and mutexes

From: Rafał Miłecki <zajec5@gmail.com>
Date: 2023-08-18 20:24:49
Also in: linux-arm-kernel, linux-clk, lkml

On 14.08.2023 11:04, Geert Uytterhoeven wrote:
Hi Rafal,

On Mon, Aug 7, 2023 at 1:11 PM Rafał Miłecki [off-list ref] wrote:
On 4.08.2023 13:07, Rafał Miłecki wrote:
I triple checked that. Dropping a single unused function breaks kernel /
device stability on BCM53573!

AFAIK the only thing the diff below actually affects is the location of
symbols (I verified that by comparing System.map before and after:
over 22,000 relocated symbols).

Can some unfortunate location of symbols cause those hangs/lockups?
I performed another experiment. First I dropped mtd_check_of_node() to
bring kernel back to the stable state.

Then I started adding useless code to mtdchar_unlocked_ioctl(). I
ended up adding just enough to make sure all post-mtd symbols in
System.map got the same offsets as in the case of backporting
mtd_check_of_node().

I started experiencing lockups/hangs again.

I repeated the same test by adding dummy code to brcm_nvram_probe()
and verifying the offsets of the symbols that follow brcm_nvram_probe().

I believe this confirms that this problem is about the offset or alignment
of some specific symbol(s). The remaining question is which symbols and
how to fix or work around that.
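
The System.map comparison described above could be scripted roughly as
follows. This is only a sketch, not the script used in the thread: the
helper names parse_system_map() and relocated_symbols() are hypothetical,
and it assumes the usual "ADDRESS TYPE NAME" layout of System.map lines.

```python
def parse_system_map(text):
    """Return {symbol_name: address} for each 'ADDRESS TYPE NAME' line.

    Assumption: when a name occurs more than once (e.g. static symbols
    with the same name in different files), only the first is kept.
    """
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        addr, _sym_type, name = parts
        symbols.setdefault(name, int(addr, 16))
    return symbols


def relocated_symbols(before_text, after_text):
    """Return {name: address delta} for symbols present in both maps
    whose address differs between the two builds."""
    before = parse_system_map(before_text)
    after = parse_system_map(after_text)
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }
```

Feeding it the contents of the System.map files from the stable and the
unstable build gives the set of moved symbols and by how much each moved,
which makes it easier to check that two different padding experiments
really produce the same layout.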
I had similar experiences on other ARM platforms many years ago:
bisection led to something completely bogus, and it turned out
adding a single line of innocent code made the system lock up or crash
unexpectedly.  It was definitely related to alignment, as adding the
right extra amount of innocent code would fix the problem. Until some
later change altered the alignment again...
I never found the real cause, but the problems went away over time.
I am not sure I did enable all required errata config options, so I
may have missed some...
I already experienced some weird performance variations on Broadcom's
Northstar platform that were related to symbol layout & cache hit/miss
ratio. For that reason I use -falign-functions=32 for OpenWrt's whole
"bcm53xx" target (it covers Northstar and BCM53573). So
this aspect should be ruled out already in my case.
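
One way to double check that -falign-functions=32 took effect is to look
for text symbols in System.map whose address is not a multiple of 32. The
sketch below is an illustration only; misaligned_functions() is a
hypothetical helper, and it assumes that symbols of type "T" or "t" are
the ones of interest. Symbols defined in assembly are not covered by the
compiler flag, so hits there are not necessarily a problem.

```python
def misaligned_functions(system_map_text, alignment=32):
    """Return names of text symbols (type 'T' or 't') whose address is
    not a multiple of `alignment`, in System.map order."""
    result = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        addr, sym_type, name = parts
        if sym_type in ("T", "t") and int(addr, 16) % alignment != 0:
            result.append(name)
    return result
```

An empty result for the C functions only says that function entry points
are aligned; it does not rule out effects that depend on where data
symbols or code inside a function end up.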