Re: [PATCH 1/2] media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
From: "Arnd Bergmann" <arnd@arndb.de>
Date: 2026-02-02 14:09:36
Also in:
linux-media, linux-rockchip, lkml, llvm
On Mon, Feb 2, 2026, at 14:42, Nicolas Dufresne wrote:
> On Monday 2 February 2026 at 10:47 +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > The rkvdec_pps had a large set of bitfields, all of which are misaligned. This causes clang-21, and likely other versions, to produce absolutely awful object code and a warning about very large stack usage on targets without unaligned access:
> >
> > drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c:966:12: error: stack frame size (1472) exceeds limit (1280) in 'rkvdec_vp9_start' [-Werror,-Wframe-larger-than]
>
> We had already addressed and validated that on clang-21, which indicates to me that we are likely missing an architecture (or a config) in our CI. Can you document which architecture, configuration and flags were affected so we can add it on our side? Our media pipeline before sending to Linus and the clang build traces are at the following links, in case it matters.
>
> https://gitlab.freedesktop.org/linux-media/media-committers/-/pipelines/1588731
> https://gitlab.freedesktop.org/linux-media/media-committers/-/jobs/91604655
The configuration that hit this for me was an ARMv7-M NOMMU build. I'm doing 'randconfig' builds here, so I inevitably hit some corner cases that all deterministic CI systems miss. I don't think you should add ARMv7-M, since that would take build resources away from something more important. There are no actual drivers/media/ users on ARMv7-M, and next time it is going to be something else.
> > Part of the problem here is how all the bitfield accesses are inlined into a function that already has large structures on the stack.
>
> Another observation is that you had to enable ASAN to make it misbehave when unrolling the for loop (with complex bitfield writes). All I've obtained by visiting the Link: is that it's the armv7-a architecture.
Right, this randconfig build likely got closer to the warning limit because of the inherent overhead in KASAN, but the problem with the unaligned bitfields was something that I could later reproduce without KASAN, on ARMv5 and MIPS32r2. This is something we should fix in clang.
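For readers less familiar with the failure mode, here is a minimal userspace sketch of what "misaligned bitfields" means. The struct and function are hypothetical (prefixed demo_) and do not reproduce the real rkvdec_pps layout; the bit offsets in the comments assume the usual GCC/Clang packed layout on little-endian x86-64. Because every field starts at an odd bit offset and several straddle byte boundaries, a target without unaligned access may have to turn each store into byte loads, shifts, masks and stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical register image, NOT the real struct rkvdec_pps.
 * The one-byte header plus the packed attribute pushes every bitfield
 * off natural alignment, and width/height/flags straddle byte boundaries.
 */
struct demo_pps {
	uint8_t  hdr;
	uint32_t width  : 13;	/* bits  8..20 (assumed GCC/Clang layout) */
	uint32_t height : 13;	/* bits 21..33 */
	uint32_t flags  : 6;	/* bits 34..39 */
} __attribute__((packed));

/* Fill the image field by field, the way an assemble function might. */
static void demo_fill_pps(struct demo_pps *pps, unsigned int w,
			  unsigned int h, unsigned int flags)
{
	/* Reject values that would be silently truncated by the bitfields. */
	assert(w < (1u << 13) && h < (1u << 13) && flags < (1u << 6));

	memset(pps, 0, sizeof(*pps));
	pps->width = w;		/* each store is a read-modify-write */
	pps->height = h;
	pps->flags = flags;
}
```

On x86 this compiles to a handful of instructions; the point of the sketch is only the layout, which is what becomes expensive on ARMv5 or MIPS32r2.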
> > Mark set_field_order_cnt() as noinline_for_stack, and split out the following accesses in assemble_hw_pps() into another noinline function, both of which now use around 800 bytes of stack in the same configuration. There is clearly still something wrong with clang here, but splitting it into multiple functions reduces the risk of stack overflow.
>
> We've tried really hard to avoid adding noinline_for_stack just because compilers are buggy. I'll have another look in case I find some ideas, but meanwhile, with the failing architecture added to the commit message:
>
> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
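The splitting approach described in the quoted patch can be sketched in userspace C roughly as follows. Everything prefixed demo_ is hypothetical and does not mirror the real rkvdec code. The only kernel detail relied on is that noinline_for_stack is defined as noinline in include/linux/compiler_types.h, so it is emulated here with __attribute__((__noinline__)). The idea: when helpers are inlined, their temporaries may all live in the caller's frame at once; when they are kept out of line, each helper's frame exists only during its call, so peak stack usage tends toward "caller plus the largest helper" rather than "caller plus all helpers".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel macros (kernel: noinline_for_stack == noinline). */
#ifndef noinline
#define noinline __attribute__((__noinline__))
#endif
#ifndef noinline_for_stack
#define noinline_for_stack noinline
#endif

/* Hypothetical hardware parameter block. */
struct demo_hw_pps {
	uint32_t poc[32];	/* interleaved top/bottom picture order counts */
	uint32_t misc[8];	/* other per-frame fields */
};

/* Hypothetical stand-in for set_field_order_cnt(): kept out of line. */
static noinline_for_stack void demo_set_poc(struct demo_hw_pps *hw,
					     const int32_t *top,
					     const int32_t *bottom)
{
	assert(top && bottom);
	for (int i = 0; i < 16; i++) {
		hw->poc[2 * i] = (uint32_t)top[i];
		hw->poc[2 * i + 1] = (uint32_t)bottom[i];
	}
}

/* Hypothetical split-out helper for the remaining field accesses. */
static noinline_for_stack void demo_set_misc(struct demo_hw_pps *hw,
					      uint32_t width, uint32_t height)
{
	hw->misc[0] = width;
	hw->misc[1] = height;
}

/* The caller only sequences the helpers, so its own frame stays small. */
static void demo_assemble_hw_pps(struct demo_hw_pps *hw,
				 const int32_t *top, const int32_t *bottom,
				 uint32_t width, uint32_t height)
{
	memset(hw, 0, sizeof(*hw));
	demo_set_poc(hw, top, bottom);
	demo_set_misc(hw, width, height);
}
```

The trade-off Nicolas points out still applies: noinline_for_stack works around the compiler's code generation rather than fixing it, and it costs a function call per helper.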
Thanks!
Arnd