Re: [PATCH RFC 4/4] arm64/io: Add {__raw_read|__raw_write}128 support
From: Mark Rutland <mark.rutland@arm.com>
Date: 2025-11-12 14:17:56
Also in:
linux-arch, linux-arm-kernel, linux-crypto, lkml
On Wed, Nov 12, 2025 at 02:01:57PM +0000, David Laight wrote:
On Wed, 12 Nov 2025 12:28:01 +0000 Mark Rutland wrote:
On Wed, Nov 12, 2025 at 09:58:46AM +0800, Chenghai Huang wrote:
From: Weili Qian <qianweili@huawei.com>
Starting from ARMv8.4, stp and ldp instructions become atomic.
That's not true for accesses to Device memory types. Per ARM DDI 0487, L.b, section B2.2.1.1 ("Changes to single-copy atomicity in Armv8.4"):
  If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that load or store two 64-bit registers are single-copy atomic when all of the following conditions are true:
  • The overall memory access is aligned to 16 bytes.
  • Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
IIUC when used for Device memory types, those can be split, and a part of the access could be replayed multiple times (e.g. due to an interrupt).
That can't be right.
For better or worse, the architecture permits this, and I understand that there are implementations on which this can happen.
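For illustration, the conditions quoted from ARM DDI 0487 (B2.2.1.1) can be written as a simple predicate. This is a minimal sketch only: `enum mem_type` and `ldp_stp_single_copy_atomic()` are hypothetical names, not kernel or architecture APIs, and the enum collapses the real memory attributes into three coarse cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the memory type the MMU assigns to a VA. */
enum mem_type {
	MEM_NORMAL_IWB_OWB,	/* Inner Write-Back, Outer Write-Back Normal cacheable */
	MEM_NORMAL_NC,		/* Normal Non-cacheable */
	MEM_DEVICE,		/* any Device memory type, e.g. Device-nGnRE */
};

/*
 * Hypothetical helper: true only when every quoted condition holds for an
 * LDP/LDNP/STP of two 64-bit registers (a 16-byte access).
 */
static bool ldp_stp_single_copy_atomic(bool feat_lse2, uint64_t va,
				       enum mem_type type)
{
	if (!feat_lse2)
		return false;	/* the Armv8.4 guarantee needs FEAT_LSE2 */
	if (va % 16 != 0)
		return false;	/* overall access must be 16-byte aligned */
	/* Only Inner/Outer Write-Back Normal cacheable memory qualifies. */
	return type == MEM_NORMAL_IWB_OWB;
}
```

Note that no combination of inputs makes a Device mapping return true, which is the point being made about using LDP/STP for MMIO.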
IO accesses can reference hardware FIFO so must only happen once.
This has nothing to do with the endpoint, and so any FIFO in the endpoint is immaterial. I agree that we want to ensure that the accesses only happen once, which is why I have raised that it is unsound to use LDP/LDNP/STP in this way.
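To make the FIFO concern concrete, here is a toy model of the replay scenario described above. It assumes a hypothetical write-only FIFO register that pushes one entry per 64-bit write, and an implementation that splits a 128-bit STP into two 64-bit accesses (low half first) and restarts the whole instruction after each interrupt. `struct fifo_model` and `split_store128()` are illustrative names only; this does not describe any specific core, just the kind of behaviour the architecture is said to permit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device FIFO: every 64-bit write pushes one entry. */
struct fifo_model {
	uint64_t entries[8];
	size_t count;
};

static void fifo_push(struct fifo_model *f, uint64_t v)
{
	if (f->count < sizeof(f->entries) / sizeof(f->entries[0]))
		f->entries[f->count++] = v;
}

/*
 * Assumed model: a split 128-bit store whose first 64-bit access has
 * already reached the device when an interrupt abandons the instruction.
 * On return the instruction is replayed from the start.
 */
static void split_store128(struct fifo_model *f, uint64_t lo, uint64_t hi,
			   int interrupts)
{
	/* Each interrupted attempt performs only the low half. */
	for (int i = 0; i < interrupts; i++)
		fifo_push(f, lo);

	/* The final, uninterrupted attempt performs both halves. */
	fifo_push(f, lo);
	fifo_push(f, hi);
}
```

With `interrupts == 1` the FIFO receives three entries (lo, lo, hi) rather than two; the duplicated low half is exactly the repeated access that must not happen for a FIFO register.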
(Or is 'Device memory' something different from 'Device register'?
I specifically said "Device memory type", which is an attribute that the MMU associates with a VA, and determines how the MMU (and memory system as a whole) treats accesses to that VA. You can find the architecture documentation I referenced at: https://developer.arm.com/documentation/ddi0487/lb/
I'm also not sure that the bus cycles could get split by an interrupt, that would require a mid-instruction interrupt - very unlikely.
There are various reasons why an implementation might split the accesses made by a single instruction, and why an interrupt (or other event) might occur between accesses and cause a replay of some of the constituent accesses. This has nothing to do with splitting bus cycles.

Mark.