Re: [BUG] arm64: an infinite loop in generic_perform_write()
From: Matthew Wilcox <willy@infradead.org>
Date: 2021-06-23 11:52:39
Also in:
linux-arm-kernel, lkml
On Wed, Jun 23, 2021 at 10:32:21AM +0100, Catalin Marinas wrote:
On Wed, Jun 23, 2021 at 04:27:37AM +0000, Al Viro wrote:quoted
On Wed, Jun 23, 2021 at 11:24:54AM +0800, Xiaoming Ni wrote:quoted
On 2021/6/23 10:50, Al Viro wrote:quoted
On Wed, Jun 23, 2021 at 10:39:31AM +0800, Chen Huang wrote:quoted
Then when kernel handles the alignment_fault, it will not panic. As the arm64 memory model spec said, when the address is not a multiple of the element size, the access is unaligned. Unaligned accesses are allowed to addresses marked as Normal, but not to Device regions. An unaligned access to a Device region will trigger an exception (alignment fault). do_alignment_fault do_bad_area __do_kernel_fault fixup_exception But that fixup cann't handle the unaligned copy, so the copy_page_from_iter_atomic returns 0 and traps in loop.Looks like you need to fix your raw_copy_from_user(), then...Exit loop when iov_iter_copy_from_user_atomic() returns 0. This should solve the problem, too, and it's easier.It might be easier, but it's not going to work correctly. If the page gets evicted by memory pressure, you are going to get spurious short write. Besides, it's simply wrong - write(2) does *NOT* require an aligned source. It (and raw_copy_from_user()) should act the same way memcpy(3) does.On arm64, neither memcpy() nor raw_copy_from_user() are expected to work on Device mappings, we have memcpy_fromio() for this but only for ioremap(). There's no (easy) way to distinguish in the write() syscall how the source buffer is mapped. generic_perform_write() does an iov_iter_fault_in_readable() check but that's not sufficient and it also breaks the cases where you can get intra-page faults (arm64 MTE or SPARC ADI). I think in the general case it's racy anyway (another thread doing an mprotect(PROT_NONE) after the readable check passed). So I think generic_perform_write() returning -EFAULT if copied == 0 would make sense (well, unless it breaks other cases I'm not aware of).
It does break other cases -- that's what happens if the page has gone missing after being faulted in. You need to fix your copy_from_user().