Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable

From: Mark Rutland <mark.rutland@arm.com>
Date: 2025-09-01 13:07:39
Also in: linux-arm-kernel, lkml

On Mon, Sep 01, 2025 at 01:28:01PM +0100, Catalin Marinas wrote:

On Mon, Sep 01, 2025 at 10:56:47AM +0100, Mark Rutland wrote:

quoted

On Sat, Aug 30, 2025 at 11:22:51AM +0100, Catalin Marinas wrote:

quoted

On Fri, Aug 29, 2025 at 06:13:11PM -0400, Steven Rostedt wrote:

quoted

On Fri, 29 Aug 2025 20:53:40 +0100
Catalin Marinas [off-list ref] wrote:
valid user address.

quoted

BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
suspect that's not the case here.

Adding Al Viro since since he wrote a large part of uaccess.h.

So, __copy_from_user_inatomic() is supposed to be called if
pagefault_disable() has already been called? If this is the case, can we
add more comments to this code? I've been using the inatomic() version this
way in preempt disabled locations since 2016.

This should work as long as in_atomic() returns true as it's checked in
the arm64 fault code. With PREEMPT_NONE, however, I don't think this
works.

Sorry, what exactly breaks for the PREEMPT_NONE case?

This code would trigger a warning:

	preempt_disable();
	WARN_ON(!in_atomic());
	preempt_enable();

Ah, you mean in the absence of pagefault_disable()..pagefault_enable().

The page fault handling code uses faulthandler_disabled(), which checks
whether either pagefault_disabled() or in_atomic() are true, and aborts
if either are. Given that, using pagefault_disable() should work fine on
PREEMPT_NONE.

More importantly, a faulting __copy_from_user_inatomic() between
get/put_cpu() could trigger migration.

Yep, in the absence of pagefault_disable().

quoted

I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
If anything, it needs to be better documented.

Yeah, I had no idea until I looked at the code. I guess it means it can
be safely used if in_atomic() == true (well, making it up, not sure what
the intention was).

I think that was the intention -- it's the caller asserting that they
know the access won't fault (and hence won't sleep), and that's why
__copy_to_user_inatomic() and __copy_to_user() only differ by the latter
calling might_sleep().

It looks like other inconsistencies have crept in by accident. AFAICT
the should_fail_usercopy() check in __copy_from_user() was accidentally
missed from __copy_from_user_inatomic() when it was introduced in
commit:

I was wondering about that but some code comment for the inatomic
variant states that it's the responsibility of the caller to ensure it
doesn't fault.

I think you mean the kerneldoc comment for __copy_to_user_inatomic(),
which says:

| The caller should also make sure he pins the user space address
| so that we don't result in page fault and sleep.

... and I think the key aspect is to avoid the sleeping, and actually
taking a fault (and failing the uaccess) has to be fine, or the _nofault
API (which uses the _inatomic API) is broken by design.

I think the bit about pinning the address space is misleading.

Not sure one can do other than pinning the page _and_ taking the mm
lock. So I agree we should add the fault injection here as well.

Cool.

quoted

... so there's a bunch of scope for cleanup, and we could probably have:

	/*
	 * TODO: comment saying to only call this directly when you know
	 * that the fault handler won't sleep.
	 */
	static __always_inline __must_check unsigned long
	__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
	{
		...
	}

	static __always_inline __must_check unsigned long
	__copy_from_user(void *to, const void __user *from, unsigned long n)
	{
		might_fault();
		return __copy_from_user_inatomic();
	}

... to avoid the inconsistency.

I think the _inatomic variant should be reserved to arch code that knows
the conditions. Generic code/drivers may not necessarily be aware of
what the arch fault handler does. The _nofault API I think is better
suited in generic code.

I agree. In almost all situations it's better for code to use the
_nofault API.

Mark.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help