Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2025-09-02 07:35:17
Also in: linux-trace-kernel, lkml

On Tue, 2 Sep 2025 11:47:32 +0800
Luo Gengkun [off-list ref] wrote:

On 2025/9/1 23:56, Masami Hiramatsu (Google) wrote:

quoted

On Fri, 29 Aug 2025 08:26:04 -0400
Steven Rostedt [off-list ref] wrote:

quoted

[ Adding arm64 maintainers ]

On Fri, 29 Aug 2025 16:29:07 +0800
Luo Gengkun [off-list ref] wrote:

quoted

On 2025/8/20 1:50, Steven Rostedt wrote:

quoted

On Tue, 19 Aug 2025 10:51:52 +0000
Luo Gengkun [off-list ref] wrote:

quoted

Both tracing_mark_write and tracing_mark_raw_write call
__copy_from_user_inatomic during preempt_disable. But in some case,
__copy_from_user_inatomic may trigger page fault, and will call schedule()
subtly. And if a task is migrated to other cpu, the following warning will

Wait! What?

__copy_from_user_inatomic() is allowed to be called from in atomic context.
Hence the name it has. How the hell can it sleep? If it does, it's totally
broken!

Now, I'm not against using nofault() as it is better named, but I want to
know why you are suggesting this change. Did you actually trigger a bug here?

yes, I trigger this bug in arm64.

And I still think this is an arm64 bug.

I think it could be.

quoted

be trigger:
          if (RB_WARN_ON(cpu_buffer,
                         !local_read(&cpu_buffer->committing)))

An example can illustrate this issue:

You've missed an important part.

quoted

process flow						CPU
---------------------------------------------------------------------

tracing_mark_raw_write():				cpu:0
     ...
     ring_buffer_lock_reserve():				cpu:0
        ...

	preempt_disable_notrace(); --> this is unlocked by ring_buffer_unlock_commit()

quoted

        cpu = raw_smp_processor_id()			cpu:0
        cpu_buffer = buffer->buffers[cpu]			cpu:0
        ...
     ...
     __copy_from_user_inatomic():				cpu:0

So this is called under preempt-disabled.

quoted

        ...
        # page fault
        do_mem_abort():					cpu:0

Sounds to me that arm64 __copy_from_user_inatomic() may be broken.

quoted

           ...
           # Call schedule
           schedule()					cpu:0

If this does not check the preempt flag, it is a problem.
Maybe arm64 needs to do fixup and abort instead of do_mem_abort()?

My kernel was built without CONFIG_PREEMPT_COUNT, so the preempt_disable()
does nothing more than act as a barrier. In this case, it can pass the
check by schedule(). Perhaps this is another issue?

OK, I got it. Indeed, in that case, we have no way to check this
happens in the preempt critical section.
Anyway, as in discussed here, __copy_from_user_inatomic() is for
the internal function, so I'm also OK to this patch.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

BTW, currently we just write a fault message if the
__copy_from_user_*() hits a fault, but I think we can retry with
normal __copy_from_user() to a kernel buffer and copy it in the
ring buffer as slow path.

Thank you,

-- 
Masami Hiramatsu (Google) [off-list ref]

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help