Thread (15 messages), 5 authors, 2020-03-02

Re: [PATCH] exec: Fix a deadlock in ptrace

From: Bernd Edlinger <hidden>
Date: 2020-03-01 17:46:18
Also in: linux-fsdevel, linux-mm, lkml

On 3/1/20 4:58 PM, Christian Brauner wrote:
On Mon, Mar 02, 2020 at 02:13:33AM +1100, Aleksa Sarai wrote:
On 2020-03-01, Bernd Edlinger [off-list ref] wrote:
This fixes a deadlock in the tracer when tracing a multi-threaded
application that calls execve while more than one thread is running.

I observed that when running strace on the gcc test suite, it always
blocks after a while, when expect calls execve, because other threads
have to be terminated.  They send ptrace events, but strace is no
longer able to respond, since it is blocked in mm_access.

The deadlock always happens when strace needs to access the
tracee's process mmap, while another thread in the tracee starts to
execve a child process, but that cannot continue until the
PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:

strace          D    0 30614  30584 0x00000000
Call Trace:
__schedule+0x3ce/0x6e0
schedule+0x5c/0xd0
schedule_preempt_disabled+0x15/0x20
__mutex_lock.isra.13+0x1ec/0x520
__mutex_lock_killable_slowpath+0x13/0x20
mutex_lock_killable+0x28/0x30
mm_access+0x27/0xa0
process_vm_rw_core.isra.3+0xff/0x550
process_vm_rw+0xdd/0xf0
__x64_sys_process_vm_readv+0x31/0x40
do_syscall_64+0x64/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9

expect          D    0 31933  30876 0x80004003
Call Trace:
__schedule+0x3ce/0x6e0
schedule+0x5c/0xd0
flush_old_exec+0xc4/0x770
load_elf_binary+0x35a/0x16c0
search_binary_handler+0x97/0x1d0
__do_execve_file.isra.40+0x5d4/0x8a0
__x64_sys_execve+0x49/0x60
do_syscall_64+0x64/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The proposed solution is to have a second mutex that is
used in mm_access, so it is allowed to continue while the
dying threads are not yet terminated.

I also took the opportunity to improve the documentation
of prepare_creds, which is obviously out of sync.

Signed-off-by: Bernd Edlinger <redacted>

I can't comment on the validity of the patch, but I also found and
reported this issue in 2016[1] and the discussion quickly veered into
the problem being more complicated (and uglier) than it seems at first
glance.

You should probably also Cc stable, given this has been a long-standing
issue and your patch doesn't look (too) invasive.

[1]: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/

Yeah, I remember you mentioning this a while back.

Bernd, we really want a reproducer for this sent alongside this
patch and added to:
tools/testing/selftests/ptrace/
Having a test for this bug, irrespective of whether or not we go with
this as the fix, seems really worth it.

I ran into this issue because I wanted to fix a problem in the gcc testsuite,
namely why it forgets to remove some temp files,
so I did the following:

strace -ftt -o trace.txt make check-gcc-c -k -j4

I reproduced this with v4.20 and v5.5 kernels. I don't know why, but it does
not happen on all systems I tested. Maybe it is something that the expect program
does, because whenever I tried to reproduce this, the deadlock was always in "expect".

I use expect version 5.45 on the computer where the above test freezes after
a couple of minutes.

I think the issue with strace is that it is using mm_access to get the parameters
of a syscall that is going on in one thread, and that races with another thread
that calls execve and holds the cred_guard_mutex.

While Oleg's test case here will certainly not be fixed:

https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/

he mentions access to "anything else which needs ->cred_guard_mutex,
say open(/proc/$pid/mem)". I don't know for sure how that can be done, but if
that is possible, it would probably work as a test case.
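
As a rough sketch of the open(/proc/$pid/mem) access Oleg mentions, a test
could read another process's memory as shown here. The helper name
read_proc_mem() is hypothetical, and this shows only the access itself, not a
complete reproducer for the deadlock.

```c
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read `len` bytes at `remote_addr` from /proc/<pid>/mem into `buf`.
 * Returns the number of bytes read, or -1 with errno set.
 * read_proc_mem() is a hypothetical helper name used for illustration only.
 */
ssize_t read_proc_mem(pid_t pid, const void *remote_addr, void *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);

    /* Per the remark quoted above, this open() needs ->cred_guard_mutex. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset is the virtual address in the target process. */
    ssize_t n = pread(fd, buf, len, (off_t)(uintptr_t)remote_addr);

    /* Preserve errno from pread() across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return n;
}
```

A selftest would presumably call this from the tracer while a non-leader
thread of the tracee is in execve; whether that reliably triggers the hang is
untested here.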

What do you think?


Bernd.

Oleg seems to have suggested that a potential alternative fix is to wait
in de_thread() until all other threads in the thread-group have passed
exit_notify(). Right now we only kill them but don't wait. Currently
de_thread() only waits for the thread-group leader to pass exit_notify()
whenever a non-thread-group leader thread execs (because the exec'ing
thread becomes the new thread-group leader with the same pid as the
former thread-group leader).

Christian