Thread (27 messages) 27 messages, 5 authors, 2021-08-21

Re: [PATCH] coredump: Limit what can interrupt coredumps

From: Olivier Langlois <hidden>
Date: 2021-08-05 13:24:27
Also in: linux-fsdevel, lkml
Subsystem: filesystems (vfs and infrastructure), the rest · Maintainers: Alexander Viro, Christian Brauner, Linus Torvalds

Possibly related (same subject, not in this thread)

On Tue, 2021-06-15 at 17:08 -0500, Eric W. Biederman wrote:
Oleg Nesterov [off-list ref] writes:
quoted
quoted
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -519,7 +519,7 @@ static bool dump_interrupted(void)
         * but then we need to teach dump_write() to restart and
clear
         * TIF_SIGPENDING.
         */
-       return signal_pending(current);
+       return fatal_signal_pending(current) || freezing(current);
 }

Well yes, this is what the comment says.

But note that there is another reason why dump_interrupted() returns
true
if signal_pending(), it assumes thagt __dump_emit()->__kernel_write()
may
fail anyway if signal_pending() is true. Say, pipe_write(), or iirc
nfs,
perhaps something else...

That is why zap_threads() clears TIF_SIGPENDING. Perhaps it should
clear
TIF_NOTIFY_SIGNAL as well and we should change io-uring to not abuse
the
dumping threads?

Or perhaps we should change __dump_emit() to clear signal_pending()
and
restart __kernel_write() if it fails or returns a short write.

Otherwise the change above doesn't look like a full fix to me.
Agreed.  The coredump to a pipe will still be short.  That needs
something additional.

The problem Olivier Langlois [off-list ref] reported was
core dumps coming up short because TIF_NOTIFY_SIGNAL was being
set during a core dump.

We can see this with pipe_write returning -ERESTARTSYS
on a full pipe if signal_pending which includes TIF_NOTIFY_SIGNAL
is true.

Looking further if the thread that is core dumping initiated
any io_uring work then io_ring_exit_work will use task_work_add
to request that thread clean up it's io_uring state.

Perhaps we can put a big comment in dump_emit and if we
get back -ERESTARTSYS run tracework_notify_signal.  I am not
seeing any locks held at that point in the coredump, so it
should be safe.  The coredump is run inside of file_start_write
which is the only potential complication.



The code flow is complicated but it looks like the entire
point of the exercise is to call io_uring_del_task_file
on the originating thread.  I suppose that keeps the
locking of the xarray in io_uring_task simple.


Hmm.   All of this comes from io_uring_release.
How do we get to io_uring_release?  The coredump should
be catching everything in exit_mm before exit_files?

Confused and hopeful someone can explain to me what is going on,
and perhaps simplify it.

Eric
Hi all,

I didn't forgot about this remaining issue and I have kept thinking
about it on and off.

I did try the following on 5.12.19:
diff --git a/fs/coredump.c b/fs/coredump.c
index 07afb5ddb1c4..614fe7a54c1a 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -41,6 +41,7 @@
 #include <linux/fs.h>
 #include <linux/path.h>
 #include <linux/timekeeping.h>
+#include <linux/io_uring.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -625,6 +626,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		need_suid_safe = true;
 	}
 
+	io_uring_files_cancel(current->files);
+
 	retval = coredump_wait(siginfo->si_signo, &core_state);
 	if (retval < 0)
 		goto fail_creds;
-- 
2.32.0

with my current understanding, io_uring_files_cancel is supposed to
cancel everything that might set the TIF_NOTIFY_SIGNAL.

I must report that in my testing with generating a core dump through a
pipe with the modif above, I still get truncated core dumps.

systemd is having a weird error:
[ 2577.870742] systemd-coredump[4056]: Failed to get COMM: No such
process

and nothing is captured

so I have replaced it with a very simple shell:
$ cat /proc/sys/kernel/core_pattern 
|/home/lano1106/bin/pipe_core.sh %e %p

~/bin $ cat pipe_core.sh 
#!/bin/sh

cat > /home/lano1106/core/core.$1.$2

BFD: warning: /home/lano1106/core/core.test.10886 is truncated:
expected core file size >= 24129536, found: 61440

I conclude from my attempt that maybe io_uring_files_cancel is not 100%
cleaning everything that it should clean.

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help