Re: [PATCH RFC v3 00/10] coredump: add coredump socket
From: Christian Brauner <brauner@kernel.org>
Date: 2025-05-05 14:56:10
Also in: linux-fsdevel, linux-security-module, lkml
On Mon, May 05, 2025 at 04:41:28PM +0200, Mickaël Salaün wrote:
On Mon, May 05, 2025 at 01:13:38PM +0200, Christian Brauner wrote:
Coredumping currently supports two modes:

(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd.

For simplicity I'm mostly ignoring (1). There are probably still some users of (1) out there, but processing coredumps this way can be considered adventurous, especially in the face of set*id binaries.

The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like:

        |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump.

In the example core_pattern shown above, systemd-coredump is spawned as a usermode helper. There are various conceptual consequences of this (non-exhaustive list):

- systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (whether or not this is a sane assumption is irrelevant).

- systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall, which is difficult for userspace to control correctly.

- systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege-dropping exercises in userspace to make this safe.

- A new usermode helper has to be spawned for each crashing process.

This series adds a new mode:

(3) Dumping into an abstract AF_UNIX socket.
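To make the constraints of mode (2) concrete, here is a minimal Python sketch of a pipe-mode helper, assuming a core_pattern of `|/usr/local/bin/core-helper.py %P %u %g %s %t %c %h` (the helper path, the /var/tmp/cores output directory, and all function names are hypothetical). It first reopens any of fds 0-2 that the kernel left closed onto /dev/null, then streams the core image from the read end of the pipe on fd 0 into a new file. The privilege dropping that a helper running with full kernel privileges needs is deliberately left out.

```python
#!/usr/bin/env python3
# Hypothetical pipe-mode helper, e.g. core_pattern:
#   |/usr/local/bin/core-helper.py %P %u %g %s %t %c %h
import os
import sys

# Names for the arguments of the example pattern above, in order (see core(5)).
PATTERN_FIELDS = ("pid", "uid", "gid", "signal", "time", "core_limit", "hostname")
CORE_DIR = "/var/tmp/cores"  # hypothetical output directory


def ensure_std_fds():
    """Reopen any of fds 0-2 that are closed onto /dev/null; return the fds fixed."""
    fixed = []
    for fd in (0, 1, 2):
        try:
            os.fstat(fd)
        except OSError:  # EBADF: the kernel left this descriptor closed
            devnull = os.open(os.devnull, os.O_RDWR)
            if devnull != fd:
                os.dup2(devnull, fd)
                os.close(devnull)
            fixed.append(fd)
    return fixed


def parse_args(args):
    """Map the core_pattern arguments to names; all values stay strings."""
    if len(args) != len(PATTERN_FIELDS):
        raise ValueError(f"expected {len(PATTERN_FIELDS)} arguments, got {len(args)}")
    return dict(zip(PATTERN_FIELDS, args))


def save_core(src, dest_path, chunk_size=1 << 16):
    """Stream the core image from src into a new file; return the byte count."""
    total = 0
    # "x" mode uses O_EXCL: never overwrite an existing file or follow a symlink.
    with open(dest_path, "xb") as dest:
        while chunk := src.read(chunk_size):
            dest.write(chunk)
            total += len(chunk)
    return total


def main(args):
    ensure_std_fds()  # before anything may write to stdout/stderr
    info = parse_args(args)
    # A real helper must drop privileges here: it runs with full kernel privileges.
    os.makedirs(CORE_DIR, mode=0o700, exist_ok=True)
    dest = os.path.join(CORE_DIR, "core.{pid}.{time}".format(**info))
    # fd 0 is the read end of the pipe; don't close it when the wrapper goes away.
    with os.fdopen(0, "rb", closefd=False) as core_pipe:
        return save_core(core_pipe, dest)


# Only act when invoked with exactly the seven core_pattern arguments.
if __name__ == "__main__" and len(sys.argv) == 1 + len(PATTERN_FIELDS):
    main(sys.argv[1:])
```

Since a fresh helper is spawned for every crash, each invocation handles exactly one dump and then exits.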
Userspace can set /proc/sys/kernel/core_pattern to:

        @linuxafsk/coredump_socket

The "@" at the beginning indicates to the kernel that the abstract AF_UNIX coredump socket will be used to process coredumps. The coredump socket uses the fixed address "linuxafsk/coredump.socket" for now.

The coredump socket is located in the initial network namespace. To bind the coredump socket, userspace must hold CAP_SYS_ADMIN in the initial user namespace. Listening and reading can happen from whatever unprivileged context is necessary to safely process coredumps.

When a task coredumps, it opens a client socket in the initial network namespace and connects to the coredump socket. For now only tasks that are actually coredumping are allowed to connect to the initial coredump socket.

I think we should avoid using abstract UNIX sockets, especially for new
Abstract unix sockets are at the core of a modern Linux system. When I counted during testing, about 100 of them were created during boot alone on a modern system. Sorry, but this argument is a non-starter.
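The claim about how common abstract sockets are is easy to check on a running system: /proc/net/unix lists the AF_UNIX sockets in the reader's network namespace and prints abstract addresses with a leading "@" in place of the NUL byte. The Python sketch below counts them; note that it counts sockets that exist at the moment of reading, not how many were created during boot. It also shows how a userspace listener binds an abstract address, roughly as a coredump server would in mode (3). The helper names and the demo address gr-demo/abstract.socket are hypothetical, and the CAP_SYS_ADMIN requirement the series proposes for binding linuxafsk/coredump.socket is not modeled here.

```python
import socket

PROC_NET_UNIX = "/proc/net/unix"


def abstract_paths(text):
    """Return the '@'-prefixed socket paths from the contents of /proc/net/unix."""
    paths = []
    for line in text.splitlines()[1:]:  # skip the column header line
        # Columns: Num RefCount Protocol Flags Type St Inode [Path]
        fields = line.split(None, 7)
        # Abstract addresses are shown with '@' in place of the leading NUL byte.
        # (A filesystem socket bound to a relative path starting with '@' would
        # be miscounted; this sketch ignores that corner case.)
        if len(fields) == 8 and fields[7].startswith("@"):
            paths.append(fields[7])
    return paths


def bind_abstract_listener(name, backlog=16):
    """Create a listening AF_UNIX stream socket at the abstract address `name`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind("\0" + name)  # leading NUL selects the abstract namespace (Linux only)
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    with open(PROC_NET_UNIX) as f:
        found = abstract_paths(f.read())
    print(f"{len(found)} abstract AF_UNIX sockets in this network namespace")
```

Run as a script, it prints the count for the current network namespace only, so inside a container with its own network namespace the number will differ from the host's.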