Re: [PATCH v6 4/9] coredump: add coredump socket

From: Luca Boccassi <hidden>
Date: 2025-05-13 01:09:39
Also in: linux-fsdevel, linux-security-module, lkml

On Tue, 13 May 2025 at 01:18, Kuniyuki Iwashima [off-list ref] wrote:

From: Luca Boccassi <redacted>
Date: Mon, 12 May 2025 11:58:54 +0100

quoted

On Mon, 12 May 2025 at 09:56, Christian Brauner [off-list ref] wrote:

quoted

Coredumping currently supports two modes:

(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process
    spawned as a child of the system_unbound_wq or kthreadd.

For simplicity I'm mostly ignoring (1). There's probably still some
users of (1) out there but processing coredumps in this way can be
considered adventurous especially in the face of set*id binaries.

The most common option should be (2) by now. It works by allowing
userspace to put a string into /proc/sys/kernel/core_pattern like:

        |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

The "|" at the beginning indicates to the kernel that a pipe must be
used. The path following the pipe indicator is a path to a binary that
will be spawned as a usermode helper process. Any additional parameters
pass information about the task that is generating the coredump to the
binary that processes the coredump.

In the example core_pattern shown above systemd-coredump is spawned as a
usermode helper. There's various conceptual consequences of this
(non-exhaustive list):

- systemd-coredump is spawned with file descriptor number 0 (stdin)
  connected to the read-end of the pipe. All other file descriptors are
  closed. That specifically includes 1 (stdout) and 2 (stderr). This has
  already caused bugs because userspace assumed that this cannot happen
  (Whether or not this is a sane assumption is irrelevant.).

- systemd-coredump will be spawned as a child of system_unbound_wq. So
  it is not a child of any userspace process and specifically not a
  child of PID 1. It cannot be waited upon and is in a weird hybrid
  upcall which are difficult for userspace to control correctly.

- systemd-coredump is spawned with full kernel privileges. This
  necessitates all kinds of weird privilege dropping excercises in
  userspace to make this safe.

- A new usermode helper has to be spawned for each crashing process.

This series adds a new mode:

(3) Dumping into an abstract AF_UNIX socket.

Userspace can set /proc/sys/kernel/core_pattern to:

        @address SO_COOKIE

The "@" at the beginning indicates to the kernel that the abstract
AF_UNIX coredump socket will be used to process coredumps. The address
is given by @address and must be followed by the socket cookie of the
coredump listening socket.

The socket cookie is used to verify the socket connection. If the
coredump server restarts or crashes and someone recycles the socket
address the kernel will detect that the address has been recycled as the
socket cookie will have necessarily changed and refuse to connect.

This dynamic/cookie prefix makes it impossible to use this with socket
activation units. The way systemd-coredump works is that every
instance is an independent templated unit, spawned when there's a
connection to the private socket. If the path was fixed, we could just
reuse the same mechanism, it would fit very nicely with minimal
changes.

Note this version does not use prefix.  Now it requires users to
just pass the socket cookie via core_pattern so that the kernel
can verify the peer.

Exactly - this means the pattern cannot be static in a sysctl.d early
on boot anymore, and has to be set dynamically by <something>. This is
a severe degradation over the status quo.

quoted

But because you need a "server" to be permanently running, this means
socket-based activation can no longer work, and systemd-coredump must
switch to a persistently-running mode.

The only thing for systemd to do is assign a cookie after socket creation.

As long as systemd hold the file descriptor of the socket, you don't need
a dedicated "server" running permanently, and the fd can be passed around
to a spawned/activated process.

There is no such facility, a socket is just a socket and there's no
infrastructure to randomly extract random information from one and
write it to some other random file in procfs, and I don't see why we
should add some super-special-case just for this, it sounds really
messy.
Also sockets can be and in fact are routinely restarted (eg: on
package upgrades), which would invalidate this whole scheme, and
result in a very racy setup. When packages are upgraded it's one of
the most complex workflows in modern distros, and it's very likely
that things start crashing exactly at that point, and with this
workflow it would mean we'll lose core files due to the race between
restarting the socket unit and <something> updating the pattern
accordingly.
Also we very much want to be able to spawn as many core handlers at
the same time as needed, which I don't see how can work with a cookie
that has to be unique per socket.

Sorry, but this particular approach seems completely unnecessary and
over-complicated, and doesn't seem to fit very well with how modern
userspace is set up today, and I don't see what actual problem it
would solve? If you need it for some particular use case that's
absolutely fine, but please add it later as another optional mode, so
that we don't have to degrade our use cases for it. That way everyone
gets what they want, and everyone's happy.

v5 was super nice and had everything we needed to massively improve
the status quo, with easy and straightforward changes and no real
drawbacks, so it would be really great if we could just go back to
that version, please. Thanks.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help