
Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere

From: Thomas Gleixner <tglx@kernel.org>
Date: 2026-04-27 11:03:36
Also in: linux-arm-kernel, linux-man, lkml

On Mon, Apr 27 2026 at 09:40, Florian Weimer wrote:
> * Thomas Gleixner:
>> The real question is how to differentiate between the legacy and the
>> optimized mode. I have two working variants to achieve that:
>>
>>    1) The fully safe option requires a new flag for RSEQ
>>       registration. It obviously requires a glibc update. (Suggested by
>>       PeterZ)
>
> Without glibc changes, RSEQ would keep working, but with the old,
> problematic performance, right?

Correct.

> If we don't have a notification in the auxiliary vector, we'd have to do
> two system calls at process start, which isn't ideal, but is probably
> not a significant issue, either.
>
> I haven't verified this, but it looks like introducing the flag breaks
> CRIU?  In dump_thread_rseq, we have this:
>
>         if (rseqc.flags != 0) {
>                 pr_err("something wrong with ptrace(PTRACE_GET_RSEQ_CONFIGURATION, %d) flags = 0x%x\n", tid,
>                        rseqc.flags);
>                 return -1;
>         }

Yeah. That'd need to be fixed or worked around.

> I suppose a workaround could make this behavior flag a prctl flag.  CRIU
> wouldn't dump and restore that until taught about it.  If the new
> behavior is switched on explicitly by the flag, it would be
> backwards-compatible, except that restoring with unpatched CRIU would
> lead to a performance loss.

It's worse. The flag will also enable extended RSEQ features beyond
mmcid and requires that the registered rseq size is >= offsetof(struct
rseq, end).

>>    2) Determine the requirements of the registering task via the size of
>>       the registered RSEQ area.
>>
>>       The original implementation, which TCMalloc depends on, registers
>>       a 32 byte region (ORIG_RSEQ_SIZE). This region has a 32 byte
>>       alignment requirement.
>>
>>       The extension safe newer variant exposes the kernel RSEQ feature
>>       size via getauxval(AT_RSEQ_FEATURE_SIZE) and the alignment
>>       requirement via getauxval(AT_RSEQ_ALIGN). The alignment
>>       requirement is that the registered rseq region is aligned to the
>>       next power of two of the feature size. The kernel currently has a
>>       feature size of 33 bytes, which means the alignment requirement is
>>       64 bytes.
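
As a rough illustration of the lookup described above, the sketch below
queries the two auxiliary vector entries and derives the alignment as the
next power of two of the feature size. The helper names (next_pow2,
query_rseq_layout, struct rseq_layout) are made up for this example. The
numeric AT_* fallbacks mirror the values in the kernel's uapi auxvec.h and
only matter for toolchains with older headers. Falling back to 32/32 when
the kernel reports nothing is an assumption, not something stated in this
thread.

```c
#include <stddef.h>
#include <sys/auxv.h>

/* Fallbacks for older libc headers; values as in the kernel's uapi auxvec.h. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN 28
#endif

/* Hypothetical container for the values a registering task needs. */
struct rseq_layout {
	size_t size;
	size_t align;
};

/* Smallest power of two that is >= n, for n >= 1. */
static size_t next_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Ask the kernel for the rseq feature size and alignment via the auxv. */
static struct rseq_layout query_rseq_layout(void)
{
	struct rseq_layout l;

	l.size = getauxval(AT_RSEQ_FEATURE_SIZE);
	l.align = getauxval(AT_RSEQ_ALIGN);

	if (l.size == 0) {
		/* Assumption: no auxv entry means only the original layout. */
		l.size = 32;
		l.align = 32;
	} else if (l.align == 0) {
		/* Derive the alignment the way the text describes it. */
		l.align = next_pow2(l.size);
	}
	return l;
}
```

With a feature size of 33, next_pow2() yields 64, which matches the
alignment requirement quoted above.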
> There are still glibc builds in use that do not use AT_RSEQ_ALIGN, and
> instead unconditionally reserve a size of 32.  In some builds, the RSEQ
> area is not aligned to a multiple of 64, which makes glibc
> indistinguishable from tcmalloc.

That's how it is. So with a size of 32 this will fall back to legacy mode
and not unlock the extended features independent of the alignment. The
alignment requirements are:

          Size 32:     32 bytes
          Size >32:    64 bytes
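
The table above can be sketched as a small check. This is only an
illustration: the names LEGACY_RSEQ_SIZE, rseq_required_align and
rseq_area_ok are hypothetical, the 64 byte figure reflects the current
feature size of 33 bytes, and treating sizes below 32 as invalid is an
assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the original 32 byte registration size. */
#define LEGACY_RSEQ_SIZE 32u

/* Alignment per the table: 32 bytes for size 32, 64 bytes for larger sizes. */
static size_t rseq_required_align(size_t size)
{
	return size > LEGACY_RSEQ_SIZE ? 64 : 32;
}

/* Check whether an area at addr with the given size meets the alignment. */
static bool rseq_area_ok(uintptr_t addr, size_t size)
{
	if (size < LEGACY_RSEQ_SIZE)
		return false;	/* assumption: smaller areas are rejected */

	return (addr % rseq_required_align(size)) == 0;
}
```

An area at 0x1020 passes for size 32 but fails for size 40, because 0x1020
is a multiple of 32 and not of 64.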
> You could look at the location of the thread pointer relative to the
> RSEQ area at registration to tell them apart, but that is perhaps too
> nasty.

*Blink*

> Switching to the new extensible RSEQ allocation code in older glibc
> builds is not entirely trivial, and I would prefer not doing that.
> Registering with a new flag is comparatively simple, and we could
> backport it, except that it might not be compatible with CRIU.

Neither with CRIU nor with the requirement to support additional
features which require the registered rseq memory size to be at least as
large as the kernel requires. That's why we have AT_RSEQ_FEATURE_SIZE.

Otherwise we'd end up with runtime conditionals for every single
feature, which just adds more gunk into the hotpaths and ends up in an
ever-growing compatibility nightmare.

So if a process runs on a newer kernel with, let's say, a 40 byte rseq
size, then it can't be safely migrated with CRIU to an older kernel with
a 32 byte rseq size, as you don't know whether the process uses some of
the extended features in the newer kernel already. But that's not any
different from extended syscall features etc.

So with the size based detection we end up with the following:

  Size 32:             legacy mode no matter whether that's TCMalloc or
                       glibc. Does not support extended features
  
  Size >= kernel size: optimized mode with support for extended features
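
Expressed as code, the size based detection could look roughly like the
sketch below. The enum and the function name are invented for this
example and do not correspond to kernel identifiers. Sizes that are
neither 32 nor at least the kernel size are not covered by the list
above, so the sketch reports them as unspecified rather than guessing.

```c
#include <stddef.h>

/* Hypothetical result type for the size based detection. */
enum rseq_mode {
	RSEQ_MODE_UNSPECIFIED,	/* not covered by the list above */
	RSEQ_MODE_LEGACY,	/* size 32: no extended features */
	RSEQ_MODE_OPTIMIZED,	/* size >= kernel size: extended features */
};

/* Pick the mode from the registered size and the kernel's feature size. */
static enum rseq_mode rseq_mode_for_size(size_t registered, size_t kernel_size)
{
	/* Size 32 is always legacy, no matter who registers it. */
	if (registered == 32)
		return RSEQ_MODE_LEGACY;

	if (registered > 32 && registered >= kernel_size)
		return RSEQ_MODE_OPTIMIZED;

	return RSEQ_MODE_UNSPECIFIED;
}
```

For a kernel size of 33, a 32 byte registration maps to legacy mode and a
64 byte registration maps to optimized mode.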

Thanks,

        tglx
