Thread: 22 messages, 5 authors, 2025-07-23

Re: [RFC] New codectl(2) system call for sframe registration

From: Steven Rostedt <rostedt@goodmis.org>
Date: 2025-07-22 19:11:37
Also in: bpf, lkml

Florian, you may want to read this email as there's some question about
dynamic linking.


On Tue, 22 Jul 2025 14:26:44 -0400
Mathieu Desnoyers [off-list ref] wrote:
> > I'm looking for a mapping between already loaded text memory to how to
> > unwind it that will be in an sframe format somewhere on disk.
>
> OK, so what you have in mind is the compressed sframe use-case.
>
> Ideally, for the compressed sframe use-case I suspect we'd want to do
> lazy on-demand decompression which could decompress only the parts that
> are needed for the unwind, rather than expand everything in memory.
>
> Pointing the kernel to a file/offset on disk is rather different than
> the current ELF sframe section scenario, where it is allocated,loaded
> into the process' address space. I suspect we would want to cover this
> with a future new code_opt enum label.

The sframe program header is of type PT_GNU_SFRAME and not PT_LOAD so
the linker will not be loading it. The code in the kernel has to do
something special with this section. It's not automatic.

So yes, I never had any expectation that the dynamic linker would even
load sframes into memory. It would simply tell the kernel where to find
it and it will load it.
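
For illustration only, here is a minimal C sketch of how a loader might find the PT_GNU_SFRAME program header so it can tell the kernel where the sframe data sits in the file. Nothing here is from the RFC: `struct sframe_location` and `find_sframe_phdr()` are hypothetical names. Only `Elf64_Phdr` and its fields come from the Linux `<elf.h>`, and the fallback `PT_GNU_SFRAME` value is the one binutils assigns (`PT_LOOS + 0x474e554`).

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Older <elf.h> headers may not define PT_GNU_SFRAME yet. */
#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME 0x6474e554
#endif

/* Hypothetical descriptor: where the sframe data lives in the file. */
struct sframe_location {
	uint64_t file_offset;	/* p_offset of the PT_GNU_SFRAME segment */
	uint64_t file_size;	/* p_filesz of the segment */
	uint64_t vaddr;		/* p_vaddr, before any load-base relocation */
};

/*
 * Scan the program headers for PT_GNU_SFRAME. Returns true and fills *loc
 * on the first match, false if the object has no such segment.
 */
bool find_sframe_phdr(const Elf64_Phdr *phdr, size_t phnum,
		      struct sframe_location *loc)
{
	assert(loc != NULL);

	for (size_t i = 0; i < phnum; i++) {
		if (phdr[i].p_type != PT_GNU_SFRAME)
			continue;
		loc->file_offset = phdr[i].p_offset;
		loc->file_size = phdr[i].p_filesz;
		loc->vaddr = phdr[i].p_vaddr;
		return true;
	}
	return false;
}
```

Whether the kernel or the dynamic linker then maps that range is exactly the open question in this thread; the lookup itself is the same either way.
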
> > Yes, but we are not registering ELF. We are registering how to unwind
> > something with sframes. If it's not sframes we are registering, what is
> > it?
>
> I am thinking of sframes as one of the properties of an ELF executable.
> So from my perspective we are registering an ELF file with various
> properties, one of which is its sframe section.

That wasn't what I was thinking.

> But I think I get what you are getting at: if we define the sframe
> registration for ELF as sframe_start, sframe_end, then it forgoes
> approaches where sframe is provided through other means, such as
> pathname and offset, which would be useful for the compressed sframe
> use-case.
>
> If system call overhead is not too much of an issue at library load,
> then we could break this down into multiple system calls, e.g.
> eventually:
>
> codectl(CODE_REGISTER_SFRAME, /* provide sframe start + end */ )
> codectl(CODE_REGISTER_ELF, /* provide elf-specific info such as build id */ )

IIRC, and Florian (who has been Cc'd) can correct me if I'm wrong,
dynamic file loading is quite a slow process and a few extra system
calls aren't going to show up outside the noise.
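
To make the "one call per aspect" idea concrete, here is a rough C sketch of the split registration. It is a userspace stand-in, not the proposed interface: `codectl_mock()`, `register_library()`, both structures and their fields are assumptions made up for illustration. Only the two operation names are taken from the quoted example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Operation names from the quoted example; the numeric values are arbitrary. */
enum code_opt {
	CODE_REGISTER_SFRAME,
	CODE_REGISTER_ELF,
};

/* Hypothetical argument layouts; the RFC's real structures may differ. */
struct code_sframe_info {
	uint64_t text_start, text_end;
	uint64_t sframe_start, sframe_end;
};

struct code_elf_info {
	uint64_t text_start, text_end;
	const unsigned char *build_id;
	uint32_t build_id_len;
};

/* Counts invocations so the cost in "system calls" is visible. */
int nr_calls;

/* Stand-in for the proposed system call: only validates its argument. */
int codectl_mock(enum code_opt opt, const void *arg)
{
	nr_calls++;
	if (!arg)
		return -EINVAL;

	switch (opt) {
	case CODE_REGISTER_SFRAME: {
		const struct code_sframe_info *s = arg;

		if (s->text_start >= s->text_end ||
		    s->sframe_start >= s->sframe_end)
			return -EINVAL;
		return 0;
	}
	case CODE_REGISTER_ELF: {
		const struct code_elf_info *e = arg;

		if (e->text_start >= e->text_end ||
		    !e->build_id || e->build_id_len == 0)
			return -EINVAL;
		return 0;
	}
	}
	return -EINVAL;
}

/* One call per aspect: unwind info first, then ELF-specific metadata. */
int register_library(const struct code_sframe_info *s,
		     const struct code_elf_info *e)
{
	int ret = codectl_mock(CODE_REGISTER_SFRAME, s);

	if (ret)
		return ret;
	return codectl_mock(CODE_REGISTER_ELF, e);
}
```

With this split, a library load costs two calls instead of one, which is the overhead being weighed here against a more specific registration.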

> > The systemcall is to let the dynamic linker know where the kernel can
> > find the sframes for newly loaded text.
>
> I am saying this is a "new" model because the current sframe section is
> allocated,loaded, which means it is present in userspace memory, so it
> seems rather logical to delimit this area with pointers to the start/end
> of that range.

But it's the kernel that maps it into memory. I was expecting that the
kernel would map it again into memory just like it does with the ELF
file. I wasn't expecting the dynamic linker to.

> > Actually, the sframe section shouldn't be mapped into user space
> > memory. The kernel will be doing that, not the linker.
>
> AFAIU, that's not how the sframe section works today. It's allocated,loaded.
> So userspace maps the section into its address space, and the kernel takes
> the page faults when it needs to load its content.

Yes, but the kernel maps it. I wasn't expecting the user space dynamic
linker to map it. I was expecting the system call to simply say "here's
where the sframe section is in this file" and the kernel would take
care of the rest.

> > I would say that
> > the system call can give a hint of where it would like it mapped, but
> > it should allow the kernel to decide where to map it as the user space
> > code doesn't care where it gets mapped.
>
> AFAIU currently the dynamic loader maps the section, not the kernel.

You mean the prctl()?

I haven't looked too deep into that systemcall. It may do that
currently. I'm just thinking about what the best way to do this is. I
guess we should ask Florian which is best for the dynamic linker:
whether it should map it in, or whether the kernel should, keeping a
compressed format in mind as well.

> > In the future, if we want to compress the sframe section, it will not
> > even be a loadable ELF section. But the system call can tell the
> > kernel: "there's a sframe compressed section at this offset/size in
> > this file" for this text address range and then the kernel will do the
> > rest.
>
> I would see this compressed side-file handled entirely from the kernel
> (not mapped in userspace) as a new enum code_opt option.

Yes, it would likely be a new enum.
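
As a sketch of how the two models could be told apart, the C fragment below describes an sframe registration as a tagged union: either a range that userspace has already mapped, or a pathname/offset/size that the kernel would read (and possibly decompress) on its own. Every name in it (`enum sframe_source`, `struct sframe_registration`, `sframe_registration_check()`) is hypothetical and only meant to show the shape of the idea, not the RFC's ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how the sframe data is handed to the kernel. */
enum sframe_source {
	SFRAME_SRC_USER_RANGE,	/* section already mapped in userspace */
	SFRAME_SRC_FILE_OFFSET,	/* kernel locates it in a file by itself */
};

/* Hypothetical registration record covering both models. */
struct sframe_registration {
	enum sframe_source source;
	uint64_t text_start, text_end;	/* text range the sframe describes */
	union {
		struct {
			uint64_t start, end;
		} range;
		struct {
			const char *pathname;
			uint64_t offset, size;
		} file;
	};
};

/* Basic sanity checks that a receiving side might perform. */
int sframe_registration_check(const struct sframe_registration *reg)
{
	if (!reg || reg->text_start >= reg->text_end)
		return -EINVAL;

	switch (reg->source) {
	case SFRAME_SRC_USER_RANGE:
		return reg->range.start < reg->range.end ? 0 : -EINVAL;
	case SFRAME_SRC_FILE_OFFSET:
		if (!reg->file.pathname || reg->file.pathname[0] == '\0')
			return -EINVAL;
		/* Reject empty regions and offset + size wrap-around. */
		if (reg->file.size == 0 ||
		    reg->file.offset + reg->file.size < reg->file.offset)
			return -EINVAL;
		return 0;
	}
	return -EINVAL;
}
```

In both variants the text range is the common part; only the way the unwind data is located changes, which is why a separate enum value per model seems natural.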

But if the dynamic linker has already mapped the sframe into memory and
is giving it to the kernel, then it is even less an "elf" file. It's
simply mapping a sframe section in memory with some text in memory. The
way the dynamic linker mapped it will still do everything as normal.

> > > Am I unknowingly adding some kind of redundancy here ?
> >
> > Maybe. This systemcall was to add unwinding information for the kernel.
> > It looks like you are having it be much more than that. I'm not against
> > that, but that should only be for extensions, and currently, this is
> > supposed to only make sframes work.
>
> I agree that if we state that "elf" registration has sframe_start/end
> as a means to express sframe, then we are stuck with a model where userspace
> needs to map the section in its memory. Considering that you want to
> express different models where a filename and offset are provided to the
> kernel instead, then it makes sense to make the registration more specific.
>
> The downside would be that we may have to do more than one system call if we
> want to register more than one "aspect", e.g. sframe vs elf build-id.
>
> I think the overhead of a single vs a few system calls is an important
> aspect to consider. If the overhead of a few more system calls at library
> load does not matter too much, then we should go for the more specific
> registration. I have no clue whether that overhead matters in practice though.

If the linker needs to map it, it is already doing lots of systemcalls
to accomplish that ;-)

-- Steve