Re: [RFC] New codectl(2) system call for sframe registration
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: 2025-07-22 18:26:51
Also in:
bpf, lkml
On 2025-07-22 12:25, Steven Rostedt wrote:
> On Tue, 22 Jul 2025 09:51:22 -0400 Mathieu Desnoyers [off-list ref] wrote:
>
>>> Here's a hypothetical, what if for some reason (say having the sframe sections outside of the elf file) that the linker shares that?
>>
>> So your hypothetical scenario is having sframe provided as a separate file. This sframe file (or part of it) would still describe how to unwind a given elf file text range. So I would argue that this would
>
> No. It should describe how to get access to an sframe section for some text that has already been loaded in memory. I'm looking for a mapping between already loaded text memory to how to unwind it that will be in an sframe format somewhere on disk.
OK, so what you have in mind is the compressed sframe use-case. Ideally, for that use-case I suspect we'd want lazy, on-demand decompression, which could decompress only the parts needed for the unwind rather than expanding everything in memory. Pointing the kernel to a file/offset on disk is rather different from the current ELF sframe section scenario, where the section is allocated and loaded into the process's address space. I suspect we would want to cover this with a future new code_opt enum label.
>> still fit into the model of CODE_REGISTER_ELF, it's just that the address range from sframe_start to sframe_end would be mapped from a different file. This is entirely up to the dynamic loader and should not impact the kernel ABI. AFAIK the a.out binary support was deprecated in Linux kernel v5.1. So being elf specific is not an issue.
>
> Yes, but we are not registering ELF. We are registering how to unwind something with sframes. If it's not sframes we are registering, what is it?
I am thinking of sframes as one of the properties of an ELF executable. So from my perspective we are registering an ELF file with various properties, one of which is its sframe section.

But I think I get what you are getting at: if we define the sframe registration for ELF as sframe_start, sframe_end, then it forgoes approaches where sframe is provided through other means, such as pathname and offset, which would be useful for the compressed sframe use-case.

If system call overhead is not too much of an issue at library load, then we could break this down into multiple system calls, e.g. eventually:

    codectl(CODE_REGISTER_SFRAME, /* provide sframe start + end */ )
    codectl(CODE_REGISTER_ELF, /* provide elf-specific info such as build id */ )
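To make the split concrete, here is a minimal userspace sketch of what a per-option dispatch could look like. It is only an illustration under stated assumptions: the option labels and the struct code_sframe_info fields reuse names from this thread, but codectl_model(), its return codes, and the half-open range checks are hypothetical and do not describe an existing kernel ABI.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Option labels as discussed in this thread (not a merged ABI). */
enum code_opt {
	CODE_REGISTER_SFRAME,	/* sframe section mapped in userspace memory */
	CODE_REGISTER_ELF,	/* elf-specific info such as build id */
};

/* Field names follow the RFC; uint64_t stands in for __u64. */
struct code_sframe_info {
	uint64_t text_start;
	uint64_t text_end;
	uint64_t sframe_start;
	uint64_t sframe_end;
};

/*
 * Hypothetical stand-in for the system call entry point: each option
 * label selects its own argument layout, so a new registration method
 * only needs a new label and a new case.
 */
static int codectl_model(unsigned int option, const void *info)
{
	switch (option) {
	case CODE_REGISTER_SFRAME: {
		const struct code_sframe_info *si = info;

		/* Assumption: both ranges must be non-empty. */
		if (si == NULL ||
		    si->text_start >= si->text_end ||
		    si->sframe_start >= si->sframe_end)
			return -EINVAL;
		return 0;
	}
	case CODE_REGISTER_ELF:
		/* Left as a future extension in this sketch. */
		return -ENOSYS;
	default:
		return -EINVAL;
	}
}
```

With this shape, an sframe-only registration is one call with CODE_REGISTER_SFRAME, and any elf-specific registration would be a second, independent call.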
>> And if for some reason we end up inventing a new model to hand over the sframe information in the future, for instance if we choose not to map the sframe information in userspace and hand over a sframe-file pathname and offset instead, we'll just extend the code_opt enum with a new label.
>
> This is not a new model. We could likely do it today without much effort. We are handing over sframe data regardless if it's in an ELF file or not. The systemcall is to let the dynamic linker know where the kernel can find the sframes for newly loaded text.
I am saying this is a "new" model because the current sframe section is allocated and loaded, which means it is present in userspace memory, so it seems rather logical to delimit this area with pointers to the start/end of that range.
>>> For instance, if the sframe sections are downloaded separately as a separate package for a given executable (to make it not mandatory for an install), the linker could be smart enough to see that they exist in some special location and then pass that to the kernel. In other words, this option is specific for sframe and not ELF. I'd rather call it by that.
>>
>> As I explained above, if the dynamic loader populates the sframe section in userspace memory, this fits within the CODE_REGISTER_ELF ABI. If we
>
> But this isn't about ELF! It's about sframes! Why not name it that?
I understand your position in wanting other "types" of sframe registration in the future that would cover compressed sframe files. Because of this, it makes sense that the registration becomes specific to sframe, because we would not want to tie all "elf" registrations to a specific sframe ABI (mapped in userspace memory, within a given address range vs pathname and offset).
>> eventually choose not to map the sframe section into userspace memory (even though this is not an envisioned use-case at the moment), we can just extend enum code_opt with a new label.
>
> Why call this at all if you don't plan on mapping sframes?
If we split this into separate registrations (sframe vs elf), then it would be fine: registering an elf binary (in the future) could be done to explicitly register pathname, build-id and debug link. And this is independent of sframe. This could come as a future new code_opt label, no need to do it now.
>>>> If there are other file types in the future that happen to contain an sframe section (but are not ELF), then we can simply add a new label to enum code_opt.
>>>>
>>>>>> sys_codectl(2)
>>>>>> =================
>>>>>>
>>>>>> * arg0: unsigned int @option:
>>>>>>
>>>>>> /* Additional labels can be added to enum code_opt, for extensibility. */
>>>>>>
>>>>>> enum code_opt {
>>>>>>     CODE_REGISTER_ELF,
>>>>>
>>>>> Perhaps the above should be: CODE_REGISTER_SFRAME, as currently SFrame is read only via files.
>>>>
>>>> As I pointed out above, on GNU/Linux, sframe is always an allocated, loaded ELF section. AFAIU, your comment implies that we'd want to support other scenarios where the sframe is in files outside of elf binary sframe sections. Can you expand on the use-case you have for this, or is it just for future-proofing?
>>>
>>> Heh, I just did above (before reading this). But yeah, it could be. As I mentioned above, this is not about ELF files. Sframes just happen to be in an ELF file. CODE_REGISTER_ELF sounds like this is for doing special actions to an ELF file, when in reality it is doing special actions to tell the kernel this is an sframe table. It just happens that sframes are in ELF. Let's call it for what it is used for.
>>
>> I see sframe as one "aspect" of an ELF file. Sure, we could do one system call for every aspect of an ELF file that we want to register, but that would require many round trips from userspace to the kernel every time a library is loaded. In my opinion it makes sense to combine all aspects of an elf file that we want the kernel to know about into one registration system call. In that sense, we're not registering just sframe, but the various aspects of an ELF file, which include sframe.
>
> So you are making this a generic ELF function? What other functions do you plan to do with this system call?
All those I have in mind are part of this RFC.
>> By the way, the sframe section is optional as well. If we allow sframe_start and sframe_end to be NULL, this would let libc register an sframe-less ELF file with its pathname, build-id, and debug info to the kernel. This would be immediately useful on its own for distributions that have frame pointers enabled even without sframe section.
>
> The above is called mission creep. Looks to me that you are using this as a way to have LTTng get easier access to build ids and such. We can add *that* later if needed, as a separate option. This has nothing to do with the current requirements.
I agree on the mission creep argument. I disagree on the stated intent though. For LTTng, I'm happy to grab this information from userspace. I already have it and I don't need it from the kernel. I figured it would be most useful for perf and ftrace if you guys can directly get that information without relying on a userspace tracer. So considering the fact that you'll want to introduce new sframe registration methods in the future, then indeed it makes sense to make the registration sframe-specific.
>>>>> And call it "struct code_sframe_info"
>>>>>
>>>>>> __u64 text_start;
>>>>>> __u64 text_end;
>>>>>>
>>>>>> __u64 sframe_start;
>>>>>> __u64 sframe_end;
>>>>>
>>>>> What is the above "sframe" for?
>>>
>>> Still wondering what the above is for.
>>
>> Well we have an sframe section which is mapped into userspace memory from sframe_start to sframe_end, which contains the unwind information that covers the code from text_start to text_end.
>
> Actually, the sframe section shouldn't be mapped into user space memory. The kernel will be doing that, not the linker.
AFAIU, that's not how the sframe section works today. It's allocated and loaded. So userspace maps the section into its address space, and the kernel takes the page faults when it needs to load its content.
> I would say that the system call can give a hint of where it would like it mapped, but it should allow the kernel to decide where to map it as the user space code doesn't care where it gets mapped.
AFAIU currently the dynamic loader maps the section, not the kernel.
> In the future, if we want to compress the sframe section, it will not even be a loadable ELF section. But the system call can tell the kernel: "there's a sframe compressed section at this offset/size in this file" for this text address range and then the kernel will do the rest.
I would see this compressed side-file handled entirely from the kernel (not mapped in userspace) as a new enum code_opt option.
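As a rough illustration of how both models could coexist behind one text-range lookup, here is a small C sketch. Every name in it (sframe_source, sframe_entry, sframe_lookup) is hypothetical, and treating text_end as exclusive is an assumption; it only shows that an unwinder can map an instruction pointer to a registration without caring whether the sframe data is mapped in userspace or described by pathname/offset/size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag: where the sframe data for a text range lives. */
enum sframe_source {
	SFRAME_IN_MEMORY,	/* allocated, loaded section: start/end pointers */
	SFRAME_IN_FILE,		/* side-file: pathname + offset + size */
};

/* Hypothetical registration record kept per text range. */
struct sframe_entry {
	uint64_t text_start;	/* inclusive */
	uint64_t text_end;	/* exclusive (assumption) */
	enum sframe_source source;
	union {
		struct { uint64_t start, end; } mem;
		struct { const char *pathname; uint64_t offset, size; } file;
	} u;
};

/*
 * Find the registration covering instruction pointer @ip, or NULL.
 * A linear scan keeps the sketch short; a real implementation would
 * likely use a tree or sorted array.
 */
static const struct sframe_entry *
sframe_lookup(const struct sframe_entry *tab, size_t n, uint64_t ip)
{
	for (size_t i = 0; i < n; i++) {
		if (ip >= tab[i].text_start && ip < tab[i].text_end)
			return &tab[i];
	}
	return NULL;
}
```

The caller would then branch on the returned entry's source field: read the unwind data directly for SFRAME_IN_MEMORY, or fetch (and possibly decompress) it from the file for SFRAME_IN_FILE.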
>> Am I unknowingly adding some kind of redundancy here?
>
> Maybe. This systemcall was to add unwinding information for the kernel. It looks like you are having it be much more than that. I'm not against that, but that should only be for extensions, and currently, this is supposed to only make sframes work.
I agree that if we state that "elf" registration has sframe_start/end as a means to express sframe, then we are stuck with a model where userspace needs to map the section in its memory. Considering that you want to express different models where a filename and offset are provided to the kernel instead, it makes sense to make the registration more specific.

The downside is that we may have to do more than one system call if we want to register more than one "aspect", e.g. sframe vs elf build-id. I think the overhead of a single vs a few system calls is an important aspect to consider. If the overhead of a few more system calls at library load does not matter too much, then we should go for the more specific registration. I have no clue whether that overhead matters in practice though.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com