Thread: 161 messages, 8 authors, 2025-04-24

Re: [PATCH v4 17/39] unwind_user/sframe: Add support for reading .sframe headers

From: Indu Bhagat <hidden>
Date: 2025-02-06 01:11:18
Also in: linux-perf-users, linux-toolchains, lkml

On 2/4/25 4:57 PM, Josh Poimboeuf wrote:
> On Thu, Jan 30, 2025 at 01:39:52PM -0800, Indu Bhagat wrote:
>> On 1/28/25 6:02 PM, Josh Poimboeuf wrote:
>>> However, if we're going that route, we might want to even consider a
>>> completely revamped data layout.  For example:
>>>
>>> One insight is that the vast majority of (cfa, fp, ra) tuples aren't
>>> unique.  They could be deduped by storing the unique tuples in a
>>> standalone 'fre_data' array which is referenced by another
>>> address-specific array.
>>>
>>>     struct fre_data {
>>>         s8|s16|s32 cfa, fp, ra;
>>>         u8 info;
>>>     };
>>>     struct fre_data fre_data[num_fre_data];
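A minimal C sketch of the deduped layout described above, under stated assumptions: struct fre_addr and dedup_fres() are hypothetical names, the offsets are widened to int32_t instead of the proposed s8/s16/s32, and a linear search stands in for whatever hashing a real build tool would use. Each address-specific entry stores only an index into fre_data[], which is the extra data access the thread weighs as a cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Deduped tuple; widths fixed to int32_t for simplicity (assumption). */
struct fre_data {
	int32_t cfa, fp, ra;
	uint8_t info;
};

/* Hypothetical address-specific entry: a start address plus an index. */
struct fre_addr {
	uint32_t addr;
	uint32_t data_idx;	/* index into fre_data[] */
};

int fre_data_equal(const struct fre_data *a, const struct fre_data *b)
{
	return a->cfa == b->cfa && a->fp == b->fp && a->ra == b->ra &&
	       a->info == b->info;
}

/*
 * Build-time dedup: reuse an identical fre_data[] slot if one exists,
 * otherwise append a new one.  out_data must have room for n entries.
 * Returns the number of unique tuples (num_fre_data).
 */
size_t dedup_fres(const uint32_t *addrs, const struct fre_data *in, size_t n,
		  struct fre_addr *out_addr, struct fre_data *out_data)
{
	size_t num_data = 0;

	assert(n <= UINT32_MAX);	/* data_idx is only 32 bits wide */
	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < num_data; j++)
			if (fre_data_equal(&out_data[j], &in[i]))
				break;
		if (j == num_data)
			out_data[num_data++] = in[i];
		out_addr[i].addr = addrs[i];
		out_addr[i].data_idx = (uint32_t)j;
	}
	return num_data;
}

/* Reading an entry's tuple costs one extra access through the index. */
const struct fre_data *fre_lookup_data(const struct fre_addr *a,
				       const struct fre_data *data)
{
	return &data[a->data_idx];
}
```

With four entries whose tuples alternate between two values, dedup_fres() keeps two fre_data[] slots and the address array points back into them.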
>> We had the same observation at the time of SFrame V1, and this method of
>> compaction (deduped tuples) was brainstormed a bit.  Back then, the costs
>> were thought to be:
>>    - more work at build time.
>>    - an additional data access once the FRE is found (as there is
>>      indirection).
>>
>> So it was really compaction at the cost of the above.  We steered
>> towards simplicity, and that is why the SFrame FRE is what it is today.
>>
>> What differs in the pros and cons now compared to then:
>>    - pros: helps mitigate unaligned accesses.
>>    - cons: interferes slightly with the design goal of efficient addition
>>      and removal of stack trace information per function for JIT.  Think
>>      of "removal" as the set of actions necessary to address
>>      fragmentation of SFrame section data in the JIT use case.
> If fre_data[] is allowed to have duplicates, then the deduping could be
> optional.
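One hedged reading of the optional-dedup idea, as a C sketch: a hypothetical fre_data_add() searches for an identical tuple only when asked to, and otherwise appends. If duplicates are allowed, a JIT could append a function's tuples without a search and later discard them without checking whether another function shares the same slot; with mandatory dedup, a shared slot would need something like reference counting before it could be freed. The names and the fixed-capacity table are assumptions, not part of any proposal in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fre_data {
	int32_t cfa, fp, ra;
	uint8_t info;
};

/* Hypothetical fixed-capacity fre_data[] table that permits duplicates. */
struct fre_data_table {
	struct fre_data *data;
	size_t len, cap;
};

/*
 * Add a tuple.  With dedup != 0, reuse an identical existing slot;
 * otherwise always append (the cheap JIT path).
 * Returns the slot index, or SIZE_MAX if the table is full.
 */
size_t fre_data_add(struct fre_data_table *t, const struct fre_data *d,
		    int dedup)
{
	assert(t->len <= t->cap);
	if (dedup) {
		for (size_t i = 0; i < t->len; i++) {
			const struct fre_data *e = &t->data[i];

			if (e->cfa == d->cfa && e->fp == d->fp &&
			    e->ra == d->ra && e->info == d->info)
				return i;
		}
	}
	if (t->len == t->cap)
		return SIZE_MAX;
	t->data[t->len] = *d;
	return t->len++;
}
```

Adding the same tuple twice with dedup enabled returns the same slot; with dedup disabled, it lands in a fresh slot.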
>>> Note that FDEs aren't even needed here, as the unwinder doesn't need to
>>> know when a function begins or ends.  The only info needed by the
>>> unwinder is just the fre_data struct.  So a simple binary search of
>>> fres[] is all that's really needed.
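A sketch of that lookup in C, assuming (hypothetically) that fres[] is sorted by a 64-bit start address and each entry carries an index into fre_data[]. Because there are no FDEs, there is no function end address to check, so this sketch assumes every gap between functions is covered by its own entry (for instance, one meaning "no unwind info"); the thread does not specify how gaps would be encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fre_data {
	int32_t cfa, fp, ra;
	uint8_t info;
};

/* Hypothetical address-specific entry; fres[] is sorted by addr. */
struct fre {
	uint64_t addr;
	uint32_t data_idx;
};

/* Return the tuple for the last fres[i] with addr <= pc, or NULL. */
const struct fre_data *fre_find(const struct fre *fres, size_t n,
				const struct fre_data *data, uint64_t pc)
{
	size_t lo = 0, hi = n;	/* find the first entry with addr > pc */

	assert(fres != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (fres[mid].addr <= pc)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;	/* pc is below the first entry */
	return &data[fres[lo - 1].data_idx];
}
```

A pc inside the second entry's range resolves to that entry's tuple; a pc below the first entry returns NULL.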
>> Splitting out information (start_address) into an FDE (as done in V1/V2)
>> has the benefit that a job like relocating that information scales as
>> O(NumFunctions).
>>
>> In the case above, IIUC, where the proposal puts start_address in the
>> FRE, these costs will be (much) higher, as they then scale with the
>> number of FREs rather than the number of functions.
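For contrast, a C sketch loosely modeled on the FDE/FRE split described above. The field names and widths are hypothetical and do not match the SFrame V1/V2 on-disk encoding: only fde.func_start holds an address that needs relocating, while FRE start offsets are relative to the function, so relocate_fdes() does one update per function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function record; fdes[] is sorted by func_start. */
struct fde {
	uint64_t func_start;	/* the only absolute address */
	uint32_t func_size;
	uint32_t fre_off;	/* index of this function's first FRE */
	uint32_t num_fres;
};

/* Hypothetical FRE: position independent, so it needs no relocation. */
struct fre_rel {
	uint32_t start_off;	/* offset from func_start */
	int32_t cfa;		/* stand-in for the full (cfa, fp, ra) data */
};

/* Relocation work: one field per function, not one per FRE. */
size_t relocate_fdes(struct fde *fdes, size_t nfdes, uint64_t delta)
{
	for (size_t i = 0; i < nfdes; i++)
		fdes[i].func_start += delta;
	return nfdes;	/* number of relocations applied */
}

const struct fre_rel *fde_fre_find(const struct fde *fdes, size_t nfdes,
				   const struct fre_rel *fres, uint64_t pc)
{
	size_t lo = 0, hi = nfdes;

	assert(fdes != NULL || nfdes == 0);
	/* Binary search for the last FDE with func_start <= pc. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (fdes[mid].func_start <= pc)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;

	const struct fde *f = &fdes[lo - 1];

	if (pc - f->func_start >= f->func_size)
		return NULL;	/* pc is past the end of that function */

	uint32_t off = (uint32_t)(pc - f->func_start);
	const struct fre_rel *best = NULL;

	/* Linear scan within one function keeps the sketch short. */
	for (uint32_t i = 0; i < f->num_fres; i++) {
		const struct fre_rel *r = &fres[f->fre_off + i];

		if (r->start_off > off)
			break;
		best = r;
	}
	return best;
}
```

Relocating two functions with three FREs between them touches only the two func_start fields; the FRE lookups keep working at the new addresses.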
> I'm not sure I follow.  Is this referring to the link-time work of
> sorting things?
I meant the work of tracking the start address of each function.  This
could be done at link time, as it is in most cases.

But depending on the case, the work may also happen at load time: e.g.,
the kernel module loader will need to apply these relocations in the
.rela.sframe section...

If the granularity is finer than a function, more relocations will need
to be applied.
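To make the per-entry cost concrete, here is a C sketch of applying a list of relocation records to a section buffer. struct simple_rela and apply_relocs() are hypothetical and much simpler than real ELF relocations, which also carry a type and a symbol index, and whose exact use in .rela.sframe is not shown in this thread. The point is only that the work is one patch per record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified relocation record: write sym_base + addend
 * (an absolute, "S + A" style value) into a 64-bit field at `offset`.
 */
struct simple_rela {
	size_t offset;
	int64_t addend;
};

/* Apply n records to sec; returns the number of patches made. */
size_t apply_relocs(unsigned char *sec, size_t sec_size,
		    const struct simple_rela *r, size_t n, uint64_t sym_base)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t val = sym_base + (uint64_t)r[i].addend;

		assert(r[i].offset + sizeof(val) <= sec_size);
		/* memcpy avoids assuming the field is naturally aligned */
		memcpy(sec + r[i].offset, &val, sizeof(val));
	}
	return n;
}
```

With one start address per function, the record count tracks the number of functions; with a start address per FRE, it tracks the number of FREs.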
>> In addition, not being able to identify stack trace information per
>> function will affect the JIT use case.  We need to be able to mark stack
>> trace information stale for functions in a JIT environment.
> Maybe, though it's hard to really say how any of these changes would
> affect JIT without knowing what those interfaces are going to look like.
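As a rough illustration of the per-function invalidation described above, and keeping in mind the caveat that the real JIT interfaces are unknown, here is a C sketch in which a hypothetical stale flag lives in a per-function record. Marking a function stale touches one record, and lookups skip stale entries. struct jit_fde, jit_mark_stale() and jit_find() are illustrative names only, and both searches are linear for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function record for JIT-generated code. */
struct jit_fde {
	uint64_t func_start;
	uint32_t func_size;
	bool stale;	/* set when the JIT discards or recompiles the code */
};

/*
 * Mark one function's stack trace info stale.  Only this one record is
 * modified; none of the function's FREs need to be touched.
 */
bool jit_mark_stale(struct jit_fde *fdes, size_t n, uint64_t func_start)
{
	assert(fdes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (fdes[i].func_start == func_start && !fdes[i].stale) {
			fdes[i].stale = true;
			return true;
		}
	}
	return false;
}

/* Find the live record covering pc, ignoring stale entries. */
const struct jit_fde *jit_find(const struct jit_fde *fdes, size_t n,
			       uint64_t pc)
{
	for (size_t i = 0; i < n; i++) {
		const struct jit_fde *f = &fdes[i];

		if (!f->stale && pc >= f->func_start &&
		    pc - f->func_start < f->func_size)
			return f;
	}
	return NULL;
}
```

Once a function is marked stale, a pc inside it no longer resolves, while other functions are unaffected.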
  