Thread (138 messages), 17 authors, 2022-09-08

Re: [RFC PATCH 00/30] Code tagging framework and applications

From: Kent Overstreet <kent.overstreet@linux.dev>
Date: 2022-09-05 20:42:43
Also in: io-uring, linux-arch, linux-bcache, linux-iommu, linux-mm, lkml, xen-devel

On Mon, Sep 05, 2022 at 11:08:21AM -0700, Suren Baghdasaryan wrote:
On Mon, Sep 5, 2022 at 8:06 AM Steven Rostedt [off-list ref] wrote:
On Sun, 4 Sep 2022 18:32:58 -0700
Suren Baghdasaryan [off-list ref] wrote:
Page allocations (overheads are compared to get_free_pages() duration):
6.8% Codetag counter manipulations (__lazy_percpu_counter_add + __alloc_tag_add)
8.8% lookup_page_ext
1237% call stack capture
139% tracepoint with attached empty BPF program
Have you tried tracepoint with custom callback?

static void my_callback(void *data, unsigned long call_site,
                        const void *ptr, struct kmem_cache *s,
                        size_t bytes_req, size_t bytes_alloc,
                        gfp_t gfp_flags)
{
        struct my_data_struct *my_data = data;

        /* do whatever */
}

[..]
        register_trace_kmem_alloc(my_callback, my_data);

Now the my_callback function will be called directly every time the
kmem_alloc tracepoint is hit.

This avoids that perf and BPF overhead.
Haven't tried that yet but will do. Thanks for the reference code!

Is it really worth the effort of benchmarking tracing API overhead here?

The main cost of a tracing-based approach is going to be the data structure
for remembering outstanding allocations so that free events can be matched to
the appropriate callsite. Regardless of whether it's done with BPF or by
attaching to the tracepoints directly, that's going to be the main overhead.
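
To make that cost concrete, here is a minimal userspace sketch of the kind of
bookkeeping a tracing-based approach would need: a hash table keyed by the
allocated pointer, so a later free event can be attributed to the callsite
that allocated it. All names here (track_alloc, track_free, TRACK_BUCKETS) are
hypothetical and not from the patch set or any kernel API. It assumes a single
thread and uses malloc() for its own entries, which a real in-kernel version
could not do on the allocation path.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table size; must be a power of two for the mask below. */
#define TRACK_BUCKETS 1024

/* One entry per outstanding (not yet freed) allocation. */
struct outstanding_alloc {
    const void *ptr;              /* address returned by the allocator */
    unsigned long call_site;      /* callsite that made the allocation */
    size_t bytes;                 /* size charged to that callsite */
    struct outstanding_alloc *next;
};

static struct outstanding_alloc *track_table[TRACK_BUCKETS];

/* Cheap pointer hash: drop low alignment bits, fold in higher bits. */
static size_t track_hash(const void *ptr)
{
    uintptr_t v = (uintptr_t)ptr;

    return (size_t)((v >> 4) ^ (v >> 16)) & (TRACK_BUCKETS - 1);
}

/*
 * Alloc event: remember which callsite owns ptr.
 * Returns 0 on success, -1 if the tracking entry could not be allocated.
 */
static int track_alloc(const void *ptr, unsigned long call_site, size_t bytes)
{
    struct outstanding_alloc *e = malloc(sizeof(*e));
    size_t b = track_hash(ptr);

    if (!e)
        return -1;
    e->ptr = ptr;
    e->call_site = call_site;
    e->bytes = bytes;
    e->next = track_table[b];
    track_table[b] = e;
    return 0;
}

/*
 * Free event: look up ptr, drop its entry, and report the owning callsite.
 * Returns 0 if ptr was never seen (e.g. allocated before tracing started).
 */
static unsigned long track_free(const void *ptr, size_t *bytes_out)
{
    struct outstanding_alloc **pp = &track_table[track_hash(ptr)];

    for (; *pp; pp = &(*pp)->next) {
        if ((*pp)->ptr == ptr) {
            struct outstanding_alloc *e = *pp;
            unsigned long site = e->call_site;

            if (bytes_out)
                *bytes_out = e->bytes;
            *pp = e->next;
            free(e);
            return site;
        }
    }
    return 0;
}
```

Every alloc event pays for an insert and every free event pays for a lookup
and a removal, on top of whatever the tracing hook itself costs. That
per-event work is there no matter how the callback gets invoked.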