Thread: 10 messages, 2 authors, 2021-08-20

Re: [RFC] bpf: lbr: enable reading LBR from tracing bpf programs

From: Song Liu <hidden>
Date: 2021-08-18 16:46:54
Also in: lkml

Hi Peter,

Thanks for your quick response!

On Aug 18, 2021, at 2:15 AM, Peter Zijlstra [off-list ref] wrote:

> On Tue, Aug 17, 2021 at 06:29:37PM -0700, Song Liu wrote:
> > The typical way to access LBR is via hardware perf_event. For CPUs with
> > FREEZE_LBRS_ON_PMI support, PMI could capture reliable LBR. On the other
> > hand, LBR could also be useful in non-PMI scenarios. For example, in a
> > kretprobe or bpf fexit program, LBR could provide a lot of information
> > on what happened with the function.
> >
> > In this RFC, we try to enable LBR for BPF programs. This works like:
> >  1. Create a hardware perf_event with PERF_SAMPLE_BRANCH_* on each CPU;
> >  2. Call a new bpf helper (bpf_get_branch_trace) from the BPF program;
> >  3. Before calling this bpf program, the kernel stops LBR on the local
> >     CPU, makes a copy of LBR, and resumes LBR;
> >  4. In the bpf program, the helper accesses the copy from #3.
> >
> > Please see tools/testing/selftests/bpf/[progs|prog_tests]/get_call_trace.c
> > for a detailed example. Note that this process is far from ideal, but it
> > allows quick prototyping of this feature.
> >
> > AFAICT, the biggest challenge here is that we are now sharing LBR in PMI
> > and out of PMI, which could trigger some interesting race conditions.
> > However, if we allow some level of missed/corrupted samples, this should
> > still be very useful.
> >
> > Please share your thoughts and comments on this. Thanks in advance!
> >
> > +int bpf_branch_record_read(void)
> > +{
> > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > +
> > +	intel_pmu_lbr_disable_all();
> > +	intel_pmu_lbr_read();
> > +	memcpy(this_cpu_ptr(&bpf_lbr_entries), cpuc->lbr_entries,
> > +	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
> > +	*this_cpu_ptr(&bpf_lbr_cnt) = x86_pmu.lbr_nr;
> > +	intel_pmu_lbr_enable_all(false);
> > +	return 0;
> > +}

> Urgghhh.. I so really hate BPF specials like this.

I don't really like this design either. But it does show that LBR can be
very useful in non-PMI scenarios.

> Also, the PMI race
> you describe is because you're doing abysmal layer violations. If you'd
> have used perf_pmu_disable() that wouldn't have been a problem.

Do you mean that, instead of disabling/enabling LBR, we should
disable/enable the whole PMU?

> I'd much rather see a generic 'fake/inject' PMI facility, something that
> works across the board and isn't tied to x86/intel.

How would that work? Do we have a function to trigger a PMI from software,
and then gather the LBR data after the PMI? This does sound like a much
cleaner solution. Where can I find code examples that fake/inject a PMI?

There is another limitation right now: we need to enable LBR with a
hardware perf event (cycles, etc.). However, unless we use the event for
something else, it wastes a hardware counter. So I was thinking of allowing
a software event, i.e. a dummy event, to enable LBR. Does this idea sound
sane to you?
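
To make the counter cost concrete, here is a minimal user-space sketch of
how a per-CPU hardware event with branch sampling might be opened today.
The helper names lbr_event_attr_init() and lbr_event_open() are made up
for illustration and are not part of the RFC; the choice of the cycles
event and of the branch filter flags is likewise only an assumption.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe a hardware cycles event whose only
 * purpose is to keep branch recording (LBR) enabled on a CPU.
 */
void lbr_event_attr_init(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	/* This is the hardware counter that is "wasted" if unused. */
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
	/* Assumed filter: any branch, in both kernel and user mode. */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL |
				   PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY;
}

/*
 * Hypothetical helper: open the event for all tasks on one CPU.
 * Returns a file descriptor, or -1 with errno set (for example when
 * the caller lacks privileges or the CPU has no branch records).
 */
int lbr_event_open(int cpu)
{
	struct perf_event_attr attr;

	assert(cpu >= 0);
	lbr_event_attr_init(&attr);
	/* pid = -1 and cpu >= 0 selects a CPU-wide event. */
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1,
			    PERF_FLAG_FD_CLOEXEC);
}
```

A caller would invoke lbr_event_open() once per online CPU and keep the
file descriptors open for as long as the BPF program is attached. With a
software/dummy event, only type and config in the sketch would change,
assuming the kernel accepted branch sampling on such an event.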

Thanks,
Song