--- v1
+++ v11
@@ -11,184 +11,330 @@
pages, making it an ideal structure for sharing between kernel and
hypervisor.
-This series introduces a method to create events and to generate them
-from the hypervisor (hyp_enter/hyp_exit given as an example) as well as
-a Tracefs user-space interface to read them.
-
-A presentation was given on this matter during the tracing summit in
-2022. [1]
+This series first introduces a new generic way of creating remote events and
+remote buffers. Then it adds support to the pKVM hypervisor.
1. ring-buffer
--------------
To set up the per-cpu ring-buffers, a new interface is created:
- ring_buffer_writer: Describes what the kernel needs to know about the
- writer, that is, the set of pages forming the
+ ring_buffer_remote: Describes what the kernel needs to know about the
+ remote writer, that is, the set of pages forming the
ring-buffer and a callback for the reader/head
swapping (enables consuming read)
- ring_buffer_reader(): Creates a read-only ring-buffer from a
- ring_buffer_writer.
-
-To keep the internals of `struct ring_buffer` in sync with the writer,
+ ring_buffer_alloc_remote(): Creates a read-only ring-buffer from a
+ ring_buffer_remote.
+
+To keep the internals of `struct ring_buffer` in sync with the remote,
the meta-page is used. It was originally introduced to enable user-space
mapping of the ring-buffer [1]. In this case, the kernel is not the
producer anymore but the reader. The function to read that meta-page is:
- ring_buffer_poll_writer():
- Update `struct ring_buffer` based on the writer
+ ring_buffer_poll_remote():
+ Update `struct ring_buffer` based on the remote
meta-page. Wake-up readers if necessary.
The kernel has to poll the meta-page to be notified of newly written
events.
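
As a rough illustration of the polling described above, here is a minimal
user-space sketch in C. It is not the kernel implementation: the
demo_meta_page and demo_reader structures and the demo_poll_remote() helper
are hypothetical names invented for this example, and the real meta-page has
more fields. It only shows the idea of comparing the writer's counters with
what the reader last saw and waking readers when they differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the shared meta-page. */
struct demo_meta_page {
	uint64_t entries;	/* events written so far by the remote */
	uint64_t overrun;	/* events lost to overwriting */
};

/* Hypothetical reader-side state kept by the kernel. */
struct demo_reader {
	uint64_t seen_entries;
	uint64_t seen_overrun;
	unsigned int wakeups;	/* how many times readers were woken up */
};

/*
 * Refresh the reader state from the meta-page. Returns true (and counts a
 * wake-up) when the remote has written new events since the last poll.
 * A real implementation reading shared memory would also need the
 * appropriate memory barriers, which this sketch leaves out.
 */
static bool demo_poll_remote(struct demo_reader *r,
			     const struct demo_meta_page *meta)
{
	assert(r && meta);

	bool fresh = meta->entries != r->seen_entries;

	r->seen_entries = meta->entries;
	r->seen_overrun = meta->overrun;
	if (fresh)
		r->wakeups++;

	return fresh;
}
```

Because the remote cannot notify the kernel in this model, the caller is
expected to invoke the poll function periodically.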
-2. Tracefs interface
---------------------
-
-The interface is a hypervisor/ folder at the root of the tracefs mount
-point. This folder is like an instance and you'll find there a subset
-of the regular Tracefs user-space interface:
-
- hypervisor/
- buffer_size_kb
- trace_clock
- trace_pipe
- trace_pipe_raw
- trace
- per_cpu/
- cpuX/
- trace
- trace_pipe
- trace_pipe_raw
- events/
- hypervisor/
- hyp_enter/
- enable
- id
-
-Behind the scenes, kvm/hyp_trace.c must rebuild the tracing hierarchy
-without relying on kernel/trace/trace.c. This is due to fundamental
-differences:
-
- * Hypervisor tracing doesn't support trace_array's system-specific
+2. Tracefs
+----------
+
+This series introduces a new trace_remote that links tracefs and the
+remote ring-buffer.
+
+The interface is found in the remotes/ directory at the root of the
+tracefs mount point. Each remote is like an instance and you'll find
+there a subset of the regular Tracefs user-space interface:
+
+ remotes/test
+ |-- buffer_size_kb
+ |-- events
+ | |-- enable
+ | |-- header_event
+ | |-- header_page
+ | `-- test
+ | `-- selftest
+ | |-- enable
+ | |-- format
+ | `-- id
+ |-- per_cpu
+ | `-- cpu0
+ | |-- trace
+ | `-- trace_pipe
+ |-- trace
+ |-- trace_pipe
+ |-- tracing_on
+
+Behind the scenes, kernel/trace/trace_remote.c creates this tracefs
+hierarchy without relying on kernel/trace/trace.c. This is due to
+fundamental differences:
+
+ * Remote tracing doesn't support trace_array's system-specific
features (snapshots, tracers, etc.).
- * Logged event formats differ (e.g., no PID in hypervisor
- events).
-
- * Buffer operations require specific hypervisor interactions.
-
-3. Events
+ * Logged event formats differ (e.g., no PID for remote events).
+
+ * Buffer operations require specific remote interactions.
+
+3. Simple Ring-Buffer
+---------------------
+
+The current ring-buffer.c implementation has too many dependencies to
+be used directly by the pKVM hypervisor, so a new simple implementation
+is created and can be found in kernel/trace/simple_ring_buffer.c.
+
+This implementation is write-only and is used by both the pKVM
+hypervisor and a trace_remote test module.
+
+4. Events
---------
-In the hypervisor, "hyp events" can be generated with trace_<event_name>
-in a similar fashion to what the kernel does. They're also created with
-similar macros than the kernel (see kvm_hypevents.h)
-
-HYP_EVENT("foboar",
- HE_PROTO(void),
- HE_STRUCT(),
- HE_ASSIGN(),
- HE_PRINTK(" ")
-)
-
-Despite the apparent similarities with TRACE_EVENT(), those macros
-internally differs: they must be used in parallel between the hypervisor
-(for the writing part) and the kernel (for the reading part) which makes
-it difficult to share anything with their kernel counterpart.
-
-Also, events directory isn't using eventfs.
-
-4. Few limitations:
--------------------
-
-Non consuming reading of the buffer isn't supported (i.e. cat trace ->
+A new REMOTE_EVENT() macro is added to simplify the creation of events
+on the kernel side. As remote tracing buffers are read-only, only the
+event structure and a way of printing it must be declared. The prototype
+of the macro is very similar to the well-known TRACE_EVENT():
+
+ REMOTE_EVENT(my_event, id,
+ RE_STRUCT(
+ re_field(u64, foobar)
+ ),
+ RE_PRINTK("foobar=%lld", __entry->foobar)
+ )
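
To give an idea of how a declaration of this shape can produce both an event
structure and a printing helper, here is a toy, self-contained C sketch. It is
not the REMOTE_EVENT() implementation from this series: DEMO_EVENT(),
DEMO_STRUCT(), demo_field() and DEMO_PRINTK() are hypothetical macros written
only for this illustration, and the generated names (demo_entry_<name>,
demo_print_<name>, demo_id_<name>) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper macros, loosely mirroring the shape shown above. */
#define DEMO_STRUCT(...)	__VA_ARGS__
#define demo_field(type, name)	type name;
#define DEMO_PRINTK(fmt, ...)	fmt, __VA_ARGS__

/*
 * Generate, for one event: an id constant, the structure describing the
 * logged entry, and a function formatting such an entry into a buffer.
 */
#define DEMO_EVENT(name, id, fields, printk)				\
	enum { demo_id_##name = (id) };					\
	struct demo_entry_##name { fields };				\
	static int demo_print_##name(char *buf, size_t len,		\
			const struct demo_entry_##name *__entry)	\
	{								\
		assert(buf && __entry);					\
		return snprintf(buf, len, printk);			\
	}

/* Declaration of a single event carrying one 64-bit field. */
DEMO_EVENT(my_event, 1,
	DEMO_STRUCT(
		demo_field(uint64_t, foobar)
	),
	DEMO_PRINTK("foobar=%llu", (unsigned long long)__entry->foobar)
)
```

In this sketch only the reading side is generated, which matches the idea
that the kernel needs just the layout and a way to print it.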
+
+5. pKVM
+-------
+
+The pKVM support simply creates a "hypervisor" trace_remote on the
+kernel side and reuses simple_ring_buffer.c on the hypervisor
+side.
+
+A new event macro, HYP_EVENT(), is created. Under the hood it re-uses
+REMOTE_EVENT() (defined in the previous section) and generates the
+hypervisor-specific struct and trace_<event>() functions.
+
+6. Limitations:
+---------------
+
+Non-consuming reading of the buffer isn't supported (i.e. cat trace ->
-EPERM) due to the current lack of support in the ring-buffer meta-page.
[1] https://tracingsummit.org/ts/2022/hypervisortracing/
[2] https://lore.kernel.org/all/20240510140435.3550353-1-vdonnefort@google.com/
-Changes since RFC: https://lore.kernel.org/all/20240805173234.3542917-1-vdonnefort@google.com/
+Changes since v10
+
+ - Move kerneldoc to .c files (Steven)
+ - Return EBUSY on buffer_size_kb write if buffer is loaded (Steven)
+ - Remove rb_iter/rb_iters union in trace_remote_iterator (Steven)
+ - Rename and refactor trace file seq_operations (Steven)
+ - Make trace_get_cpu() accessible to trace_remote.c (Steven)
+ - Remove unnecessary cpus_read_unlock() (Steven)
+ - !preempt on remote_test driver buffer writing (Steven)
+ - Do not fail selftest if cpu/online is unavailable (Steven)
+ - Add rationale for trace_remote into documentation (Steven)
+
+Changes since v9
+
+ - Add vCPU PID to hyp_enter/hyp_exit (Marc)
+ - Remove useless X1 setting for tracing HVCs (Marc)
+ - Fix REMOTE_PRINTK_COUNT_ARGS()
+ - Rebase on 6.19-rc7
+
+Changes since v8
+
+ - Do not enable tracing if unstable cntvct (Marc)
+ - Add support for nVHE (Marc)
+ - Add PKVM_DISABLE_STAGE2_ON_PANIC (Marc)
+ - NVHE_EL2_TRACING depends on NVHE_EL2_DEBUG (Marc)
+ - Add a reason for hyp_enter/hyp_exit events (Marc)
+ - Remove PKVM_SELFTESTS in favour of NVHE_EL2_DEBUG
+ - Add wrapper for arm_smccc_1_2, now used in nvhe/ffa.c
+
+Changes since v7
+
+ - Add missing EXPORT_SYMBOL_GPL for remote_test.ko
+ - Rebase on 6.18-rc4
+
+Changes since v6
+
+ - Add requires field to the selftest (Masami)
+ - Use guard() for ring_buffer_poll_remote (Steven)
+ - Rename ring_buffer_remote() to ring_buffer_alloc_remote() (Steven)
+ - kerneldoc for trace_buffer_remote and simple_ring_buffer (Steven)
+ - Validate trace_buffer_desc size in trace_remote_alloc_buffer
+ (Steven)
+ - Add non-consuming ring-buffer read (Steven)
+ - Add spinning failsafe in simple_ring_buffer (Steven)
+ - Range check for hyp_trace_desc::bpages_backing_* in hyp_trace_desc_validate()
+ - unsigned int cpu in hyp_trace_desc_validate()
+ - Fix event/format file
+ - Add tests with an offline CPU
+ - Add tests for non-consuming read
+ - Add documentation
+ - Rebase on 6.17
+
+Changes since v5 (https://lore.kernel.org/all/20250516134031.661124-1-vdonnefort@google.com/)
+
+ - Add tishift lib to the hyp (Aneesh)
+ - Rebase on 6.17-rc2
+
+Changes since v4 (https://lore.kernel.org/all/20250506164820.515876-1-vdonnefort@google.com/)
+
+ - Extend meta-page with pages_touched and pages_lost
+ - Create ring_buffer_types.h
+ - Fix simple_ring_buffer build for 32-bits arch and x86
+ - Try unload buffer on reset (+ test)
+ - Minor renaming and comments
+
+Changes since v3 (https://lore.kernel.org/all/20250224121353.98697-1-vdonnefort@google.com/)
+
+ - Move tracefs support from kvm/hyp_trace.c into a generic trace_remote.c.
+ - Move ring-buffer implementation from nvhe/trace.c into a generic
+ simple-ring-buffer.c
+ - Rebase on 6.15-rc4.
+
+Changes since v2 (https://lore.kernel.org/all/20250108114536.627715-1-vdonnefort@google.com/)
+
+ - Fix ring-buffer remote reset
+ - Fix fast-forward in rb_page_desc()
+ - Refactor nvhe/trace.c
+ - struct hyp_buffer_page more compact
+ - Add a struct_len to trace_page_desc
+ - Extend reset testing
+ - Rebase on 6.14-rc3
+
+Changes since v1 (https://lore.kernel.org/all/20240911093029.3279154-1-vdonnefort@google.com/)
+
+ - Add 128-bits mult fallback in the unlikely event of an overflow. (John)
+ - Fix ELF section sort.
+ - __always_inline trace_* event macros.
+ - Fix events/<event>/enable permissions.
+ - Rename ring-buffer "writer" to "remote".
+ - Rename CONFIG_PROTECTED_NVHE_TESTING to PKVM_SELFTEST to align with
+ Quentin's upcoming selftest
+ - Rebase on 6.13-rc3.
+
+Changes since RFC (https://lore.kernel.org/all/20240805173234.3542917-1-vdonnefort@google.com/)
- hypervisor trace clock:
- - mult/shift computed in hyp_trace.c.
- - Update clock when it deviates from kernel boot clock.
+ - mult/shift computed in hyp_trace.c. (John)
+ - Update clock when it deviates from kernel boot clock. (John)
- Add trace_clock file.
- Separate patch for better readability.
-
- Add a proper reset interface which does not need to teardown the
- tracing buffers.
-
- - Return -EPERM on trace access.
-
+ tracing buffers. (Steven)
+ - Return -EPERM on trace access. (Steven)
- Add per-cpu trace file.
-
- Automatically teardown and free the tracing buffer when it is empty,
without readers and not currently tracing.
-
- Show in buffer_size_kb if the buffer is loaded in the hypervisor or
not.
-
- Extend tests to cover reset and unload.
-
- - CC timekeeping folks on relevant patches
-
-Vincent Donnefort (13):
- ring-buffer: Check for empty ring-buffer with rb_num_of_entries()
- ring-buffer: Introducing ring-buffer writer
- ring-buffer: Expose buffer_data_page material
- timekeeping: Add the boot clock to system time snapshot
- KVM: arm64: Support unaligned fixmap in the nVHE hyp
- KVM: arm64: Add clock support in the nVHE hyp
- KVM: arm64: Add tracing support for the pKVM hyp
- KVM: arm64: Add hyp tracing to tracefs
- KVM: arm64: Add clock for hyp tracefs
- KVM: arm64: Add raw interface for hyp tracefs
- KVM: arm64: Add trace interface for hyp tracefs
- KVM: arm64: Add support for hyp events
- KVM: arm64: Add kselftest for tracefs hyp tracefs
-
- arch/arm64/include/asm/kvm_asm.h | 8 +
- arch/arm64/include/asm/kvm_define_hypevents.h | 60 ++
- arch/arm64/include/asm/kvm_hyp.h | 1 -
- arch/arm64/include/asm/kvm_hypevents.h | 41 +
- arch/arm64/include/asm/kvm_hypevents_defs.h | 41 +
- arch/arm64/include/asm/kvm_hyptrace.h | 37 +
- arch/arm64/kernel/image-vars.h | 4 +
- arch/arm64/kernel/vmlinux.lds.S | 18 +
- arch/arm64/kvm/Kconfig | 9 +
- arch/arm64/kvm/Makefile | 2 +
- arch/arm64/kvm/arm.c | 6 +
- arch/arm64/kvm/hyp/hyp-constants.c | 4 +
- arch/arm64/kvm/hyp/include/nvhe/arm-smccc.h | 13 +
- arch/arm64/kvm/hyp/include/nvhe/clock.h | 16 +
- .../kvm/hyp/include/nvhe/define_events.h | 21 +
- arch/arm64/kvm/hyp/include/nvhe/trace.h | 60 ++
- arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
- arch/arm64/kvm/hyp/nvhe/clock.c | 49 +
- arch/arm64/kvm/hyp/nvhe/events.c | 35 +
- arch/arm64/kvm/hyp/nvhe/ffa.c | 2 +-
- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 85 ++
- arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 4 +
- arch/arm64/kvm/hyp/nvhe/mm.c | 2 +-
- arch/arm64/kvm/hyp/nvhe/psci-relay.c | 14 +-
- arch/arm64/kvm/hyp/nvhe/switch.c | 5 +-
- arch/arm64/kvm/hyp/nvhe/trace.c | 660 ++++++++++++
- arch/arm64/kvm/hyp_events.c | 165 +++
- arch/arm64/kvm/hyp_trace.c | 981 ++++++++++++++++++
- arch/arm64/kvm/hyp_trace.h | 15 +
- include/linux/ring_buffer.h | 108 +-
- include/linux/timekeeping.h | 2 +
- kernel/time/timekeeping.c | 4 +
- kernel/trace/ring_buffer.c | 294 ++++--
- tools/testing/selftests/hyp-trace/Makefile | 6 +
- tools/testing/selftests/hyp-trace/config | 4 +
- .../selftests/hyp-trace/hyp-trace-test | 254 +++++
- 36 files changed, 2932 insertions(+), 99 deletions(-)
+ - CC timekeeping folks on relevant patches (Marc)
+
+Vincent Donnefort (30):
+ ring-buffer: Add page statistics to the meta-page
+ ring-buffer: Store bpage pointers into subbuf_ids
+ ring-buffer: Introduce ring-buffer remotes
+ ring-buffer: Add non-consuming read for ring-buffer remotes
+ tracing: Introduce trace remotes
+ tracing: Add reset to trace remotes
+ tracing: Add non-consuming read to trace remotes
+ tracing: Add init callback to trace remotes
+ tracing: Add events to trace remotes
+ tracing: Add events/ root files to trace remotes
+ tracing: Add helpers to create trace remote events
+ ring-buffer: Export buffer_data_page and macros
+ tracing: Introduce simple_ring_buffer
+ tracing: Add a trace remote module for testing
+ tracing: selftests: Add trace remote tests
+ Documentation: tracing: Add tracing remotes
+ tracing: load/unload page callbacks for simple_ring_buffer
+ tracing: Check for undefined symbols in simple_ring_buffer
+ KVM: arm64: Add PKVM_DISABLE_STAGE2_ON_PANIC
+ KVM: arm64: Add clock support to nVHE/pKVM hyp
+ KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp
+ KVM: arm64: Support unaligned fixmap in the pKVM hyp
+ KVM: arm64: Add tracing capability for the nVHE/pKVM hyp
+ KVM: arm64: Add trace remote for the nVHE/pKVM hyp
+ KVM: arm64: Sync boot clock with the nVHE/pKVM hyp
+ KVM: arm64: Add trace reset to the nVHE/pKVM hyp
+ KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote
+ KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp
+ KVM: arm64: Add selftest event support to nVHE/pKVM hyp
+ tracing: selftests: Add hypervisor trace remote tests
+
+ Documentation/trace/index.rst | 11 +
+ Documentation/trace/remotes.rst | 66 +
+ arch/arm64/include/asm/kvm_asm.h | 8 +
+ arch/arm64/include/asm/kvm_define_hypevents.h | 16 +
+ arch/arm64/include/asm/kvm_host.h | 3 +
+ arch/arm64/include/asm/kvm_hyp.h | 4 +-
+ arch/arm64/include/asm/kvm_hypevents.h | 60 +
+ arch/arm64/include/asm/kvm_hyptrace.h | 26 +
+ arch/arm64/kernel/image-vars.h | 4 +
+ arch/arm64/kernel/vmlinux.lds.S | 18 +
+ arch/arm64/kvm/Kconfig | 64 +-
+ arch/arm64/kvm/Makefile | 2 +
+ arch/arm64/kvm/arm.c | 12 +-
+ arch/arm64/kvm/handle_exit.c | 2 +-
+ arch/arm64/kvm/hyp/include/nvhe/arm-smccc.h | 23 +
+ arch/arm64/kvm/hyp/include/nvhe/clock.h | 16 +
+ .../kvm/hyp/include/nvhe/define_events.h | 14 +
+ arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 -
+ arch/arm64/kvm/hyp/include/nvhe/trace.h | 70 +
+ arch/arm64/kvm/hyp/nvhe/Makefile | 6 +-
+ arch/arm64/kvm/hyp/nvhe/clock.c | 65 +
+ arch/arm64/kvm/hyp/nvhe/events.c | 25 +
+ arch/arm64/kvm/hyp/nvhe/ffa.c | 28 +-
+ arch/arm64/kvm/hyp/nvhe/host.S | 2 +-
+ arch/arm64/kvm/hyp/nvhe/hyp-main.c | 87 +-
+ arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 6 +
+ arch/arm64/kvm/hyp/nvhe/mm.c | 4 +-
+ arch/arm64/kvm/hyp/nvhe/psci-relay.c | 7 +-
+ arch/arm64/kvm/hyp/nvhe/setup.c | 4 +-
+ arch/arm64/kvm/hyp/nvhe/stacktrace.c | 6 +-
+ arch/arm64/kvm/hyp/nvhe/switch.c | 5 +-
+ arch/arm64/kvm/hyp/nvhe/trace.c | 306 ++++
+ arch/arm64/kvm/hyp_trace.c | 443 ++++++
+ arch/arm64/kvm/hyp_trace.h | 11 +
+ arch/arm64/kvm/stacktrace.c | 8 +-
+ fs/tracefs/inode.c | 1 +
+ include/linux/ring_buffer.h | 58 +
+ include/linux/ring_buffer_types.h | 41 +
+ include/linux/simple_ring_buffer.h | 65 +
+ include/linux/trace_remote.h | 49 +
+ include/linux/trace_remote_event.h | 33 +
+ include/trace/define_remote_events.h | 73 +
+ include/uapi/linux/trace_mmap.h | 8 +-
+ kernel/trace/Kconfig | 14 +
+ kernel/trace/Makefile | 20 +
+ kernel/trace/remote_test.c | 261 ++++
+ kernel/trace/remote_test_events.h | 10 +
+ kernel/trace/ring_buffer.c | 356 ++++-
+ kernel/trace/simple_ring_buffer.c | 519 +++++++
+ kernel/trace/trace.c | 4 +-
+ kernel/trace/trace.h | 7 +
+ kernel/trace/trace_remote.c | 1371 +++++++++++++++++
+ .../ftrace/test.d/remotes/buffer_size.tc | 25 +
+ .../selftests/ftrace/test.d/remotes/functions | 88 ++
+ .../test.d/remotes/hypervisor/buffer_size.tc | 11 +
+ .../ftrace/test.d/remotes/hypervisor/reset.tc | 11 +
+ .../ftrace/test.d/remotes/hypervisor/trace.tc | 11 +
+ .../test.d/remotes/hypervisor/trace_pipe.tc | 11 +
+ .../test.d/remotes/hypervisor/unloading.tc | 11 +
+ .../selftests/ftrace/test.d/remotes/reset.tc | 90 ++
+ .../selftests/ftrace/test.d/remotes/trace.tc | 127 ++
+ .../ftrace/test.d/remotes/trace_pipe.tc | 127 ++
+ .../ftrace/test.d/remotes/unloading.tc | 41 +
+ 63 files changed, 4757 insertions(+), 120 deletions(-)
+ create mode 100644 Documentation/trace/remotes.rst
create mode 100644 arch/arm64/include/asm/kvm_define_hypevents.h
create mode 100644 arch/arm64/include/asm/kvm_hypevents.h
- create mode 100644 arch/arm64/include/asm/kvm_hypevents_defs.h
create mode 100644 arch/arm64/include/asm/kvm_hyptrace.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/arm-smccc.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/clock.h
@@ -197,15 +343,31 @@
create mode 100644 arch/arm64/kvm/hyp/nvhe/clock.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/events.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/trace.c
- create mode 100644 arch/arm64/kvm/hyp_events.c
create mode 100644 arch/arm64/kvm/hyp_trace.c
create mode 100644 arch/arm64/kvm/hyp_trace.h
- create mode 100644 tools/testing/selftests/hyp-trace/Makefile
- create mode 100644 tools/testing/selftests/hyp-trace/config
- create mode 100755 tools/testing/selftests/hyp-trace/hyp-trace-test
-
-
-base-commit: 8d8d276ba2fb5f9ac4984f5c10ae60858090babc
+ create mode 100644 include/linux/ring_buffer_types.h
+ create mode 100644 include/linux/simple_ring_buffer.h
+ create mode 100644 include/linux/trace_remote.h
+ create mode 100644 include/linux/trace_remote_event.h
+ create mode 100644 include/trace/define_remote_events.h
+ create mode 100644 kernel/trace/remote_test.c
+ create mode 100644 kernel/trace/remote_test_events.h
+ create mode 100644 kernel/trace/simple_ring_buffer.c
+ create mode 100644 kernel/trace/trace_remote.c
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/buffer_size.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/functions
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/buffer_size.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/reset.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace_pipe.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/unloading.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/reset.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/trace.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/trace_pipe.tc
+ create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/unloading.tc
+
+
+base-commit: 8dfce8991b95d8625d0a1d2896e42f93b9d7f68d
--
-2.46.0.598.g6f2099f65c-goog
-
+2.53.0.rc1.225.gd81095ad13-goog
+