--- v12
+++ vrfc
@@ -11,337 +11,158 @@
pages, making it an ideal structure for sharing between kernel and
hypervisor.
-This series first introduces a new generic way of creating remote events and
-remote buffers. Then it adds support to the pKVM hypervisor.
+This series introduces a method to create events and to generate them
+from the hypervisor (hyp_enter/hyp_exit given as an example) as well as
+a Tracefs user-space interface to read them.
+
+A presentation was given on this matter during the tracing summit in
+2022. [1]
1. ring-buffer
--------------
To set up the per-cpu ring-buffers, a new interface is created:
- ring_buffer_remote: Describes what the kernel needs to know about the
- remote writer, that is, the set of pages forming the
+ ring_buffer_writer: Describes what the kernel needs to know about the
+ writer, that is, the set of pages forming the
ring-buffer and a callback for the reader/head
swapping (enables consuming read)
- ring_buffer_remote(): Creates a read-only ring-buffer from a
- ring_buffer_remote.
+ ring_buffer_reader(): Creates a read-only ring-buffer from a
+ ring_buffer_writer.
-To keep the internals of `struct ring_buffer` in sync with the remote,
+To keep the internals of `struct ring_buffer` in sync with the writer,
the meta-page is used. It was originally introduced to enable user-space
mapping of the ring-buffer [2]. In this case, the kernel is not the
producer anymore but the reader. The function to read that meta-page is:
- ring_buffer_poll_remote():
- Update `struct ring_buffer` based on the remote
+ ring_buffer_poll_writer():
+ Update `struct ring_buffer` based on the writer
meta-page. Wake-up readers if necessary.
The kernel has to poll the meta-page to be notified of newly written
events.
-2. Tracefs
-----------
+2. Tracefs interface
+--------------------
-This series introduces a new trace_remote that links tracefs to the
-remote ring-buffer.
+The interface is a hyp/ folder at the root of the tracefs mount point.
+This folder is like an instance and you'll find there a subset of the
+regular Tracefs user-space interface:
-The interface is found in the remotes/ directory at the root of the
-tracefs mount point. Each remote is like an instance and you'll find
-there a subset of the regular Tracefs user-space interface:
+ hyp/
+ buffer_size_kb
+ trace_pipe
+ trace_pipe_raw
+ trace
+ per_cpu/
+ cpuX/
+ trace_pipe
+ trace_pipe_raw
+ events/
+ hyp/
+ hyp_enter/
+ enable
+ id
- remotes/test
- |-- buffer_size_kb
- |-- events
- | |-- enable
- | |-- header_event
- | |-- header_page
- | `-- test
- | `-- selftest
- | |-- enable
- | |-- format
- | `-- id
- |-- per_cpu
- | `-- cpu0
- | |-- trace
- | `-- trace_pipe
- |-- trace
- |-- trace_pipe
- |-- tracing_on
+Behind the scenes, kvm/hyp_trace.c must rebuild the tracing hierarchy
+without relying on kernel/trace/trace.c. This is due to fundamental
+differences:
-Behind the scenes, kernel/trace/trace_remote.c creates this tracefs
-hierarchy without relying on kernel/trace/trace.c. This is due to
-fundamental differences:
-
- * Remote tracing doesn't support trace_array's system-specific
+ * Hypervisor tracing doesn't support trace_array's system-specific
features (snapshots, tracers, etc.).
- * Logged event formats differ (e.g., no PID for remote events).
+ * Logged event formats differ (e.g., no PID in hypervisor
+ events).
- * Buffer operations require specific remote interactions.
+ * Buffer operations require specific hypervisor interactions.
-3. Simple Ring-Buffer
----------------------
-
-The current ring-buffer.c implementation has too many dependencies to
-be used directly by the pKVM hypervisor, so a new simple implementation
-is created and can be found in kernel/trace/simple_ring_buffer.c.
-
-This implementation is write-only and is used by both the pKVM
-hypervisor and a trace_remote test module.
-
-4. Events
+3. Events
---------
-A new REMOTE_EVENT() macro is added to simplify the creation of events
-on the kernel side. As remote tracing buffers are read-only, only the
-event structure and a way of printing must be declared. The prototype of
-the macro is very similar to the well-known TRACE_EVENT():
+In the hypervisor, "hyp events" can be generated with trace_<event_name>
+in a similar fashion to what the kernel does. They're also created with
+macros similar to the kernel's (see kvm_hypevents.h):
- REMOTE_EVENT(my_event, id,
- RE_STRUCT(
- re_field(u64, foobar)
- ),
- RE_PRINTK("foobar=%llu",
-           __entry->foobar)
- )
+HYP_EVENT("foobar",
+ HE_PROTO(void),
+ HE_STRUCT(),
+ HE_ASSIGN(),
+ HE_PRINTK(" ")
+)
-5. pKVM
--------
+Despite the apparent similarities with TRACE_EVENT(), these macros
+differ internally: they must be used in parallel by the hypervisor
+(for the writing part) and the kernel (for the reading part), which makes
+it difficult to share anything with their kernel counterparts.
-The pKVM support simply creates a "hypervisor" trace_remote on the
-kernel side and inherits from simple_ring_buffer.c on the hypervisor
-side.
+Also, the events directory doesn't use eventfs.
-A new event macro, HYP_EVENT(), is created. Under the hood it re-uses
-REMOTE_EVENT() (defined in the previous paragraph) and generates the
-hypervisor-specific struct and trace_<event>() functions.
+4. Few limitations:
+-------------------
-6. Limitations:
----------------
+Non-consuming reading of the buffer isn't supported (i.e. cat trace) due
+to the lack of support in the ring-buffer meta-page.
-Non-consuming reading of the buffer isn't supported (i.e. cat trace ->
--EPERM) due to the current lack of support in the ring-buffer meta-page.
+Tracing must be stopped before the buffer can be reset, i.e. (echo 0 >
+tracing_on; echo 0 > trace)
[1] https://tracingsummit.org/ts/2022/hypervisortracing/
[2] https://lore.kernel.org/all/20240510140435.3550353-1-vdonnefort@google.com/
-changes since v11 (https://lore.kernel.org/all/20260131132848.254084-1-vdonnefort@google.com/)
+Vincent Donnefort (11):
+ ring-buffer: Check for empty ring-buffer with rb_num_of_entries()
+ ring-buffer: Introducing ring-buffer writer
+ ring-buffer: Expose buffer_data_page material
+ timekeeping: Export the boot clock in snapshots
+ KVM: arm64: Support unaligned fixmap in the nVHE hyp
+ KVM: arm64: Add clock support in the nVHE hyp
+ KVM: arm64: Add tracing support for the pKVM hyp
+ KVM: arm64: Add hyp tracing to tracefs
+ KVM: arm64: Add raw interface for hyp tracefs
+ KVM: arm64: Add support for hyp events
+ KVM: arm64: Add kselftest for tracefs hyp tracefs
- - Fix kerneldoc (Steven)
- - Remove useless ring_buffer_event_data type cast (Steven)
- - Fix __free_ring_buffer_iter() (Steven)
- - Move trace seq locking into start/stop (Steven)
-
-changes since v10 (https://lore.kernel.org/all/20260126104419.1649811-1-vdonnefort@google.com/)
-
- - Move kerneldoc to .c files (Steven)
- - Return EBUSY on buffer_size_kb write if buffer is loaded (Steven)
- - Remove rb_iter/rb_iters union in trace_remote_iterator (Steven)
- - Rename and refactor trace file seq_operations (Steven)
- - Make trace_get_cpu() accessible to trace_remote.c (Steven)
- - Remove unnecessary cpus_read_unlock() (Steven)
- - !preempt on remote_test driver buffer writing (Steven)
- - Do not fail selftest if cpu/online is unavailable (Steven)
- - Add rationale for trace_remote into documentation (Steven)
-
-changes since v9 (https://lore.kernel.org/all/20251202093623.2337860-1-vdonnefort@google.com/)
-
- - Add vCPU PID to hyp_enter/hyp_exit (Marc)
- - Remove useless X1 setting for tracing HVCs (Marc)
- - Fix REMOTE_PRINTK_COUNT_ARGS()
- - Rebase on 6.19-rc7
-
-Changes since v8 (https://lore.kernel.org/all/20251107093840.3779150-1-vdonnefort@google.com/)
-
- - Do not enable tracing if unstable cntvct (Marc)
- - Add support for nVHE (Marc)
- - Add PKVM_DISABLE_STAGE2_ON_PANIC (Marc)
- - NVHE_EL2_TRACING depends on NVHE_EL2_DEBUG (Marc)
- - Add a reason for hyp_enter/hyp_exit events (Marc)
- - Remove PKVM_SELFTESTS in favour of NVHE_EL2_DEBUG
- - Add wrapper for arm_smccc_1_2, now used in nvhe/ffa.c
-
-Changes since v7 (https://lore.kernel.org/all/20251003133825.2068970-1-vdonnefort@google.com/)
-
- - Add missing EXPORT_SYMBOL_GPL for remote_test.ko
- - Rebase on 6.18-rc4
-
-Changes since v6 (https://lore.kernel.org/all/20250821081412.1008261-1-vdonnefort@google.com/)
-
- - Add requires field to the selftest (Masami)
- - Use guard() for ring_buffer_poll_remote (Steven)
- - Rename ring_buffer_remote() to ring_buffer_alloc_remote() (Steven)
- - kerneldoc for trace_buffer_remote and simple_ring_buffer (Steven)
- - Validate trace_buffer_desc size in trace_remote_alloc_buffer
- (Steven)
- - Add non-consuming ring-buffer read (Steven)
- - Add spinning failsafe in simple_ring_buffer (Steven)
- - Range check for hyp_trace_desc::bpages_backing_* in hyp_trace_desc_validate()
- - unsigned int cpu in hyp_trace_desc_validate()
- - Fix event/format file
- - Add tests with an offline CPU
- - Add tests for non-consuming read
- - Add documentation
- - Rebase on 6.17
-
-Changes since v5 (https://lore.kernel.org/all/20250516134031.661124-1-vdonnefort@google.com/)
-
- - Add tishift lib to the hyp (Aneesh)
- - Rebase on 6.17-rc2
-
-Changes since v4 (https://lore.kernel.org/all/20250506164820.515876-1-vdonnefort@google.com/)
-
- - Extend meta-page with pages_touched and pages_lost
- - Create ring_buffer_types.h
- - Fix simple_ring_buffer build for 32-bits arch and x86
- - Try unload buffer on reset (+ test)
- - Minor renaming and comments
-
-Changes since v3 (https://lore.kernel.org/all/20250224121353.98697-1-vdonnefort@google.com/)
-
- - Move tracefs support from kvm/hyp_trace.c into a generic trace_remote.c.
- - Move ring-buffer implementation from nvhe/trace.c into a generic
- simple-ring-buffer.c
- - Rebase on 6.15-rc4.
-
-Changes since v2 (https://lore.kernel.org/all/20250108114536.627715-1-vdonnefort@google.com/)
-
- - Fix ring-buffer remote reset
- - Fix fast-forward in rb_page_desc()
- - Refactor nvhe/trace.c
- - struct hyp_buffer_page more compact
- - Add a struct_len to trace_page_desc
- - Extend reset testing
- - Rebase on 6.14-rc3
-
-Changes since v1 (https://lore.kernel.org/all/20240911093029.3279154-1-vdonnefort@google.com/)
-
- - Add 128-bits mult fallback in the unlikely event of an overflow. (John)
- - Fix ELF section sort.
- - __always_inline trace_* event macros.
- - Fix events/<event>/enable permissions.
- - Rename ring-buffer "writer" to "remote".
- - Rename CONFIG_PROTECTED_NVHE_TESTING to PKVM_SELFTEST to align with
- Quentin's upcoming selftest
- - Rebase on 6.13-rc3.
-
-Changes since RFC (https://lore.kernel.org/all/20240805173234.3542917-1-vdonnefort@google.com/)
-
- - hypervisor trace clock:
- - mult/shift computed in hyp_trace.c. (John)
- - Update clock when it deviates from kernel boot clock. (John)
- - Add trace_clock file.
- - Separate patch for better readability.
- - Add a proper reset interface which does not need to teardown the
- tracing buffers. (Steven)
- - Return -EPERM on trace access. (Steven)
- - Add per-cpu trace file.
- - Automatically teardown and free the tracing buffer when it is empty,
- without readers and not currently tracing.
- - Show in buffer_size_kb if the buffer is loaded in the hypervisor or
- not.
- - Extend tests to cover reset and unload.
- - CC timekeeping folks on relevant patches (Marc)
-
-Vincent Donnefort (30):
- ring-buffer: Add page statistics to the meta-page
- ring-buffer: Store bpage pointers into subbuf_ids
- ring-buffer: Introduce ring-buffer remotes
- ring-buffer: Add non-consuming read for ring-buffer remotes
- tracing: Introduce trace remotes
- tracing: Add reset to trace remotes
- tracing: Add non-consuming read to trace remotes
- tracing: Add init callback to trace remotes
- tracing: Add events to trace remotes
- tracing: Add events/ root files to trace remotes
- tracing: Add helpers to create trace remote events
- ring-buffer: Export buffer_data_page and macros
- tracing: Introduce simple_ring_buffer
- tracing: Add a trace remote module for testing
- tracing: selftests: Add trace remote tests
- Documentation: tracing: Add tracing remotes
- tracing: load/unload page callbacks for simple_ring_buffer
- tracing: Check for undefined symbols in simple_ring_buffer
- KVM: arm64: Add PKVM_DISABLE_STAGE2_ON_PANIC
- KVM: arm64: Add clock support to nVHE/pKVM hyp
- KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp
- KVM: arm64: Support unaligned fixmap in the pKVM hyp
- KVM: arm64: Add tracing capability for the nVHE/pKVM hyp
- KVM: arm64: Add trace remote for the nVHE/pKVM hyp
- KVM: arm64: Sync boot clock with the nVHE/pKVM hyp
- KVM: arm64: Add trace reset to the nVHE/pKVM hyp
- KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote
- KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp
- KVM: arm64: Add selftest event support to nVHE/pKVM hyp
- tracing: selftests: Add hypervisor trace remote tests
-
- Documentation/trace/index.rst | 11 +
- Documentation/trace/remotes.rst | 66 +
- arch/arm64/include/asm/kvm_asm.h | 8 +
- arch/arm64/include/asm/kvm_define_hypevents.h | 16 +
- arch/arm64/include/asm/kvm_host.h | 3 +
- arch/arm64/include/asm/kvm_hyp.h | 4 +-
- arch/arm64/include/asm/kvm_hypevents.h | 60 +
- arch/arm64/include/asm/kvm_hyptrace.h | 26 +
- arch/arm64/kernel/image-vars.h | 4 +
- arch/arm64/kernel/vmlinux.lds.S | 18 +
- arch/arm64/kvm/Kconfig | 64 +-
- arch/arm64/kvm/Makefile | 2 +
- arch/arm64/kvm/arm.c | 12 +-
- arch/arm64/kvm/handle_exit.c | 2 +-
- arch/arm64/kvm/hyp/include/nvhe/arm-smccc.h | 23 +
- arch/arm64/kvm/hyp/include/nvhe/clock.h | 16 +
- .../kvm/hyp/include/nvhe/define_events.h | 14 +
- arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 -
- arch/arm64/kvm/hyp/include/nvhe/trace.h | 70 +
- arch/arm64/kvm/hyp/nvhe/Makefile | 6 +-
- arch/arm64/kvm/hyp/nvhe/clock.c | 65 +
- arch/arm64/kvm/hyp/nvhe/events.c | 25 +
- arch/arm64/kvm/hyp/nvhe/ffa.c | 28 +-
- arch/arm64/kvm/hyp/nvhe/host.S | 2 +-
- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 87 +-
- arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 6 +
- arch/arm64/kvm/hyp/nvhe/mm.c | 4 +-
- arch/arm64/kvm/hyp/nvhe/psci-relay.c | 7 +-
- arch/arm64/kvm/hyp/nvhe/setup.c | 4 +-
- arch/arm64/kvm/hyp/nvhe/stacktrace.c | 6 +-
- arch/arm64/kvm/hyp/nvhe/switch.c | 5 +-
- arch/arm64/kvm/hyp/nvhe/trace.c | 306 ++++
- arch/arm64/kvm/hyp_trace.c | 443 ++++++
- arch/arm64/kvm/hyp_trace.h | 11 +
- arch/arm64/kvm/stacktrace.c | 8 +-
- fs/tracefs/inode.c | 1 +
- include/linux/ring_buffer.h | 58 +
- include/linux/ring_buffer_types.h | 41 +
- include/linux/simple_ring_buffer.h | 65 +
- include/linux/trace_remote.h | 48 +
- include/linux/trace_remote_event.h | 33 +
- include/trace/define_remote_events.h | 73 +
- include/uapi/linux/trace_mmap.h | 8 +-
- kernel/trace/Kconfig | 14 +
- kernel/trace/Makefile | 20 +
- kernel/trace/remote_test.c | 261 ++++
- kernel/trace/remote_test_events.h | 10 +
- kernel/trace/ring_buffer.c | 356 ++++-
- kernel/trace/simple_ring_buffer.c | 517 +++++++
- kernel/trace/trace.c | 4 +-
- kernel/trace/trace.h | 7 +
- kernel/trace/trace_remote.c | 1368 +++++++++++++++++
- .../ftrace/test.d/remotes/buffer_size.tc | 25 +
- .../selftests/ftrace/test.d/remotes/functions | 88 ++
- .../test.d/remotes/hypervisor/buffer_size.tc | 11 +
- .../ftrace/test.d/remotes/hypervisor/reset.tc | 11 +
- .../ftrace/test.d/remotes/hypervisor/trace.tc | 11 +
- .../test.d/remotes/hypervisor/trace_pipe.tc | 11 +
- .../test.d/remotes/hypervisor/unloading.tc | 11 +
- .../selftests/ftrace/test.d/remotes/reset.tc | 90 ++
- .../selftests/ftrace/test.d/remotes/trace.tc | 127 ++
- .../ftrace/test.d/remotes/trace_pipe.tc | 127 ++
- .../ftrace/test.d/remotes/unloading.tc | 41 +
- 63 files changed, 4751 insertions(+), 120 deletions(-)
- create mode 100644 Documentation/trace/remotes.rst
+ arch/arm64/include/asm/kvm_asm.h | 6 +
+ arch/arm64/include/asm/kvm_define_hypevents.h | 60 ++
+ arch/arm64/include/asm/kvm_hyp.h | 6 +
+ arch/arm64/include/asm/kvm_hypevents.h | 41 +
+ arch/arm64/include/asm/kvm_hypevents_defs.h | 41 +
+ arch/arm64/include/asm/kvm_hyptrace.h | 38 +
+ arch/arm64/kernel/image-vars.h | 4 +
+ arch/arm64/kernel/vmlinux.lds.S | 18 +
+ arch/arm64/kvm/Kconfig | 9 +
+ arch/arm64/kvm/Makefile | 2 +
+ arch/arm64/kvm/arm.c | 6 +
+ arch/arm64/kvm/hyp/hyp-constants.c | 4 +
+ arch/arm64/kvm/hyp/include/nvhe/arm-smccc.h | 13 +
+ arch/arm64/kvm/hyp/include/nvhe/clock.h | 15 +
+ .../kvm/hyp/include/nvhe/define_events.h | 21 +
+ arch/arm64/kvm/hyp/include/nvhe/trace.h | 55 ++
+ arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
+ arch/arm64/kvm/hyp/nvhe/clock.c | 42 +
+ arch/arm64/kvm/hyp/nvhe/events.c | 35 +
+ arch/arm64/kvm/hyp/nvhe/ffa.c | 2 +-
+ arch/arm64/kvm/hyp/nvhe/hyp-main.c | 64 ++
+ arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 4 +
+ arch/arm64/kvm/hyp/nvhe/mm.c | 2 +-
+ arch/arm64/kvm/hyp/nvhe/psci-relay.c | 14 +-
+ arch/arm64/kvm/hyp/nvhe/switch.c | 5 +-
+ arch/arm64/kvm/hyp/nvhe/trace.c | 594 ++++++++++++
+ arch/arm64/kvm/hyp_events.c | 164 ++++
+ arch/arm64/kvm/hyp_trace.c | 854 ++++++++++++++++++
+ arch/arm64/kvm/hyp_trace.h | 15 +
+ include/linux/ring_buffer.h | 124 ++-
+ include/linux/timekeeping.h | 6 +
+ kernel/time/timekeeping.c | 9 +
+ kernel/trace/ring_buffer.c | 244 +++--
+ tools/testing/selftests/hyp-trace/Makefile | 6 +
+ tools/testing/selftests/hyp-trace/config | 4 +
+ .../selftests/hyp-trace/hyp-trace-test | 161 ++++
+ 36 files changed, 2591 insertions(+), 98 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_define_hypevents.h
create mode 100644 arch/arm64/include/asm/kvm_hypevents.h
+ create mode 100644 arch/arm64/include/asm/kvm_hypevents_defs.h
create mode 100644 arch/arm64/include/asm/kvm_hyptrace.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/arm-smccc.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/clock.h
@@ -350,31 +171,15 @@
create mode 100644 arch/arm64/kvm/hyp/nvhe/clock.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/events.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/trace.c
+ create mode 100644 arch/arm64/kvm/hyp_events.c
create mode 100644 arch/arm64/kvm/hyp_trace.c
create mode 100644 arch/arm64/kvm/hyp_trace.h
- create mode 100644 include/linux/ring_buffer_types.h
- create mode 100644 include/linux/simple_ring_buffer.h
- create mode 100644 include/linux/trace_remote.h
- create mode 100644 include/linux/trace_remote_event.h
- create mode 100644 include/trace/define_remote_events.h
- create mode 100644 kernel/trace/remote_test.c
- create mode 100644 kernel/trace/remote_test_events.h
- create mode 100644 kernel/trace/simple_ring_buffer.c
- create mode 100644 kernel/trace/trace_remote.c
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/buffer_size.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/functions
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/buffer_size.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/reset.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace_pipe.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/hypervisor/unloading.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/reset.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/trace.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/trace_pipe.tc
- create mode 100644 tools/testing/selftests/ftrace/test.d/remotes/unloading.tc
+ create mode 100644 tools/testing/selftests/hyp-trace/Makefile
+ create mode 100644 tools/testing/selftests/hyp-trace/config
+ create mode 100644 tools/testing/selftests/hyp-trace/hyp-trace-test
-base-commit: 2b7a25df823dc7d8f56f8ce7c2d2dac391cea9c2
+base-commit: e4fc196f5ba36eb7b9758cf2c73df49a44199895
--
-2.53.0.335.g19a08e0c02-goog
+2.46.0.rc2.264.g509ed76dc8-goog