
[PATCH 01/29] perf session: Add minimum event size and alignment validation

From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: 2026-05-24 03:27:31
Also in: lkml
Subsystem: performance events subsystem, the rest · Maintainers: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Linus Torvalds

From: Arnaldo Carvalho de Melo <redacted>

Add a per-type minimum size table (perf_event__min_size[]) and
enforce it before swap and processing, so that both cross-endian
and native-endian paths are protected from accessing fields past
the event boundary.

The table uses offsetof() for types with trailing variable-length
fields (filenames, strings, msg arrays), plus one byte for string
fields so there is room for at least a NUL terminator, and sizeof()
for fixed-size types.  Zero entries mean no minimum beyond the
8-byte header already enforced by the reader.
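
As a rough illustration of the offsetof()/sizeof() split described
above, here is a minimal standalone sketch.  The demo_* structs, enum
values and helper are hypothetical stand-ins, not the real
perf_record_* layouts; the "+ 1" on the string field follows the
table in the diff, which reserves room for a NUL terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte header, shaped like an event header. */
struct demo_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

static_assert(sizeof(struct demo_header) == 8, "header must be 8 bytes");

/* Fixed-size record: every field must be present. */
struct demo_lost {
	struct demo_header header;
	uint64_t id;
	uint64_t lost;
};

/* Record ending in a variable-length string. */
struct demo_comm {
	struct demo_header header;
	uint32_t pid, tid;
	char comm[16];
};

enum { DEMO_LOST, DEMO_COMM, DEMO_OPAQUE, DEMO_MAX };

static const uint32_t demo_min_size[DEMO_MAX] = {
	[DEMO_LOST] = sizeof(struct demo_lost),
	/* Fixed fields plus one byte for the NUL terminator. */
	[DEMO_COMM] = offsetof(struct demo_comm, comm) + 1,
	/* DEMO_OPAQUE stays 0: no minimum beyond the header. */
};

/* Caller must ensure hdr->type < DEMO_MAX. */
static int demo_too_small(const struct demo_header *hdr)
{
	uint32_t min_sz = demo_min_size[hdr->type];

	return min_sz && hdr->size < min_sz;
}
```

With these layouts a DEMO_COMM record needs 17 bytes rather than the
full 32-byte sizeof(), so a record carrying a short string is not
rejected.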

Undersized events are skipped with a warning in process_event
and rejected in peek_event — both checked before the swap
handler runs, preventing OOB access on crafted event fields.

Also reject events whose header.size is not 8-byte aligned.  The
kernel aligns all event sizes to sizeof(u64) — see
perf_event_comm_event() (ALIGN), perf_event_mmap_event(),
perf_event_cgroup(), perf_event_ksymbol() (IS_ALIGNED loops),
and perf_event_text_poke() (ALIGN) in kernel/events/core.c.
An unaligned size means the file is corrupted or crafted; reject
early so downstream code that divides by sizeof(u64) to compute
array element counts gets exact results.
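
To make the "exact results" point concrete, a small sketch of the
arithmetic follows.  DEMO_HDR_SZ and the demo_* helpers are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HDR_SZ 8u	/* assumed size of the event header */

static bool demo_size_is_aligned(uint16_t size)
{
	return size % sizeof(uint64_t) == 0;
}

/*
 * Number of u64 elements following the header.  The division only
 * accounts for every payload byte when size is 8-byte aligned.
 */
static size_t demo_nr_u64(uint16_t size)
{
	assert(size >= DEMO_HDR_SZ);
	return (size - DEMO_HDR_SZ) / sizeof(uint64_t);
}
```

For size 40 the helper yields 4 elements with nothing left over.  For
an unaligned size of 44 it also yields 4, silently leaving 4 trailing
bytes unaccounted for, which is the situation the alignment check
rules out.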

Three legacy user events are exempted from the alignment check:
TRACING_DATA (66) had a 12-byte struct before commit b39c915a4f36
("libperf event: Ensure tracing data is multiple of 8 sized")
added padding, COMPRESSED (81) carries raw ZSTD output (already
superseded by COMPRESSED2 with PERF_ALIGN), and HEADER_FEATURE
(80) uses do_write_string() with a 4-byte length prefix.
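
The check with its three exemptions can be sketched as a standalone
predicate.  The numeric type values are the ones quoted above; the
DEMO_* names and both helpers are hypothetical and only mirror the
shape of the condition in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type numbers as quoted in the commit message. */
enum {
	DEMO_HEADER_TRACING_DATA = 66,
	DEMO_HEADER_FEATURE	 = 80,
	DEMO_COMPRESSED		 = 81,
};

/* Legacy user events that may legitimately have unaligned sizes. */
static bool demo_alignment_exempt(uint32_t type)
{
	switch (type) {
	case DEMO_HEADER_TRACING_DATA:
	case DEMO_HEADER_FEATURE:
	case DEMO_COMPRESSED:
		return true;
	default:
		return false;
	}
}

/* True when the event should be rejected for its size alignment. */
static bool demo_bad_alignment(uint32_t type, uint16_t size)
{
	return size % sizeof(uint64_t) != 0 && !demo_alignment_exempt(type);
}
```

A 12-byte record of type 66 passes, while a 44-byte record of any
non-exempt type is rejected.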

Also guard event_swap() against crafted event types >=
PERF_RECORD_HEADER_MAX by checking the type in both callers before
swapping, to prevent OOB reads on the perf_event__swap_ops[] array.
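
A minimal sketch of bounds-checking a type before indexing a handler
table is shown below.  All demo_* names are hypothetical.  Note that
this sketch places the check inside the helper for brevity, whereas
the diff keeps event_swap() unchecked and relies on its callers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef void (*demo_swap_op)(uint32_t *value);

/* Portable 32-bit byte swap. */
static void demo_swap32(uint32_t *v)
{
	*v = (*v >> 24) | ((*v >> 8) & 0x0000ff00u) |
	     ((*v << 8) & 0x00ff0000u) | (*v << 24);
}

enum { DEMO_TYPE_A, DEMO_TYPE_B, DEMO_TYPE_MAX };

static const demo_swap_op demo_swap_ops[DEMO_TYPE_MAX] = {
	[DEMO_TYPE_A] = demo_swap32,
	/* DEMO_TYPE_B has no handler: entry stays NULL. */
};

/*
 * Returns false without touching the table when type is out of
 * range, so a crafted type can never index past the array.
 */
static bool demo_event_swap(uint32_t type, uint32_t *payload)
{
	if (type >= DEMO_TYPE_MAX)
		return false;

	if (demo_swap_ops[type])
		demo_swap_ops[type](payload);
	return true;
}
```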

Changes in v2:
- Fix double-skip for unsupported event types: return 0 instead
  of event->header.size in perf_session__process_event() for
  HEADER_MAX, since reader__read_event() already advances by
  event->header.size (Reported-by: sashiko-bot@kernel.org)
- Exempt TRACING_DATA, COMPRESSED, and HEADER_FEATURE from the
  alignment check — these legacy user events predate the 8-byte
  alignment rule (Reported-by: sashiko-bot@kernel.org)
- peek_event: return 0 (skip) for unknown event types instead of
  -1 (error), consistent with process_event which already skips
  unsupported types gracefully (Reported-by: sashiko-bot@kernel.org)
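
The double-skip fixed in v2 can be illustrated with a small model.
It assumes the reader always advances by the event size and adds any
positive return value from the handler on top; that assumption, and
every demo_* name, is part of this sketch rather than taken from the
diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Handler result for an unsupported event type.  The old behaviour
 * returned the event size, the v2 behaviour returns 0.
 */
static int64_t demo_handle_unknown(uint16_t size, bool old_behaviour)
{
	return old_behaviour ? (int64_t)size : 0;
}

/*
 * Assumed reader model: always step over the current event, then
 * add any extra skip the handler asked for.
 */
static size_t demo_next_offset(size_t head, uint16_t size, int64_t skip)
{
	size_t next = head + size;

	if (skip > 0)
		next += (size_t)skip;
	return next;
}
```

Under this model a 16-byte unknown event at offset 0 moves the reader
to offset 32 with the old return value, stepping over the following
event as well, and to offset 16 with the v2 return value of 0.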

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) [off-list ref]
Signed-off-by: Arnaldo Carvalho de Melo <redacted>
---
 tools/perf/util/session.c | 253 +++++++++++++++++++++++++++++++++-----
 1 file changed, 220 insertions(+), 33 deletions(-)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1e25892963b7857a..0523fd243e02c09b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1759,15 +1759,121 @@ int perf_session__deliver_synth_attr_event(struct perf_session *session,
 	return perf_session__deliver_synth_event(session, &ev.ev, NULL);
 }
 
-static void event_swap(union perf_event *event, bool sample_id_all)
+/*
+ * Minimum event sizes indexed by type.  Checked before swap and
+ * processing so that both cross-endian and native-endian paths
+ * are protected from accessing fields past the event boundary.
+ * Zero means no minimum beyond the 8-byte header (already
+ * enforced by the reader).
+ */
+static const u32 perf_event__min_size[PERF_RECORD_HEADER_MAX] = {
+	/*
+	 * offsetof() + 1 for types with a trailing variable-length
+	 * string (filename, comm, path, name, msg): the +1 ensures
+	 * room for at least a null terminator.  Full null-termination
+	 * within the event boundary is checked separately.
+	 *
+	 * PERF_RECORD_SAMPLE is omitted: all64_swap is bounded by
+	 * header.size, and the internal layout varies by sample_type
+	 * so a fixed minimum is not meaningful.
+	 */
+	[PERF_RECORD_MMAP]		  = offsetof(struct perf_record_mmap, filename) + 1,
+	[PERF_RECORD_LOST]		  = sizeof(struct perf_record_lost),
+	[PERF_RECORD_COMM]		  = offsetof(struct perf_record_comm, comm) + 1,
+	[PERF_RECORD_EXIT]		  = sizeof(struct perf_record_fork),
+	[PERF_RECORD_THROTTLE]		  = sizeof(struct perf_record_throttle),
+	[PERF_RECORD_UNTHROTTLE]	  = sizeof(struct perf_record_throttle),
+	[PERF_RECORD_FORK]		  = sizeof(struct perf_record_fork),
+	/*
+	 * The kernel dynamically sizes PERF_RECORD_READ based on
+	 * attr.read_format; the minimum has just pid + tid + value.
+	 */
+	[PERF_RECORD_READ]		  = offsetof(struct perf_record_read, time_enabled),
+	[PERF_RECORD_MMAP2]		  = offsetof(struct perf_record_mmap2, filename) + 1,
+	[PERF_RECORD_LOST_SAMPLES]	  = sizeof(struct perf_record_lost_samples),
+	[PERF_RECORD_AUX]		  = sizeof(struct perf_record_aux),
+	[PERF_RECORD_ITRACE_START]	  = sizeof(struct perf_record_itrace_start),
+	[PERF_RECORD_SWITCH]		  = sizeof(struct perf_event_header),
+	[PERF_RECORD_SWITCH_CPU_WIDE]	  = sizeof(struct perf_record_switch),
+	[PERF_RECORD_NAMESPACES]	  = sizeof(struct perf_record_namespaces),
+	[PERF_RECORD_CGROUP]		  = offsetof(struct perf_record_cgroup, path) + 1,
+	[PERF_RECORD_TEXT_POKE]		  = sizeof(struct perf_record_text_poke_event),
+	[PERF_RECORD_KSYMBOL]		  = offsetof(struct perf_record_ksymbol, name) + 1,
+	[PERF_RECORD_BPF_EVENT]		  = sizeof(struct perf_record_bpf_event),
+	[PERF_RECORD_HEADER_ATTR]	  = sizeof(struct perf_event_header) + PERF_ATTR_SIZE_VER0,
+	[PERF_RECORD_HEADER_EVENT_TYPE]	  = sizeof(struct perf_record_header_event_type),
+	/* Legacy events predate the __u32 pad field, accept 12-byte records */
+	[PERF_RECORD_HEADER_TRACING_DATA] = offsetof(struct perf_record_header_tracing_data, pad),
+	[PERF_RECORD_AUX_OUTPUT_HW_ID]	  = sizeof(struct perf_record_aux_output_hw_id),
+	[PERF_RECORD_AUXTRACE_INFO]	  = sizeof(struct perf_record_auxtrace_info),
+	[PERF_RECORD_AUXTRACE]		  = sizeof(struct perf_record_auxtrace),
+	[PERF_RECORD_AUXTRACE_ERROR]	  = offsetof(struct perf_record_auxtrace_error, msg) + 1,
+	[PERF_RECORD_THREAD_MAP]	  = sizeof(struct perf_record_thread_map),
+	/* Smallest valid variant is RANGE_CPUS: header(8) + type(2) + range(6) */
+	[PERF_RECORD_CPU_MAP]		  = sizeof(struct perf_event_header) +
+					    sizeof(__u16) +
+					    sizeof(struct perf_record_range_cpu_map),
+	[PERF_RECORD_STAT_CONFIG]	  = sizeof(struct perf_record_stat_config),
+	[PERF_RECORD_STAT]		  = sizeof(struct perf_record_stat),
+	[PERF_RECORD_STAT_ROUND]	  = sizeof(struct perf_record_stat_round),
+	/* Union inflates sizeof; use fixed header fields as minimum */
+	[PERF_RECORD_EVENT_UPDATE]	  = offsetof(struct perf_record_event_update, scale),
+	[PERF_RECORD_TIME_CONV]		  = offsetof(struct perf_record_time_conv, time_cycles),
+	[PERF_RECORD_ID_INDEX]		  = sizeof(struct perf_record_id_index),
+	[PERF_RECORD_HEADER_BUILD_ID]	  = sizeof(struct perf_record_header_build_id),
+	[PERF_RECORD_HEADER_FEATURE]	  = sizeof(struct perf_record_header_feature),
+	[PERF_RECORD_COMPRESSED2]	  = sizeof(struct perf_record_compressed2),
+	[PERF_RECORD_BPF_METADATA]	  = sizeof(struct perf_record_bpf_metadata),
+	[PERF_RECORD_CALLCHAIN_DEFERRED]  = sizeof(struct perf_event_header) + sizeof(__u64),
+	/*
+	 * SCHEDSTAT events have a version-dependent union after the
+	 * fixed header fields; the minimum is the base (pre-union)
+	 * portion so old and new versions both pass.
+	 */
+	[PERF_RECORD_SCHEDSTAT_CPU]	  = offsetof(struct perf_record_schedstat_cpu, v15),
+	[PERF_RECORD_SCHEDSTAT_DOMAIN]	  = offsetof(struct perf_record_schedstat_domain, v15),
+};
+
+/*
+ * Return true if the event is too small for its declared type.
+ * Caller must ensure event->header.type < PERF_RECORD_HEADER_MAX.
+ * If min is non-NULL, stores the required minimum on failure.
+ */
+static bool perf_event__too_small(const union perf_event *event, u32 *min)
 {
-	perf_event__swap_op swap;
+	u32 min_sz = perf_event__min_size[event->header.type];
+
+	if (min_sz && event->header.size < min_sz) {
+		if (min)
+			*min = min_sz;
+		return true;
+	}
 
-	swap = perf_event__swap_ops[event->header.type];
+	return false;
+}
+
+/* Caller must ensure event->header.type < PERF_RECORD_HEADER_MAX */
+static void event_swap(union perf_event *event, bool sample_id_all)
+{
+	perf_event__swap_op swap = perf_event__swap_ops[event->header.type];
 	if (swap)
 		swap(event, sample_id_all);
 }
 
+/*
+ * Read and validate the event at @file_offset.
+ *
+ * Returns:
+ *   0   success: *event_ptr is set and safe to access.
+ *  -1   error; check *event_ptr to decide whether to advance or abort:
+ *          *event_ptr set:  event header was read but the event is
+ *                            malformed (too small for its type, or byte-swap
+ *                            failed).  header.size is still valid, so the
+ *                            caller can advance past the event.
+ *          *event_ptr NULL: fatal, couldn't read the header at all
+ *                            (I/O error, offset out of range, pipe mode).
+ *                            Caller must abort.
+ */
 int perf_session__peek_event(struct perf_session *session, off_t file_offset,
 			     void *buf, size_t buf_sz,
 			     union perf_event **event_ptr,
@@ -1775,52 +1881,85 @@ int perf_session__peek_event(struct perf_session *session, off_t file_offset,
 {
 	union perf_event *event;
 	size_t hdr_sz, rest;
+	u32 min_sz;
 	int fd;
 
+	*event_ptr = NULL;
+
 	if (session->one_mmap && !session->header.needs_swap) {
 		event = file_offset - session->one_mmap_offset +
 			session->one_mmap_addr;
-		goto out_parse_sample;
-	}
 
-	if (perf_data__is_pipe(session->data))
-		return -1;
+		/* Every event must at least contain its own header */
+		if (event->header.size < sizeof(struct perf_event_header))
+			return -1;
+	} else {
+		if (perf_data__is_pipe(session->data))
+			return -1;
 
-	fd = perf_data__fd(session->data);
-	hdr_sz = sizeof(struct perf_event_header);
+		fd = perf_data__fd(session->data);
+		hdr_sz = sizeof(struct perf_event_header);
 
-	if (buf_sz < hdr_sz)
-		return -1;
+		if (buf_sz < hdr_sz)
+			return -1;
 
-	if (lseek(fd, file_offset, SEEK_SET) == (off_t)-1 ||
-	    readn(fd, buf, hdr_sz) != (ssize_t)hdr_sz)
-		return -1;
+		if (lseek(fd, file_offset, SEEK_SET) == (off_t)-1 ||
+		    readn(fd, buf, hdr_sz) != (ssize_t)hdr_sz)
+			return -1;
 
-	event = (union perf_event *)buf;
+		event = (union perf_event *)buf;
 
-	if (session->header.needs_swap)
-		perf_event_header__bswap(&event->header);
+		if (session->header.needs_swap)
+			perf_event_header__bswap(&event->header);
+
+		if (event->header.size < hdr_sz || event->header.size > buf_sz)
+			return -1;
+
+		buf += hdr_sz;
+		rest = event->header.size - hdr_sz;
+
+		if (readn(fd, buf, rest) != (ssize_t)rest)
+			return -1;
+	}
 
-	if (event->header.size < hdr_sz || event->header.size > buf_sz)
+	/* Event data is fully loaded — expose so callers can advance */
+	*event_ptr = event;
+
+	/*
+	 * Check alignment before type: an unaligned size misaligns the
+	 * stream for all subsequent reads regardless of event type.
+	 * Three legacy user events predate the 8-byte rule; exempt them.
+	 */
+	if (event->header.size % sizeof(u64) &&
+	    event->header.type != PERF_RECORD_HEADER_TRACING_DATA &&
+	    event->header.type != PERF_RECORD_COMPRESSED &&
+	    event->header.type != PERF_RECORD_HEADER_FEATURE) {
+		pr_warning("WARNING: peek_event: event type %u size %u not aligned to %zu\n",
+			   event->header.type,
+			   event->header.size, sizeof(u64));
 		return -1;
+	}
 
-	buf += hdr_sz;
-	rest = event->header.size - hdr_sz;
+	if (event->header.type >= PERF_RECORD_HEADER_MAX) {
+		pr_warning("WARNING: peek_event: unsupported event type %u, skipping\n",
+			   event->header.type);
+		return 0;
+	}
 
-	if (readn(fd, buf, rest) != (ssize_t)rest)
+	if (perf_event__too_small(event, &min_sz)) {
+		pr_warning("WARNING: peek_event: %s event size %u too small (min %u)\n",
+			   perf_event__name(event->header.type),
+			   event->header.size, min_sz);
 		return -1;
+	}
 
 	if (session->header.needs_swap)
 		event_swap(event, evlist__sample_id_all(session->evlist));
 
-out_parse_sample:
-
 	if (sample && event->header.type < PERF_RECORD_USER_TYPE_START &&
 	    evlist__parse_sample(session->evlist, event, sample))
 		return -1;
 
-	*event_ptr = event;
-
 	return 0;
 }
 
@@ -1858,23 +1997,71 @@ static s64 perf_session__process_event(struct perf_session *session,
 {
 	struct evlist *evlist = session->evlist;
 	const struct perf_tool *tool = session->tool;
+	u32 min_sz;
 	int ret;
 
-	if (session->header.needs_swap)
-		event_swap(event, evlist__sample_id_all(evlist));
+	/*
+	 * The kernel aligns all event sizes to sizeof(u64); see
+	 * perf_event_comm_event() (ALIGN), perf_event_mmap_event(),
+	 * perf_event_cgroup(), perf_event_ksymbol() (IS_ALIGNED loops),
+	 * and perf_event_text_poke() (ALIGN) in kernel/events/core.c.
+	 *
+	 * An unaligned size means the file is corrupted or crafted.
+	 * Abort: there is no point continuing to read unaligned records
+	 * because the caller advances rd->head by event->header.size,
+	 * so every subsequent read would start at a misaligned offset,
+	 * producing garbage headers for the rest of the file.
+	 *
+	 * Exempt three legacy user events that predate the alignment rule:
+	 *
+	 * TRACING_DATA (66): struct tracing_data_event was 12 bytes before
+	 *   b39c915a4f36 ("libperf event: Ensure tracing data is multiple
+	 *   of 8 sized") added __u32 pad; old perf.data files still contain
+	 *   12-byte records.
+	 *   TODO: introduce HEADER_TRACING_DATA2 with guaranteed alignment.
+	 *
+	 * COMPRESSED (81): raw ZSTD output, arbitrary length.  Already
+	 *   superseded by COMPRESSED2 (83) with PERF_ALIGN.
+	 *
+	 * HEADER_FEATURE (80): do_write_string() uses a 4-byte length
+	 *   prefix with no padding to 8-byte total.
+	 *   TODO: introduce HEADER_FEATURE2 with guaranteed alignment.
+	 */
+	if (event->header.size % sizeof(u64) &&
+	    event->header.type != PERF_RECORD_HEADER_TRACING_DATA &&
+	    event->header.type != PERF_RECORD_COMPRESSED &&
+	    event->header.type != PERF_RECORD_HEADER_FEATURE) {
+		pr_err("ERROR: %s event size %u is not 8-byte aligned, aborting\n",
+		       perf_event__name(event->header.type),
+		       event->header.size);
+		return -EINVAL;
+	}
 
 	if (event->header.type >= PERF_RECORD_HEADER_MAX) {
-		/* perf should not support unaligned event, stop here. */
-		if (event->header.size % sizeof(u64))
-			return -EINVAL;
-
 		/* This perf is outdated and does not support the latest event type. */
 		ui__warning("Unsupported header type %u, please consider updating perf.\n",
 			    event->header.type);
-		/* Skip unsupported event by returning its size. */
-		return event->header.size;
+		/*
+		 * Return 0 to skip: the caller (reader__read_event)
+		 * already advances by event->header.size.
+		 */
+		return 0;
 	}
 
+	/*
+	 * Skip rather than abort: a too-small-but-aligned event
+	 * can be safely stepped over without misaligning the stream.
+	 */
+	if (perf_event__too_small(event, &min_sz)) {
+		pr_warning("WARNING: %s event size %u too small (min %u), skipping\n",
+			   perf_event__name(event->header.type),
+			   event->header.size, min_sz);
+		return 0;
+	}
+
+	if (session->header.needs_swap)
+		event_swap(event, evlist__sample_id_all(evlist));
+
 	events_stats__inc(&evlist->stats, event->header.type);
 
 	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
-- 
2.54.0