Thread (79 messages) 79 messages, 6 authors, 2025-06-24

Re: [PATCH v10 07/14] unwind_user/deferred: Make unwind deferral requests NMI-safe

From: Steven Rostedt <rostedt@goodmis.org>
Date: 2025-06-19 09:07:15
Also in: bpf, lkml


On June 19, 2025 4:57:17 AM EDT, Peter Zijlstra [off-list ref] wrote:
On Tue, Jun 10, 2025 at 08:54:28PM -0400, Steven Rostedt wrote:
quoted
+		info->nmi_timestamp = local_clock();
+		*timestamp = info->nmi_timestamp;
+		inited_timestamp = true;
+	}
+
+	if (info->pending)
+		return 1;
+
+	ret = task_work_add(current, &info->work, TWA_NMI_CURRENT);
+	if (ret < 0) {
+		/*
+		 * If this set nmi_timestamp and is not using it,
+		 * there's no guarantee that it will be used.
+		 * Set it back to zero.
+		 */
+		if (inited_timestamp)
+			info->nmi_timestamp = 0;
+		return ret;
+	}
+
+	info->pending = 1;
+
+	return 0;
+}
+
 /**
  * unwind_deferred_request - Request a user stacktrace on task exit
  * @work: Unwind descriptor requesting the trace
@@ -139,31 +207,38 @@ static void unwind_deferred_task_work(struct callback_head *head)
 int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 {
 	struct unwind_task_info *info = &current->unwind_info;
+	int pending;
 	int ret;
 
 	*timestamp = 0;
 
 	if ((current->flags & (PF_KTHREAD | PF_EXITING)) ||
 	    !user_mode(task_pt_regs(current)))
 		return -EINVAL;
 
+	if (in_nmi())
+		return unwind_deferred_request_nmi(work, timestamp);
So nested NMI is a thing -- AFAICT this is broken in the face of nested
NMI.

Specifically, we mark all exceptions that can happen with IRQs disabled
as NMI like (so that they don't go about taking locks etc.).

So imagine you're in #DB, you're asking for an unwind, you do the above
dance and get hit with NMI.
Does #DB make in_nmi() true? If that's the case then we do need to handle that.

-- Steve 
Then you get the NMI setting nmi_timestamp, and #DB overwriting it with
a later value, and you're back up the creek without no paddles.


Mix that with local_clock() that is only monotonic on a single CPU. And
you ask for an unwind on CPU0, get migrated to CPU1 which for the
argument will be behind, and see a timestamp 'far' in the future.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help