Thread (148 messages) 148 messages, 17 authors, 2022-06-09

Re: [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate

From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Date: 2022-02-12 02:31:28
Also in: linux-api, linux-arch, linux-mm, lkml

On Fri, 2022-02-11 at 16:27 -0800, Dave Hansen wrote:
On 1/30/22 13:18, Rick Edgecombe wrote:
quoted
Add helpers that can be used to modify supervisor xstate safely for
the
current task.
This should be at the end of the changelog.
Hmm, ok.
quoted
State for supervisors xstate based features can be live and
accesses via MSR's, or saved in memory in an xsave buffer. When the
kernel needs to modify this state it needs to be sure to operate on
it
in the right place, so the modifications don't get clobbered.
We tend to call these "supervisor xfeatures".  The "state is in the
registers" we call "active".  Maybe:

	Just like user xfeatures, supervisor xfeatures can be either
	active in the registers or inactive and present in the task FPU
	buffer.  If the registers are active, the registers can be
	modified directly.  If the registers are not active, the
	modification must be performed on the task FPU buffer.
Ok, thanks.
quoted
In the past supervisor xstate features have used get_xsave_addr()
directly, and performed open coded logic handle operating on the
saved
state correctly. This has posed two problems:
 1. It has logic that has been gotten wrong more than once.
 2. To reduce code, less common path's are not optimized.
Determination
			   "paths" ^
Arg, thanks.
quoted
xstate = start_update_xsave_msrs(XFEATURE_FOO);
r = xsave_rdmsrl(state, MSR_IA32_FOO_1, &val)
if (r)
	xsave_wrmsrl(state, MSR_IA32_FOO_2, FOO_ENABLE);
end_update_xsave_msrs();
This looks OK.  I'm not thrilled about it.  The
start_update_xsave_msrs() can probably drop the "_msrs".  Maybe:

	start_xfeature_update(...);
Hmm, this whole thing pretends to be updating MSRs, which is often not
true. Maybe the xsave_rdmsrl/xsave_wrmsrl should be renamed too.
xsave_readl()/xsave_writel() or something.
Also, if you have to do the address lookup in xsave_rdmsrl() anyway,
I
wonder if the 'xstate' should just be a full fledged 'struct
xregs_state'.

The other option would be to make a little on-stack structure like:

	struct xsave_update {
		int feature;
		struct xregs_state *xregs;
	};

Then you do:

	struct xsave_update xsu;
	...
	start_update_xsave_msrs(&xsu, XFEATURE_FOO);

and then pass it along to each of the other operations:

	r = xsave_rdmsrl(xsu, MSR_IA32_FOO_1, &val)

It's slightly less likely to get type confused as a 'void *';
The 'void *' is actually a pointer to the specific xfeature in the
buffer. So the read/writes don't have to re-compute the offset every
time. It's not too much work though. I'm really surprised by the desire
to obfuscate the pointer, but I guess if we really want to, I'd rather
do that and keep the regular read/write operations.

If we don't care about the extra lookups this can totally drop the
caller side state. The feature nr can be looked up from the MSR along
with the struct offset. Then it doesn't expose the pointer to the
buffer, since it's all recomputed every operation.

So like:
start_xfeature_update();
r = xsave_readl(MSR_IA32_FOO_1, &val)
if (r)
        xsave_writel(MSR_IA32_FOO_2, FOO_ENABLE);
end_xfeature_update();

The WARNs then happen in the read/writes. An early iteration looked
like that. I liked this version with caller side state, but thought it
might be worth revisiting if there really is a strong desire to hide
the pointer.
quoted
+static u64 *__get_xsave_member(void *xstate, u32 msr)
+{
+	switch (msr) {
+	/* Currently there are no MSR's supported */
+	default:
+		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n",
msr);
+		return NULL;
+	}
+}
Just to get an idea what this is doing, it's OK to include the shadow
stack MSRs in here.
Ok.
Are you sure this should return a u64*?  We have lots of <=64-bit
XSAVE
fields.
I thought it should only be used with 64 bit msrs. Maybe it needs a
better name?
quoted
+/*
+ * Return a pointer to the xstate for the feature if it should be
used, or NULL
+ * if the MSRs should be written to directly. To do this safely,
using the
+ * associated read/write helpers is required.
+ */
+void *start_update_xsave_msrs(int xfeature_nr)
+{
+	void *xstate;
+
+	/*
+	 * fpregs_lock() only disables preemption (mostly). So modifing
state
							 modifying ^
	
quoted
+	 * in an interrupt could screw up some in progress fpregs
operation,
						^ in-progress
I swear I ran checkpatch...
quoted
+	 * but appear to work. Warn about it.
+	 */
+	WARN_ON_ONCE(!in_task());
+	WARN_ON_ONCE(current->flags & PF_KTHREAD);
This might also be a good spot to check that xfeature_nr is in
fpstate.xfeatures.
Hmm, good idea.
quoted
+	fpregs_lock();
+
+	fpregs_assert_state_consistent();
+
+	/*
+	 * If the registers don't need to be reloaded. Go ahead and
operate on the
+	 * registers.
+	 */
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		return NULL;
+
+	xstate = get_xsave_addr(&current->thread.fpu.fpstate-
quoted
regs.xsave, xfeature_nr);
+
+	/*
+	 * If regs are in the init state, they can't be retrieved from
+	 * init_fpstate due to the init optimization, but are not
nessarily
							necessarily ^
Oof, thanks.
Spell checker time.  ":set spell" in vim works for me nicely.
quoted
+	 * zero. The only option is to restore to make everything live
and
+	 * operate on registers. This will clear TIF_NEED_FPU_LOAD.
+	 *
+	 * Otherwise, if not in the init state but TIF_NEED_FPU_LOAD is
set,
+	 * operate on the buffer. The registers will be restored before
going
+	 * to userspace in any case, but the task might get preempted
before
+	 * then, so this possibly saves an xsave.
+	 */
+	if (!xstate)
+		fpregs_restore_userregs();
Won't fpregs_restore_userregs() end up setting TIF_NEED_FPU_LOAD=0?
Isn't that a case where a "return NULL" is needed?
This is for the case when the feature is in the init state. For CET's 
case this could just zero the buffer and return the pointer to it, but
for other features the init state wasn't always zero. So this just
makes all the features "active" and TIF_NEED_FPU_LOAD is cleared. It
then returns NULL and the read/writes go to the MSRs. It still looks
correct to me, am I missing something?
In any case, this makes me think this code should start out stupid
and
slow.  Keep the API as-is, but make the first patch unconditionally
do
the WRMSR.  Leave the "fast" buffer modifications for a follow-on
patch.
Ok. Should I drop the optimized versions from the series or just split
them out? The optimizations were trying to address Boris' comments:
https://lore.kernel.org/lkml/YS95VzrNhDhFpsop@zn.tnic/ (local)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help