Re: [RFC][PATCH 6/7] context_tracking: Provide SMP ordering using RCU
From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2021-09-22 15:17:24
On Wed, Sep 22, 2021 at 01:05:12PM +0200, Peter Zijlstra wrote:
Use rcu_user_{enter,exit}() calls to provide SMP ordering on context
tracking state stores:

  __context_tracking_exit()
    __this_cpu_write(context_tracking.state, CONTEXT_KERNEL)
    rcu_user_exit()
      rcu_eqs_exit()
        rcu_dynticks_eqs_exit()
          rcu_dynticks_inc()
            atomic_add_return() /* smp_mb */

  __context_tracking_enter()
    rcu_user_enter()
      rcu_eqs_enter()
        rcu_dynticks_eqs_enter()
          rcu_dynticks_inc()
            atomic_add_return() /* smp_mb */
    __this_cpu_write(context_tracking.state, state)

This separates USER/KERNEL state with an smp_mb() on each side,
therefore, a user of context_tracking_state_cpu() can say the CPU must
pass through an smp_mb() before changing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

For the transformation to negative errno return value and name change
from an RCU perspective:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

For the sampling of nohz_full userspace state:

Another approach is for the rcu_data structure's ->dynticks variable to
use the lower two bits to differentiate between idle, nohz_full userspace
and kernel.  In theory, inlining should make this zero cost for idle
transition, and should allow you to safely sample nohz_full userspace
state with a load and a couple of memory barriers instead of an IPI.

To make this work nicely, the low-order bits have to be 00 for kernel,
and (say) 01 for idle and 10 for nohz_full userspace.  11 would be an
error.

The trick would be for rcu_user_enter() and rcu_user_exit() to atomically
increment ->dynticks by 2, for rcu_nmi_exit() to increment by 1 and
rcu_nmi_enter() to increment by 3.  The state sampling would need to
change accordingly.

Does this make sense, or am I missing something?

							Thanx, Paul
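
The two-bit encoding proposed above can be illustrated with a small
user-space model.  This is only a sketch, not kernel code: every name
in it (struct model_rcu_data, the model_*() helpers, enum eqs_state)
is hypothetical, C11 sequentially consistent atomics stand in for the
kernel's atomic_add_return() and smp_mb(), and it assumes the +1/+3
transitions are only ever taken between the kernel and idle states.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space model of the proposed ->dynticks layout. */
enum eqs_state {
	EQS_KERNEL = 0,	/* 00: kernel */
	EQS_IDLE   = 1,	/* 01: idle */
	EQS_USER   = 2,	/* 10: nohz_full userspace */
	EQS_ERROR  = 3,	/* 11: must never be observed */
};

#define EQS_MASK 3UL

struct model_rcu_data {
	/* Low two bits: state.  Upper bits: count of completed round trips. */
	atomic_ulong dynticks;
};

static inline enum eqs_state eqs_decode(unsigned long snap)
{
	return (enum eqs_state)(snap & EQS_MASK);
}

/* Fully ordered add that returns the new value (models atomic_add_return()). */
static inline unsigned long model_add_return(struct model_rcu_data *rdp,
					     unsigned long delta)
{
	return atomic_fetch_add(&rdp->dynticks, delta) + delta;
}

/* kernel -> nohz_full userspace: 00 + 2 = 10 */
static inline void model_user_enter(struct model_rcu_data *rdp)
{
	unsigned long val = model_add_return(rdp, 2);

	assert(eqs_decode(val) == EQS_USER);
	(void)val;
}

/* nohz_full userspace -> kernel: 10 + 2 = 00, carrying into the upper bits */
static inline void model_user_exit(struct model_rcu_data *rdp)
{
	unsigned long val = model_add_return(rdp, 2);

	assert(eqs_decode(val) == EQS_KERNEL);
	(void)val;
}

/* kernel -> idle (the "rcu_nmi_exit() increments by 1" case): 00 + 1 = 01 */
static inline void model_nmi_exit(struct model_rcu_data *rdp)
{
	unsigned long val = model_add_return(rdp, 1);

	assert(eqs_decode(val) == EQS_IDLE);
	(void)val;
}

/* idle -> kernel (the "rcu_nmi_enter() increments by 3" case): 01 + 3 = 00 */
static inline void model_nmi_enter(struct model_rcu_data *rdp)
{
	unsigned long val = model_add_return(rdp, 3);

	assert(eqs_decode(val) == EQS_KERNEL);
	(void)val;
}

/* Remote sampling: one load bracketed by two full barriers, no IPI. */
static inline unsigned long model_snap(struct model_rcu_data *rdp)
{
	unsigned long snap;

	atomic_thread_fence(memory_order_seq_cst);
	snap = atomic_load_explicit(&rdp->dynticks, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return snap;
}

static inline int model_in_nohz_user(struct model_rcu_data *rdp)
{
	return eqs_decode(model_snap(rdp)) == EQS_USER;
}

/* Nonzero if no transition has happened since the earlier snapshot. */
static inline int model_unchanged_since(struct model_rcu_data *rdp,
					unsigned long snap)
{
	return model_snap(rdp) == snap;
}
```

In this model every round trip through idle or userspace adds 4 in
total, so the upper bits keep counting transitions while the low two
bits always name the current state.  A remote observer can therefore
call model_snap() once, decode the state, and later compare against a
fresh snapshot to learn whether the CPU has moved on in the meantime.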

---
 include/linux/context_tracking_state.h | 12 ++++++++++++
 kernel/context_tracking.c              |  7 ++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

--- a/include/linux/context_tracking_state.h
+++ b/include/linux/context_tracking_state.h
@@ -45,11 +45,23 @@ static __always_inline bool context_trac
 {
 	return __this_cpu_read(context_tracking.state) == CONTEXT_USER;
 }
+
+static __always_inline bool context_tracking_state_cpu(int cpu)
+{
+	struct context_tracking *ct = per_cpu_ptr(&context_tracking);
+
+	if (!context_tracking_enabled() || !ct->active)
+		return CONTEXT_DISABLED;
+
+	return ct->state;
+}
+
 #else
 static inline bool context_tracking_in_user(void) { return false; }
 static inline bool context_tracking_enabled(void) { return false; }
 static inline bool context_tracking_enabled_cpu(int cpu) { return false; }
 static inline bool context_tracking_enabled_this_cpu(void) { return false; }
+static inline bool context_tracking_state_cpu(int cpu) { return CONTEXT_DISABLED; }
 #endif /* CONFIG_CONTEXT_TRACKING */
 
 #endif
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -82,7 +82,7 @@ void noinstr __context_tracking_enter(en
 			vtime_user_enter(current);
 			instrumentation_end();
 		}
-		rcu_user_enter();
+		rcu_user_enter(); /* smp_mb */
 	}
 	/*
 	 * Even if context tracking is disabled on this CPU, because it's outside
@@ -149,12 +149,14 @@ void noinstr __context_tracking_exit(enu
 		return;
 
 	if (__this_cpu_read(context_tracking.state) == state) {
+		__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
+
 		if (__this_cpu_read(context_tracking.active)) {
 			/*
 			 * We are going to run code that may use RCU. Inform
 			 * RCU core about that (ie: we may need the tick again).
 			 */
-			rcu_user_exit();
+			rcu_user_exit(); /* smp_mb */
 			if (state == CONTEXT_USER) {
 				instrumentation_begin();
 				vtime_user_exit(current);
@@ -162,7 +164,6 @@ void noinstr __context_tracking_exit(enu
 				instrumentation_end();
 			}
 		}
-		__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
 	}
 	context_tracking_recursion_exit();
 }