Re: [RFC PATCH 27/36] arm_mpam: Allow configuration to be applied and restored during cpu online
From: Dave Martin <Dave.Martin@arm.com>
Date: 2025-07-28 15:34:36
Hi,

On Mon, Jul 28, 2025 at 12:59:12PM +0100, Ben Horgan wrote:
Hi James,

On 7/11/25 19:36, James Morse wrote:
When CPUs come online the original configuration should be restored. Once the maximum partid is known, allocate a configuration array for each component, and reprogram each RIS configuration from this.

The MPAM spec describes how multiple controls can interact. To prevent this happening by accident, always reset controls that don't have a valid configuration. This allows the same helper to be used for configuration and reset.

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/platform/arm64/mpam/mpam_devices.c  | 236 ++++++++++++++++++--
 drivers/platform/arm64/mpam/mpam_internal.h |  26 ++-
 2 files changed, 234 insertions(+), 28 deletions(-)

diff --git a/drivers/platform/arm64/mpam/mpam_devices.c b/drivers/platform/arm64/mpam/mpam_devices.c
index bb3695eb84e9..f3ecfda265d2 100644
--- a/drivers/platform/arm64/mpam/mpam_devices.c
+++ b/drivers/platform/arm64/mpam/mpam_devices.c
[...]
@@ -1000,10 +1041,38 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
[...]
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+	int idx;
+	u16 partid;
+	bool reset;
+	struct mpam_config *cfg;
+	struct mpam_msc_ris *ris;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(ris, &msc->ris, msc_list) {
+		if (!mpam_is_enabled() && !ris->in_reset_state) {
+			mpam_touch_msc(msc, &mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+			continue;
+		}
+
+		reset = true;
+		for (partid = 0; partid <= mpam_partid_max; partid++) {
Do we need to consider 'partid_max_lock' here?
Just throwing in my 2¢, since I'd dug into this a bit previously:

Here, we are resetting an MSC or re-onlining a CPU. Either way, I think that this only happens after the initial probing phase is complete.

mpam_enable_once() is ordered with respect to the task that did the final unlock of partid_max_lock during probing, by means of the schedule_work() call. (See <linux/workqueue.h>.)

Taking the hotplug lock and installing mpam_cpu_online() for CPU hotplug probably brings a sufficient guarantee also (though I've not dug into it).

This function doesn't seem to be called during the probing phase (via mpam_discovery_cpu_online()), so there shouldn't be any racing updates to the global variables here.
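As a rough userspace illustration of the ordering argument above (not kernel code, and not taken from the patch): the sketch below uses POSIX threads, with pthread_create() standing in for schedule_work(). The variable and function names are hypothetical stand-ins chosen to echo the driver. The point is only that a worker started after the last locked update can read the global without taking the lock, because starting the worker already orders it after that update.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for mpam_partid_max and partid_max_lock. */
static unsigned int partid_max = 65535;
static pthread_mutex_t partid_max_lock = PTHREAD_MUTEX_INITIALIZER;

/* Probing phase: each discovered device may lower the global maximum. */
static void probe_one(unsigned int dev_partid_max)
{
	pthread_mutex_lock(&partid_max_lock);
	if (dev_partid_max < partid_max)
		partid_max = dev_partid_max;
	pthread_mutex_unlock(&partid_max_lock);
}

/* Stand-in for mpam_enable_once(): reads the global without the lock. */
static void *enable_once(void *out)
{
	*(unsigned int *)out = partid_max;
	return NULL;
}

/*
 * Probe every device, then hand off to the worker.
 * Returns the value the worker observed, or -1 if the thread
 * could not be created.
 */
static long probe_then_enable(const unsigned int *devs, size_t n)
{
	pthread_t worker;
	unsigned int seen = 0;
	size_t i;

	partid_max = 65535;	/* start each run from the widest value */
	for (i = 0; i < n; i++)
		probe_one(devs[i]);

	/*
	 * pthread_create() plays the role of schedule_work() here:
	 * everything written before the call is visible to the new
	 * thread, so the unlocked read in enable_once() sees the
	 * final value.
	 */
	if (pthread_create(&worker, NULL, enable_once, &seen))
		return -1;
	pthread_join(worker, NULL);

	return (long)seen;
}
```

Calling probe_then_enable() with device limits of 255, 63 and 127 should report 63, the narrowest limit, even though the worker never touches the lock.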
+			cfg = &ris->vmsc->comp->cfg[partid];
+			if (cfg->features)
+				reset = false;
+
+			mpam_reprogram_ris_partid(ris, partid, cfg);
+		}
+		ris->in_reset_state = reset;
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+}
[...]
@@ -1806,6 +1875,43 @@ static void mpam_unregister_irqs(void)
[...]
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+	if (comp->cfg)
+		return 0;
+
+	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
And here?
Similarly, this runs only in the mpam_enable_once() call.

[...]
@@ -1861,6 +1976,8 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
 	might_sleep();
 	lockdep_assert_cpus_held();
+	memset(comp->cfg, 0, (mpam_partid_max * sizeof(*comp->cfg)));
And here?
Similarly to mpam_reset_msc(), I think this probably only runs from mpam_enable_once() or mpam_cpu_online().

I think most or all of the existing reads of the affected globals from within mpam_resctrl.c are also callbacks from resctrl_init(), which again executes during mpam_enable_once() (though I won't promise I haven't missed one or two).

Once resctrl has fired up, I believe that the MPAM driver basically trusts the IDs coming in from resctrl, and doesn't need to range-check them against the global parameters again.

[...]
Thanks, Ben
I consciously haven't done all the homework on this.

Although it may look like the globals are read all over the place after probing, I think this actually only happens during resctrl initialisation (which is basically single-threaded).

The only place where they are read after probing and without mediation via resctrl is on the CPU hotplug path.

Adding locking would ensure that an unstable value is never read, but this is not sufficient by itself to ensure that the _final_ value of a variable is read (for some definition of "final").

And, if there is a well-defined notion of final value and there is sufficient synchronisation to ensure that this is the value read by a particular read, then by construction an unstable value cannot be read.

I think that this kind of pattern is not that uncommon in the kernel, though it is a bit painful to reason about.

Cheers
---Dave
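To make the "stable but not final" distinction concrete, here is a deliberately single-threaded userspace sketch (POSIX mutex, hypothetical names, nothing from the driver). A read taken under the lock always sees a consistent value, yet if it happens before probing has finished it can still differ from the value everyone ends up using.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical global limit, narrowed as devices are discovered. */
static unsigned int limit = 65535;
static pthread_mutex_t limit_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lower the limit if this device supports less than the current value. */
static void narrow_limit(unsigned int v)
{
	pthread_mutex_lock(&limit_lock);
	if (v < limit)
		limit = v;
	pthread_mutex_unlock(&limit_lock);
}

/* A locked read: never torn, but says nothing about "finality". */
static unsigned int read_limit_locked(void)
{
	unsigned int v;

	pthread_mutex_lock(&limit_lock);
	v = limit;
	pthread_mutex_unlock(&limit_lock);

	return v;
}

/*
 * Returns 1 if a locked read taken part-way through probing differs
 * from the value read once probing is complete, 0 otherwise.
 */
static int locked_read_can_be_stale(void)
{
	unsigned int early, final;

	limit = 65535;
	narrow_limit(255);
	early = read_limit_locked();	/* stable, but probing isn't done */
	narrow_limit(63);
	final = read_limit_locked();	/* read after the last update */

	return early != final;
}
```

Here the early read sees 255 and the later one sees 63: the lock did its job both times, but only ordering the read after the end of probing makes it the final value.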