Re: [PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2023-10-20 11:39:52
Also in:
linux-iommu
On Fri, Oct 20, 2023 at 04:23:44PM +0800, Michael Shavit wrote:
The comment helps a lot thank you. I do still have some final reservations: wouldn't it be clearer with the loop un-rolled? After all it's only 3 steps in the worst case.... Something like:
I thought about that, but a big point for me was to consolidate the algorithm between CD/STE. Inlining everything makes it much more difficult to achieve this. Actually my first sketches were trying to write it unrolled.
+ arm_smmu_get_ste_used(target, &target_used);
+ arm_smmu_get_ste_used(cur, &cur_used);
+ if (!hitless_possible(target, target_used, cur_used, cur_used)) {hitless possible requires the loop of the step function to calcuate it.
+ target->data[0] = STRTAB_STE_0_V; + arm_smmu_sync_ste_for_sid(smmu, sid);
I still like V=0 as I think we do want the event for this case.
+ /*
+ * The STE is now in abort where none of the bits except
+ * STRTAB_STE_0_V and STRTAB_STE_0_CFG are accessed. This allows
+ * all other words of the STE to be written without further
+ * disruption.
+ */
+ arm_smmu_get_ste_used(cur, &cur_used);
+ }
+ /* write bits in all positions unused by the STE */
+ for (i = 0; i != ARRAY_SIZE(cur->data); i++) {
+ /* (should probably optimize this away if no write needed) */
+ WRITE_ONCE(cur->data[i], (cur->data[i] & cur_used[i])
| (target->data[i] & ~cur_used[i]));
+ }
+ arm_smmu_sync_ste_for_sid(smmu, sid);Yes, I wanted to avoid all the syncs if they are not required.
+ /*
+ * It should now be possible to make a single qword write to make the
+ * the new configuration take effect.
+ */
+ for (i = 0; i != ARRAY_SIZE(cur->data); i++) {
+ if ((cur->data[i] & target_used[i]) !=
(target->data[i] & target_used[i]))
+ /*
+ * TODO:
+ * WARN_ONCE if this condition hits more than once in
+ * the loop
+ */
+ WRITE_ONCE(cur->data[i], (cur->data[i] &
cur_used[i]) | (target->data[i] & ~cur_used[i]));
+ }+ arm_smmu_sync_ste_for_sid(smmu, sid);
This needs to be optional too
And there is another optional 4th pass to set the unused target values
to 0.
Basically you have captured the core algorithm, but I think if you
fill in all the missing bits to get up to the same functionality it
will be longer and unsharable with the CD side.
You could perhaps take this approach and split it into 4 sharable step
functions:
if (step1(target, target_used, cur_used, cur_used, len)) {
arm_smmu_sync_ste_for_sid(smmu, sid);
arm_smmu_get_ste_used(cur, &cur_used);
}
if (step2(target, target_used, cur_used, cur_used, len))
arm_smmu_sync_ste_for_sid(smmu, sid);
if (step3(target, target_used, cur_used, cur_used, len)) {
arm_smmu_sync_ste_for_sid(smmu, sid);
arm_smmu_get_ste_used(cur, &cur_used);
}
if (step4(target, target_used, cur_used, cur_used, len))
arm_smmu_sync_ste_for_sid(smmu, sid);
To me this is inelegant as if we only need to do step 3 we have to
redundantly scan the array 2 times. The rolled up version just
directly goes to step 3.
However this does convince me you've thought very carefully about this
and have not found a flaw in the design!
Thanks,
Jason
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel