Re: [PATCH] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support
From: Rob Herring <robh@kernel.org>
Date: 2020-01-16 17:07:09
Also in:
linux-iommu
On Wed, Jan 15, 2020 at 10:33 AM Auger Eric [off-list ref] wrote:
>
> Hi Rob,
>
> On 1/15/20 3:02 PM, Rob Herring wrote:
>> On Wed, Jan 15, 2020 at 3:21 AM Auger Eric [off-list ref] wrote:
>>>
>>> Hi Rob,
>>>
>>> On 1/13/20 3:39 PM, Rob Herring wrote:
>>>> Arm SMMUv3.2 adds support for TLB range invalidate operations.
>>>> Support for range invalidate is determined by the RIL bit in the IDR3
>>>> register.
>>>>
>>>> The range invalidate is in units of the leaf page size and operates on
>>>> 1-32 chunks of a power of 2 multiple pages. First we determine from the
>>>> size what power of 2 multiple we can use and then adjust the granule to
>>>> 32x that size.
>>>> @@ -2022,12 +2043,39 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
>>>>  		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>>>>  	}
>>>>
>>>> +	if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>> +		unsigned long tg, scale;
>>>> +
>>>> +		/* Get the leaf page size */
>>>> +		tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>> it is unclear to me why you can't set tg with the granule parameter.
>> granule could be 2MB sections if THP is enabled, right?
>
> Ah OK I thought it was a page size and not a block size.
>
> I requested this feature a long time ago for virtual SMMUv3. With
> DPDK/VFIO the guest was sending page TLB invalidation for each page
> (granule=4K or 64K) part of the hugepage buffer and those were trapped
> by the VMM. This stalled qemu.
I did some more testing to make sure THP is enabled, but haven't been able to get granule to be anything but 4K. I only have the Fast Model with AHCI on PCI to test this with. Maybe I'm hitting some place where THPs aren't supported yet.
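To illustrate the tg point above, here is a minimal standalone sketch (not code from the patch). The helper name leaf_page_shift() and the SZ_* bitmap values are made up for the example; the loop is only a userspace stand-in for the kernel's __ffs(). It shows that the lowest set bit of a page-size bitmap gives the leaf page size even when block sizes such as 2MB are also supported, which is why granule can't be used directly if it may be a block size.

```c
/* Illustrative page-size constants; assumptions for this sketch only. */
#define SZ_4K	(1UL << 12)
#define SZ_64K	(1UL << 16)
#define SZ_2M	(1UL << 21)
#define SZ_1G	(1UL << 30)

/*
 * Userspace stand-in for the kernel's __ffs(): index of the lowest set
 * bit. The bitmap must be nonzero. The lowest set bit of a page-size
 * bitmap corresponds to the smallest (leaf) page size.
 */
static unsigned int leaf_page_shift(unsigned long pgsize_bitmap)
{
	unsigned int shift = 0;

	while (!(pgsize_bitmap & 1UL)) {
		pgsize_bitmap >>= 1;
		shift++;
	}
	return shift;
}
```

With a bitmap of SZ_4K | SZ_2M | SZ_1G this returns 12 (4K leaf pages), regardless of whether a particular mapping used a 2MB block.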
>>>> +		/* Determine the power of 2 multiple number of pages */
>>>> +		scale = __ffs(size / (1UL << tg));
>>>> +		cmd.tlbi.scale = scale;
>>>> +
>>>> +		cmd.tlbi.num = CMDQ_TLBI_RANGE_NUM_MAX - 1;
>>> Also could you explain why you use CMDQ_TLBI_RANGE_NUM_MAX.
>> How's this:
>> /* The invalidation loop defaults to the maximum range */
>
> I would have expected num=0 directly. Don't we invalidate the &size in
> one shot as 2^scale * pages of granularity @tg? I fail to understand
> when NUM > 0.
NUM is > 0 anytime size is not a power of 2. For example, if size is 33
pages, then it takes 2 loops doing 32 pages and then 1 page. If size is
34 pages, then NUM is (17-1) and SCALE is 1.

Rob
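
For illustration, the 33-page and 34-page cases can be worked through with a small standalone sketch. This is not the patch's loop, only one reading of the description above: SCALE is fixed from the lowest set bit of the page count, and each command covers up to 32 chunks of 2^SCALE pages, with NUM encoded as chunks minus one. The names split_range(), struct range_cmd and RANGE_NUM_MAX are hypothetical.

```c
#include <stddef.h>

/* Assumed limit: one range command covers at most 32 chunks. */
#define RANGE_NUM_MAX	32UL

/* Hypothetical container for the two fields discussed in the thread. */
struct range_cmd {
	unsigned long scale;	/* each chunk is 2^scale leaf pages */
	unsigned long num;	/* encoded chunk count, i.e. chunks - 1 */
};

/* Userspace stand-in for the kernel's __ffs(); x must be nonzero. */
static unsigned long lowest_set_bit(unsigned long x)
{
	unsigned long bit = 0;

	while (!(x & 1UL)) {
		x >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Split an invalidation of 'pages' leaf pages into range commands.
 * Fills at most max_cmds entries of 'out' and returns the number of
 * commands the whole range needs.
 */
static size_t split_range(unsigned long pages, struct range_cmd *out,
			  size_t max_cmds)
{
	unsigned long scale;
	size_t n = 0;

	if (!pages)
		return 0;

	/* Largest power of 2 that divides the page count */
	scale = lowest_set_bit(pages);

	while (pages) {
		/* 'pages' stays a nonzero multiple of 2^scale, so chunks >= 1 */
		unsigned long chunks = pages >> scale;

		if (chunks > RANGE_NUM_MAX)
			chunks = RANGE_NUM_MAX;

		if (n < max_cmds) {
			out[n].scale = scale;
			out[n].num = chunks - 1;
		}
		n++;
		pages -= chunks << scale;
	}
	return n;
}
```

Under these assumptions, 33 pages yields two commands (SCALE 0 with NUM 31, then SCALE 0 with NUM 0), and 34 pages yields a single command with SCALE 1 and NUM 16. A power-of-2 size such as 64 pages is the only case that ends up with NUM 0 in a single command.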