Re: [PATCH v13 06/10] iommu: Add a page fault handler
From: Jean-Philippe Brucker <hidden>
Date: 2021-03-23 10:51:33
Also in:
linux-acpi, linux-devicetree, linux-iommu
On Tue, Mar 02, 2021 at 03:59:57PM -0800, Jacob Pan wrote:
Hi Jean-Philippe, A few comments from the p.o.v of converting VT-d to this framework. Mostly about potential optimization. I think VT-d SVA code will be able to use this work. +Ashok provided many insight. FWIW, Reviewed-by:Jacob Pan [off-list ref]
Thanks!
On Tue, 2 Mar 2021 10:26:42 +0100, Jean-Philippe Brucker [off-list ref] wrote:quoted
+int iommu_queue_iopf(struct iommu_fault *fault, void *cookie) +{ + int ret; + struct iopf_group *group; + struct iopf_fault *iopf, *next; + struct iopf_device_param *iopf_param; + + struct device *dev = cookie; + struct dev_iommu *param = dev->iommu; + + lockdep_assert_held(¶m->lock); + + if (fault->type != IOMMU_FAULT_PAGE_REQ) + /* Not a recoverable page fault */ + return -EOPNOTSUPP; + + /* + * As long as we're holding param->lock, the queue can't be unlinked + * from the device and therefore cannot disappear. + */ + iopf_param = param->iopf_param; + if (!iopf_param) + return -ENODEV; + + if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) { + iopf = kzalloc(sizeof(*iopf), GFP_KERNEL); + if (!iopf) + return -ENOMEM; + + iopf->fault = *fault; + + /* Non-last request of a group. Postpone until the last one */Would be nice to have an option here to allow non-deferred handle_mm_fault. Some devices may have a large group. FYI, VT-d also needs to send page response before the last one (LPIG). "A Page Group Response Descriptor is issued by software in response to a page request with data or a page request (with or without data) that indicated that it was the last request in a group." But I think we deal with that when we convert. Perhaps just treat the request with data as LPIG.
Sure that seems fine. Do you mean that the vt-d driver will set the IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE flag for PR-with-data? Otherwise we could introduce a new flag. I prefer making it explicit rather than having IOPF consumers infer that a response is needed when seeing IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA set, because private_data wouldn't be reusable by other architectures.
Also adding a trace event would be nice, similar to CPU page fault.
Agreed, the tracepoints in my development tree (along with the lower-level page_request/response tracepoints) have been indispensable for debugging hardware and SVA applications
quoted
+ list_add(&iopf->list, &iopf_param->partial); + + return 0; + } + + group = kzalloc(sizeof(*group), GFP_KERNEL); + if (!group) { + /* + * The caller will send a response to the hardware. But we do + * need to clean up before leaving, otherwise partial faults + * will be stuck. + */ + ret = -ENOMEM; + goto cleanup_partial; + } + + group->dev = dev; + group->last_fault.fault = *fault; + INIT_LIST_HEAD(&group->faults); + list_add(&group->last_fault.list, &group->faults); + INIT_WORK(&group->work, iopf_handle_group); + + /* See if we have partial faults for this group */ + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) { + if (iopf->fault.prm.grpid == fault->prm.grpid)Just curious, the iopf handler is registered per arm_smmu_master dev. Is the namespace of group ID also within an arm_smmu_master?
Yes for both PCIe PRI and SMMU stall, the namespace is within one device (one RequesterID or StreamID)
Can one arm_smmu_master support multiple devices?
No, it's a single device
For VT-d, group ID is per PCI device.quoted
+ /* Insert *before* the last fault */ + list_move(&iopf->list, &group->faults); + } +This is fine with devices supports small number of outstanding PRQs. VT-d is setting the limit to 32. But the incoming DSA device will support 512. So if we pre-sort IOPF by group ID and put them in a per group list, would it be faster? I mean once the LPIG comes in, we just need to search the list of groups instead of all partial faults. I am not against what is done here, just exploring optimization.
Yes I think that's worth exploring, keeping the groups in a rb_tree or something like that, for easy access and update. Note that I don't have a system with non-LPIG faults, so I can't test any of this at the moment Thanks, Jean _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel