Thread (28 messages) 28 messages, 8 authors, 2021-04-01

Re: [PATCH v13 06/10] iommu: Add a page fault handler

From: Jean-Philippe Brucker <hidden>
Date: 2021-03-23 10:51:33
Also in: linux-acpi, linux-devicetree, linux-iommu

On Tue, Mar 02, 2021 at 03:59:57PM -0800, Jacob Pan wrote:
Hi Jean-Philippe,

A few comments from the p.o.v of converting VT-d to this framework. Mostly
about potential optimization. I think VT-d SVA code will be able to use this
work.
+Ashok provided many insight.

FWIW,
Reviewed-by:Jacob Pan [off-list ref]
Thanks!
On Tue,  2 Mar 2021 10:26:42 +0100, Jean-Philippe Brucker
[off-list ref] wrote:
quoted
+int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+{
+	int ret;
+	struct iopf_group *group;
+	struct iopf_fault *iopf, *next;
+	struct iopf_device_param *iopf_param;
+
+	struct device *dev = cookie;
+	struct dev_iommu *param = dev->iommu;
+
+	lockdep_assert_held(&param->lock);
+
+	if (fault->type != IOMMU_FAULT_PAGE_REQ)
+		/* Not a recoverable page fault */
+		return -EOPNOTSUPP;
+
+	/*
+	 * As long as we're holding param->lock, the queue can't be
unlinked
+	 * from the device and therefore cannot disappear.
+	 */
+	iopf_param = param->iopf_param;
+	if (!iopf_param)
+		return -ENODEV;
+
+	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+		iopf = kzalloc(sizeof(*iopf), GFP_KERNEL);
+		if (!iopf)
+			return -ENOMEM;
+
+		iopf->fault = *fault;
+
+		/* Non-last request of a group. Postpone until the last
one */
Would be nice to have an option here to allow non-deferred handle_mm_fault.
Some devices may have a large group.

FYI, VT-d also needs to send page response before the last one (LPIG).
"A Page Group Response Descriptor is issued by software in response to a
page request with data or a page request (with or without data) that
indicated that it was the last request in a group."

But I think we deal with that when we convert. Perhaps just treat the
request with data as LPIG.
Sure that seems fine. Do you mean that the vt-d driver will set the
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE flag for PR-with-data?  Otherwise we
could introduce a new flag. I prefer making it explicit rather than having
IOPF consumers infer that a response is needed when seeing
IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA set, because private_data wouldn't be
reusable by other architectures.
Also adding a trace event would be nice, similar to CPU page fault.
Agreed, the tracepoints in my development tree (along with the lower-level
page_request/response tracepoints) have been indispensable for debugging
hardware and SVA applications
quoted
+		list_add(&iopf->list, &iopf_param->partial);
+
+		return 0;
+	}
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group) {
+		/*
+		 * The caller will send a response to the hardware. But
we do
+		 * need to clean up before leaving, otherwise partial
faults
+		 * will be stuck.
+		 */
+		ret = -ENOMEM;
+		goto cleanup_partial;
+	}
+
+	group->dev = dev;
+	group->last_fault.fault = *fault;
+	INIT_LIST_HEAD(&group->faults);
+	list_add(&group->last_fault.list, &group->faults);
+	INIT_WORK(&group->work, iopf_handle_group);
+
+	/* See if we have partial faults for this group */
+	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
{
+		if (iopf->fault.prm.grpid == fault->prm.grpid)
Just curious, the iopf handler is registered per arm_smmu_master dev. Is
the namespace of group ID also within an arm_smmu_master?
Yes for both PCIe PRI and SMMU stall, the namespace is within one device
(one RequesterID or StreamID)
Can one arm_smmu_master support multiple devices?
No, it's a single device
For VT-d, group ID is per PCI device.
quoted
+			/* Insert *before* the last fault */
+			list_move(&iopf->list, &group->faults);
+	}
+
This is fine with devices supports small number of outstanding PRQs.
VT-d is setting the limit to 32. But the incoming DSA device will support
512.

So if we pre-sort IOPF by group ID and put them in a per group list, would
it be faster? I mean once the LPIG comes in, we just need to search the
list of groups instead of all partial faults. I am not against what is done
here, just exploring optimization.
Yes I think that's worth exploring, keeping the groups in a rb_tree or something
like that, for easy access and update. Note that I don't have a system
with non-LPIG faults, so I can't test any of this at the moment

Thanks,
Jean

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help