Thread (19 messages) 19 messages, 4 authors, 2020-08-25

RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc driver

From: Ben Levinsky <hidden>
Date: 2020-08-24 21:20:32
Also in: linux-devicetree, linux-remoteproc, lkml

Hi Stefano 

This is just response to few unanswered comments

Thanks
Ben
-----Original Message-----
From: linux-remoteproc-owner@vger.kernel.org <linux-remoteproc-
owner@vger.kernel.org> On Behalf Of Ben Levinsky
Sent: Thursday, August 20, 2020 8:13 AM
To: Stefano Stabellini <redacted>
Cc: Michal Simek <redacted>; devicetree@vger.kernel.org;
mathieu.poirier@linaro.org; Ed T. Mooring [off-list ref]; linux-
remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org;
robh+dt@kernel.org; Jiaying Liang [off-list ref]; linux-arm-
kernel@lists.infradead.org
Subject: RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc
driver

quoted
-----Original Message-----
From: Stefano Stabellini <redacted>
Sent: Wednesday, August 19, 2020 2:21 PM
To: Ben Levinsky <redacted>
Cc: Stefano Stabellini <redacted>; Michal Simek
[off-list ref]; devicetree@vger.kernel.org;
mathieu.poirier@linaro.org; Ed T. Mooring [off-list ref]; linux-
remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org;
robh+dt@kernel.org; Jiaying Liang [off-list ref]; linux-arm-
kernel@lists.infradead.org
Subject: RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc
driver

On Tue, 18 Aug 2020, Ben Levinsky wrote:
quoted
quoted
quoted
+/**
+ * struct zynqmp_r5_mem - zynqmp rpu memory data
+ * @pnode_id: TCM power domain ids
+ * @res: memory resource
+ * @node: list node
+ */
+struct zynqmp_r5_mem {
+	u32 pnode_id[MAX_MEM_PNODES];
+	struct resource res;
+	struct list_head node;
+};
+
+/**
+ * struct zynqmp_r5_pdata - zynqmp rpu remote processor private
data
quoted
quoted
quoted
quoted
+ * @dev: device of RPU instance
+ * @rproc: rproc handle
+ * @pnode_id: RPU CPU power domain id
+ * @mems: memory resources
+ * @is_r5_mode_set: indicate if r5 operation mode is set
+ * @tx_mc: tx mailbox client
+ * @rx_mc: rx mailbox client
+ * @tx_chan: tx mailbox channel
+ * @rx_chan: rx mailbox channel
+ * @mbox_work: mbox_work for the RPU remoteproc
+ * @tx_mc_skbs: socket buffers for tx mailbox client
+ * @rx_mc_buf: rx mailbox client buffer to save the rx message
+ */
+struct zynqmp_r5_pdata {
+	struct device dev;
+	struct rproc *rproc;
+	u32 pnode_id;
+	struct list_head mems;
+	bool is_r5_mode_set;
+	struct mbox_client tx_mc;
+	struct mbox_client rx_mc;
+	struct mbox_chan *tx_chan;
+	struct mbox_chan *rx_chan;
+	struct work_struct mbox_work;
+	struct sk_buff_head tx_mc_skbs;
+	unsigned char rx_mc_buf[RX_MBOX_CLIENT_BUF_MAX];
A simple small reordering of the struct fields would lead to small
memory savings due to alignment.
[Ben Levinsky] will do. Do you mean ordering in either ascending or
descending order?

Each field has a different alignment in the struct, so for example after
pnode_id there are probably 4 unused bytes because mems is 64bit
aligned.
[Ben Levinsky] ok will update this so the alignments are done with less
unused memory per struct allocation.
quoted
quoted
quoted
quoted
+/*
+ * TCM needs mapping to R5 relative address and xilinx platform
mgmt
quoted
call
quoted
quoted
quoted
+ */
+struct rproc_mem_entry *handle_tcm_parsing(struct device *dev,
+					    struct reserved_mem
*rmem,
quoted
quoted
quoted
quoted
+					    struct device_node *node,
+					    int lookup_idx)
+{
+	void *va;
+	dma_addr_t dma;
+	resource_size_t size;
+	int ret;
+	u32 pnode_id;
+	struct resource rsc;
+	struct rproc_mem_entry *mem;
+
+	pnode_id =  tcm_addr_to_pnode[lookup_idx][1];
+	ret = zynqmp_pm_request_node(pnode_id,
+				     ZYNQMP_PM_CAPABILITY_ACCESS,
0,
quoted
quoted
quoted
quoted
+
ZYNQMP_PM_REQUEST_ACK_BLOCKING);
quoted
quoted
quoted
quoted
+	if (ret < 0) {
+		dev_err(dev, "failed to request power node: %u\n",
pnode_id);
quoted
+		return -EINVAL;
+	}
+
+	ret = of_address_to_resource(node, 0, &rsc);
+	if (ret < 0) {
+		dev_err(dev, "failed to get resource of memory %s",
+			of_node_full_name(node));
+		return -EINVAL;
+	}
+	size = resource_size(&rsc);
+	va = devm_ioremap_wc(dev, rsc.start, size);
+	if (!va)
+		return -ENOMEM;
+
+	/* zero out tcm base address */
+	if (rsc.start & 0xffe00000) {
+		/* R5 can't see anything past 0xfffff so wipe it */
+		rsc.start &= 0x000fffff;
If that is the case why not do:

  rsc.start &= 0x000fffff;

unconditionally? if (rsc.start & 0xffe00000) is superfluous.

But I thought that actually the R5s could see TCM at both the low
address (< 0x000fffff) and also at the high address (i.e. 0xffe00000).
[Ben Levinsky] Here yes can make rsc.start &= 0x000fffff undconditional.
Will
quoted
update in v9 as such
quoted
		Also, this logic is because this is only for the Xilinx R5
mappings of TCM banks that are at (from the TCM point of view) 0x0 and
0x2000

What I meant is that as far as I understand from the TRM the RPU should
also be able to access the same banks at the same address of the APU,
which I imagine is in the 0xffe00000 range. But I could be wrong on
this.

If we could use the same addresses for RPU and APU, it would simplify
this driver.
[Ben Levinsky] right for R5 it is slightly different as it is 32bit so the device address values need to be adjusted
quoted
quoted
quoted
quoted
+		/*
+		 * handle tcm banks 1 a and b (0xffe9000 and
+		 * 0xffeb0000)
+		 */
+		if (rsc.start & 0x80000)
+			rsc.start -= 0x90000;
It is very unclear to me why we have to do this
[Ben Levinsky] This is for TCM bank 0B and bank 1B to map them to
0x00020000 so that the TCM relative addressing lines up. For example
(0xffe90000 & 0x000fffff) - 0x90000 = 0x20000

Could you please explain the mapping in an in-code comment?
The comment currently mentions 0xffe9000 and 0xffeb0000 but:

- 0xffe9000 & 0x000fffff = 0xe9000
  0xe9000 - 0x90000 = 0x59000

- 0xffeb0000 & 0x000fffff = 0xeb000
  0xeb000 - 0x90000 = 0xeb000

Either way we don't get 0x20000. What am I missing?
[Ben Levinsky] I apologize there is typo in the comment... it should be
0xffe90000 and 0xffeb0000
The output is:
0xffe90000 & 0x000fffff = 0x90000
0x90000 - 0x90000 = 0x0

And
0xffeb0000 & 0x000fffff = 0xB0000
0xB0000 - 0x90000 = 0x20000

So these line up for the relative addressing for RPU's view of TCMs
quoted
quoted
quoted
quoted
+	}
+
+	dma = (dma_addr_t)rsc.start;
Given the way the dma parameter is used by
rproc_alloc_registered_carveouts, I think it might be best to pass the
original start address (i.e. 0xffe00000) as dma.

quoted
+	mem = rproc_mem_entry_init(dev, va, dma, (int)size, rsc.start,
+				   NULL, zynqmp_r5_mem_release,
I don't know too much about the remoteproc APIs, but shouldn't you be
passing zynqmp_r5_rproc_mem_alloc to it instead of NULL?
[Ben Levinsky] the difference is that for TCM we have to do make the
relative address work for TCM, so the dma input to rproc_mem_entry_init is
different in TCM case.

The dma address is the address as seen by the RPU, is that right?
So you are trying to set the dma address to something in the range 0 -
0x20000?
[Ben Levinsky] yes
quoted
quoted
quoted
quoted
+				   rsc.name);
+	if (!mem)
+		return -ENOMEM;
+
+	return mem;
+}
+
+static int parse_mem_regions(struct rproc *rproc)
+{
+	int num_mems, i;
+	struct zynqmp_r5_pdata *pdata = rproc->priv;
+	struct device *dev =  &pdata->dev;
+	struct device_node *np = dev->of_node;
+	struct rproc_mem_entry *mem;
+
+	num_mems = of_count_phandle_with_args(np, "memory-
region",
quoted
quoted
quoted
NULL);
quoted
+	if (num_mems <= 0)
+		return 0;
+	for (i = 0; i < num_mems; i++) {
+		struct device_node *node;
+		struct reserved_mem *rmem;
+
+		node = of_parse_phandle(np, "memory-region", i);
Check node != NULL ?
[Ben Levinsky] will add this in v9
quoted
quoted
+		rmem = of_reserved_mem_lookup(node);
+		if (!rmem) {
+			dev_err(dev, "unable to acquire memory-
region\n");
quoted
quoted
quoted
quoted
+			return -EINVAL;
+		}
+
+		if (strstr(node->name, "vdev0buffer")) {
vdev0buffer is not described in the device tree bindings, is that
normal/expected?
[Ben Levinsky] vdev0buffer is not required, as there might be simple
firmware loading with no IPC. Vdev0buffer only needed for IPC.

OK, good. It should probably still be described in the device tree
binding as optional property.

quoted
quoted
quoted
+			/* Register DMA region */
+			mem = rproc_mem_entry_init(dev, NULL,
+						   (dma_addr_t)rmem-
base,
quoted
quoted
quoted
+						   rmem->size, rmem-
base,
quoted
quoted
quoted
+						   NULL, NULL,
+						   "vdev0buffer");
+			if (!mem) {
+				dev_err(dev, "unable to initialize
memory-
quoted
quoted
quoted
region %s\n",
quoted
+					node->name);
+				return -ENOMEM;
+			}
+			dev_dbg(dev, "parsed %s at  %llx\r\n", mem-
name,
quoted
quoted
quoted
+				mem->dma);
+		} else if (strstr(node->name, "vdev0vring")) {
Same here

quoted
+			int vring_id;
+			char name[16];
+
+			/*
+			 * can be 1 of multiple vring IDs per IPC
channel
quoted
quoted
quoted
quoted
+			 * e.g. 'vdev0vring0' and 'vdev0vring1'
+			 */
+			vring_id = node->name[14] - '0';
Where does the "14" comes from? Are we sure it is not possible to have
a
quoted
quoted
quoted
node->name smaller than 14 chars?
[Ben Levinsky] Presently there are only 2 vrings used per Xilinx OpenAMP
channel to RPU. In Xilinx kernel we have hard-coded names as these are the
only nodes that use it. For example RPU0vdev0vring0 and RPU1vdev0vring0.
Hence we only check for vdev0vring and not a sscanf("%*s%i") to parse out
the vring ID or other, cleaner solution.

OK, but I think it is best if we use node->name[14] only if we
explicitly check for a string of at least 14 characters. Using strstr,
it could return the string "vdev0vring" which is less than 14 chars,
leading to a bug.

quoted
quoted
quoted
+			snprintf(name, sizeof(name), "vdev0vring%d",
vring_id);
quoted
+			/* Register vring */
+			mem = rproc_mem_entry_init(dev, NULL,
+						   (dma_addr_t)rmem-
base,
quoted
quoted
quoted
+						   rmem->size, rmem-
base,
quoted
quoted
quoted
+
zynqmp_r5_rproc_mem_alloc,
quoted
+
zynqmp_r5_rproc_mem_release,
quoted
+						   name);
+			dev_dbg(dev, "parsed %s at %llx\r\n", mem-
name,
quoted
quoted
quoted
+				mem->dma);
+		} else {
+			int idx;
+
+			/*
+			 * if TCM update address space for R5 and
+			 * make xilinx platform mgmt call
+			 */
+			for (idx = 0; idx <
ZYNQMP_R5_NUM_TCM_BANKS;
quoted
quoted
quoted
idx++) {
quoted
+				if (tcm_addr_to_pnode[idx][0] ==
rmem-
quoted
quoted
quoted
quoted
base)
+					break;
There is something I don't quite understand. If the memory region to
use
quoted
quoted
quoted
is TCM, why would it be also described under reserved-memory?
Reserved-memory is for normal memory being reserved, while TCM is
not
quoted
quoted
quoted
normal memory. Am I missing something?
[Ben Levinsky] I can change this in v9 as discussed. That is, have no TCM
under reserved mem. Instead have the banks as nodes in device tree with
status="[enabled|disabled]" and if enabled, then try to add memories in
parse_fw call.
quoted
quoted
quoted
+			}
+
+			if (idx != ZYNQMP_R5_NUM_TCM_BANKS) {
+				mem = handle_tcm_parsing(dev,
rmem, node,
quoted
quoted
quoted
idx);
quoted
+			} else {
+				mem = rproc_mem_entry_init(dev,
NULL,
quoted
quoted
quoted
quoted
+						   (dma_addr_t)rmem-
base,
quoted
quoted
quoted
+						   rmem->size, rmem-
base,
quoted
quoted
quoted
+
zynqmp_r5_rproc_mem_alloc,
quoted
+
zynqmp_r5_rproc_mem_release,
quoted
+						   node->name);
This case looks identical to the vdev0vring above. Is the difference
really just in the "name"? If so, can we merge the two cases into one?
no, because the devm_ioremap_wc call has to be done before changing
the dma address to relative for TCM banks.
quoted
quoted
[Ben Levinsky] ok will clean up to merge where able
quoted
quoted
quoted
quoted
+			}
+
+			if (!mem) {
+				dev_err(dev,
+					"unable to init memory-
region %s\n",
quoted
quoted
quoted
quoted
+					node->name);
+				return -ENOMEM;
+			}
+		}
+		rproc_add_carveout(rproc, mem);
+	}
+
+	return 0;
+}
+
+static int zynqmp_r5_parse_fw(struct rproc *rproc, const struct
firmware
quoted
quoted
*fw)
quoted
+{
+	int ret;
+	struct zynqmp_r5_pdata *pdata = rproc->priv;
+	struct device *dev = &pdata->dev;
+
+	ret = parse_mem_regions(rproc);
+	if (ret) {
+		dev_err(dev, "parse_mem_regions failed %x\n", ret);
+		return ret;
+	}
+
+	ret = rproc_elf_load_rsc_table(rproc, fw);
+	if (ret == -EINVAL) {
+		dev_info(dev, "no resource table found.\n");
+		ret = 0;
Why do we want to continue ignoring the error in this case?
[Ben Levinsky] as there can be simple firmware loaded onto R5 that do not
have resource table. Resource table only needed for specific IPC case.

OK, an in-code comment would be good

quoted
quoted
quoted
+	struct device *dev;
+	struct zynqmp_r5_mem *mem;
+	int ret;
+	struct property *prop;
+	const __be32 *cur;
+	u32 val;
+	int i;
+
+	dev = &pdata->dev;
+	mem = devm_kzalloc(dev, sizeof(*mem), GFP_KERNEL);
+	if (!mem)
+		return -ENOMEM;
+	ret = of_address_to_resource(node, 0, &mem->res);
+	if (ret < 0) {
+		dev_err(dev, "failed to get resource of memory %s",
+			of_node_full_name(node));
+		return -EINVAL;
+	}
+
+	/* Get the power domain id */
+	i = 0;
+	if (of_find_property(node, "pnode-id", NULL)) {
+		of_property_for_each_u32(node, "pnode-id", prop,
cur, val)
quoted
quoted
quoted
quoted
+			mem->pnode_id[i++] = val;
+	}
+	list_add_tail(&mem->node, &pdata->mems);
+	return 0;
+}
+
+/**
+ * zynqmp_r5_release() - ZynqMP R5 device release function
+ * @dev: pointer to the device struct of ZynqMP R5
+ *
+ * Function to release ZynqMP R5 device.
+ */
+static void zynqmp_r5_release(struct device *dev)
+{
+	struct zynqmp_r5_pdata *pdata;
+	struct rproc *rproc;
+	struct sk_buff *skb;
+
+	pdata = dev_get_drvdata(dev);
+	rproc = pdata->rproc;
+	if (rproc) {
+		rproc_del(rproc);
+		rproc_free(rproc);
+	}
+	if (pdata->tx_chan)
+		mbox_free_channel(pdata->tx_chan);
+	if (pdata->rx_chan)
+		mbox_free_channel(pdata->rx_chan);
+	/* Discard all SKBs */
+	while (!skb_queue_empty(&pdata->tx_mc_skbs)) {
+		skb = skb_dequeue(&pdata->tx_mc_skbs);
+		kfree_skb(skb);
+	}
+
+	put_device(dev->parent);
+}
+
+/**
+ * event_notified_idr_cb() - event notified idr callback
+ * @id: idr id
+ * @ptr: pointer to idr private data
+ * @data: data passed to idr_for_each callback
+ *
+ * Pass notification to remoteproc virtio
+ *
+ * Return: 0. having return is to satisfy the idr_for_each() function
+ *          pointer input argument requirement.
+ **/
+static int event_notified_idr_cb(int id, void *ptr, void *data)
+{
+	struct rproc *rproc = data;
+
+	(void)rproc_vq_interrupt(rproc, id);
+	return 0;
+}
+
+/**
+ * handle_event_notified() - remoteproc notification work funciton
+ * @work: pointer to the work structure
+ *
+ * It checks each registered remoteproc notify IDs.
+ */
+static void handle_event_notified(struct work_struct *work)
+{
+	struct rproc *rproc;
+	struct zynqmp_r5_pdata *pdata;
+
+	pdata = container_of(work, struct zynqmp_r5_pdata,
mbox_work);
quoted
quoted
quoted
quoted
+
+	(void)mbox_send_message(pdata->rx_chan, NULL);
+	rproc = pdata->rproc;
+	/*
+	 * We only use IPI for interrupt. The firmware side may or may
+	 * not write the notifyid when it trigger IPI.
+	 * And thus, we scan through all the registered notifyids.
+	 */
+	idr_for_each(&rproc->notifyids, event_notified_idr_cb, rproc);
This looks expensive. Should we at least check whether the notifyid was
written as first thing before doing the scan?
[Ben Levinsky] this will be at most 2 vrings presently per firmware-load
and
quoted
only done when the firmware is loaded so the latency so should not impact
performace or user

OK

quoted
quoted
quoted
+	/* Get R5 power domain node */
+	ret = of_property_read_u32(node, "pnode-id", &pdata-
pnode_id);
quoted
quoted
quoted
+	if (ret) {
+		dev_err(dev, "failed to get power node id.\n");
+		goto error;
+	}
+
+	/* TODO Check if R5 is running */
+
+	/* Set up R5 if not already setup */
+	ret = pdata->is_r5_mode_set ? 0 : r5_set_mode(pdata);
+	if (ret) {
+		dev_err(dev, "failed to set R5 operation mode.\n");
+		return ret;
+	}
is_r5_mode_set is set by r5_set_mode(), which is only called here.
So it looks like this check is important in cases where
zynqmp_r5_probe() is called twice for the same R5 node. But I don't
think that is supposed to happen?
[Ben Levinsky] this is needed as there are cases where user can repeatedly
load different firmware so the check is needed in cases like this where rpu is
already configured. It is also possible that a user might repeatedly
load/unload the module

OK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help