RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc driver
From: Ben Levinsky <hidden>
Date: 2020-08-24 21:20:32
Also in:
linux-devicetree, linux-remoteproc, lkml
Hi Stefano This is just response to few unanswered comments Thanks Ben
-----Original Message----- From: linux-remoteproc-owner@vger.kernel.org <linux-remoteproc- owner@vger.kernel.org> On Behalf Of Ben Levinsky Sent: Thursday, August 20, 2020 8:13 AM To: Stefano Stabellini <redacted> Cc: Michal Simek <redacted>; devicetree@vger.kernel.org; mathieu.poirier@linaro.org; Ed T. Mooring [off-list ref]; linux- remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; robh+dt@kernel.org; Jiaying Liang [off-list ref]; linux-arm- kernel@lists.infradead.org Subject: RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc driverquoted
-----Original Message----- From: Stefano Stabellini <redacted> Sent: Wednesday, August 19, 2020 2:21 PM To: Ben Levinsky <redacted> Cc: Stefano Stabellini <redacted>; Michal Simek [off-list ref]; devicetree@vger.kernel.org; mathieu.poirier@linaro.org; Ed T. Mooring [off-list ref]; linux- remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; robh+dt@kernel.org; Jiaying Liang [off-list ref]; linux-arm- kernel@lists.infradead.org Subject: RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc driver On Tue, 18 Aug 2020, Ben Levinsky wrote:quoted
quoted
quoted
+/** + * struct zynqmp_r5_mem - zynqmp rpu memory data + * @pnode_id: TCM power domain ids + * @res: memory resource + * @node: list node + */ +struct zynqmp_r5_mem { + u32 pnode_id[MAX_MEM_PNODES]; + struct resource res; + struct list_head node; +}; + +/** + * struct zynqmp_r5_pdata - zynqmp rpu remote processor privatedataquoted
quoted
quoted
quoted
+ * @dev: device of RPU instance + * @rproc: rproc handle + * @pnode_id: RPU CPU power domain id + * @mems: memory resources + * @is_r5_mode_set: indicate if r5 operation mode is set + * @tx_mc: tx mailbox client + * @rx_mc: rx mailbox client + * @tx_chan: tx mailbox channel + * @rx_chan: rx mailbox channel + * @mbox_work: mbox_work for the RPU remoteproc + * @tx_mc_skbs: socket buffers for tx mailbox client + * @rx_mc_buf: rx mailbox client buffer to save the rx message + */ +struct zynqmp_r5_pdata { + struct device dev; + struct rproc *rproc; + u32 pnode_id; + struct list_head mems; + bool is_r5_mode_set; + struct mbox_client tx_mc; + struct mbox_client rx_mc; + struct mbox_chan *tx_chan; + struct mbox_chan *rx_chan; + struct work_struct mbox_work; + struct sk_buff_head tx_mc_skbs; + unsigned char rx_mc_buf[RX_MBOX_CLIENT_BUF_MAX];A simple small reordering of the struct fields would lead to small memory savings due to alignment.[Ben Levinsky] will do. Do you mean ordering in either ascending ordescending order? Each field has a different alignment in the struct, so for example after pnode_id there are probably 4 unused bytes because mems is 64bit aligned.[Ben Levinsky] ok will update this so the alignments are done with less unused memory per struct allocation.quoted
quoted
quoted
quoted
+/* + * TCM needs mapping to R5 relative address and xilinx platformmgmtquoted
callquoted
quoted
quoted
+ */ +struct rproc_mem_entry *handle_tcm_parsing(struct device *dev, + struct reserved_mem*rmem,quoted
quoted
quoted
quoted
+ struct device_node *node, + int lookup_idx) +{ + void *va; + dma_addr_t dma; + resource_size_t size; + int ret; + u32 pnode_id; + struct resource rsc; + struct rproc_mem_entry *mem; + + pnode_id = tcm_addr_to_pnode[lookup_idx][1]; + ret = zynqmp_pm_request_node(pnode_id, + ZYNQMP_PM_CAPABILITY_ACCESS,0,quoted
quoted
quoted
quoted
+ZYNQMP_PM_REQUEST_ACK_BLOCKING);quoted
quoted
quoted
quoted
+ if (ret < 0) { + dev_err(dev, "failed to request power node: %u\n",pnode_id);quoted
+ return -EINVAL; + } + + ret = of_address_to_resource(node, 0, &rsc); + if (ret < 0) { + dev_err(dev, "failed to get resource of memory %s", + of_node_full_name(node)); + return -EINVAL; + } + size = resource_size(&rsc); + va = devm_ioremap_wc(dev, rsc.start, size); + if (!va) + return -ENOMEM; + + /* zero out tcm base address */ + if (rsc.start & 0xffe00000) { + /* R5 can't see anything past 0xfffff so wipe it */ + rsc.start &= 0x000fffff;If that is the case why not do: rsc.start &= 0x000fffff; unconditionally? if (rsc.start & 0xffe00000) is superfluous. But I thought that actually the R5s could see TCM at both the low address (< 0x000fffff) and also at the high address (i.e. 0xffe00000).[Ben Levinsky] Here yes can make rsc.start &= 0x000fffff undconditional.Willquoted
update in v9 as suchquoted
Also, this logic is because this is only for the Xilinx R5mappings of TCM banks that are at (from the TCM point of view) 0x0 and 0x2000 What I meant is that as far as I understand from the TRM the RPU should also be able to access the same banks at the same address of the APU, which I imagine is in the 0xffe00000 range. But I could be wrong on this. If we could use the same addresses for RPU and APU, it would simplify this driver.
[Ben Levinsky] right for R5 it is slightly different as it is 32bit so the device address values need to be adjusted
quoted
quoted
quoted
quoted
+ /* + * handle tcm banks 1 a and b (0xffe9000 and + * 0xffeb0000) + */ + if (rsc.start & 0x80000) + rsc.start -= 0x90000;It is very unclear to me why we have to do this[Ben Levinsky] This is for TCM bank 0B and bank 1B to map them to0x00020000 so that the TCM relative addressing lines up. For example (0xffe90000 & 0x000fffff) - 0x90000 = 0x20000 Could you please explain the mapping in an in-code comment? The comment currently mentions 0xffe9000 and 0xffeb0000 but: - 0xffe9000 & 0x000fffff = 0xe9000 0xe9000 - 0x90000 = 0x59000 - 0xffeb0000 & 0x000fffff = 0xeb000 0xeb000 - 0x90000 = 0xeb000 Either way we don't get 0x20000. What am I missing?[Ben Levinsky] I apologize there is typo in the comment... it should be 0xffe90000 and 0xffeb0000 The output is: 0xffe90000 & 0x000fffff = 0x90000 0x90000 - 0x90000 = 0x0 And 0xffeb0000 & 0x000fffff = 0xB0000 0xB0000 - 0x90000 = 0x20000 So these line up for the relative addressing for RPU's view of TCMsquoted
quoted
quoted
quoted
+ } + + dma = (dma_addr_t)rsc.start;Given the way the dma parameter is used by rproc_alloc_registered_carveouts, I think it might be best to pass the original start address (i.e. 0xffe00000) as dma.quoted
+ mem = rproc_mem_entry_init(dev, va, dma, (int)size, rsc.start, + NULL, zynqmp_r5_mem_release,I don't know too much about the remoteproc APIs, but shouldn't you be passing zynqmp_r5_rproc_mem_alloc to it instead of NULL?[Ben Levinsky] the difference is that for TCM we have to do make therelative address work for TCM, so the dma input to rproc_mem_entry_init is different in TCM case. The dma address is the address as seen by the RPU, is that right? So you are trying to set the dma address to something in the range 0 - 0x20000?[Ben Levinsky] yesquoted
quoted
quoted
quoted
+ rsc.name); + if (!mem) + return -ENOMEM; + + return mem; +} + +static int parse_mem_regions(struct rproc *rproc) +{ + int num_mems, i; + struct zynqmp_r5_pdata *pdata = rproc->priv; + struct device *dev = &pdata->dev; + struct device_node *np = dev->of_node; + struct rproc_mem_entry *mem; + + num_mems = of_count_phandle_with_args(np, "memory-region",quoted
quoted
quoted
NULL);quoted
+ if (num_mems <= 0) + return 0; + for (i = 0; i < num_mems; i++) { + struct device_node *node; + struct reserved_mem *rmem; + + node = of_parse_phandle(np, "memory-region", i);Check node != NULL ?[Ben Levinsky] will add this in v9quoted
quoted
+ rmem = of_reserved_mem_lookup(node); + if (!rmem) { + dev_err(dev, "unable to acquire memory-region\n");quoted
quoted
quoted
quoted
+ return -EINVAL; + } + + if (strstr(node->name, "vdev0buffer")) {vdev0buffer is not described in the device tree bindings, is that normal/expected?[Ben Levinsky] vdev0buffer is not required, as there might be simplefirmware loading with no IPC. Vdev0buffer only needed for IPC. OK, good. It should probably still be described in the device tree binding as optional property.quoted
quoted
quoted
+ /* Register DMA region */ + mem = rproc_mem_entry_init(dev, NULL, + (dma_addr_t)rmem-base,quoted
quoted
quoted
+ rmem->size, rmem-base,quoted
quoted
quoted
+ NULL, NULL, + "vdev0buffer"); + if (!mem) { + dev_err(dev, "unable to initializememory-quoted
quoted
quoted
region %s\n",quoted
+ node->name); + return -ENOMEM; + } + dev_dbg(dev, "parsed %s at %llx\r\n", mem-name,quoted
quoted
quoted
+ mem->dma); + } else if (strstr(node->name, "vdev0vring")) {Same herequoted
+ int vring_id; + char name[16]; + + /* + * can be 1 of multiple vring IDs per IPCchannelquoted
quoted
quoted
quoted
+ * e.g. 'vdev0vring0' and 'vdev0vring1' + */ + vring_id = node->name[14] - '0';Where does the "14" comes from? Are we sure it is not possible to haveaquoted
quoted
quoted
node->name smaller than 14 chars?[Ben Levinsky] Presently there are only 2 vrings used per Xilinx OpenAMPchannel to RPU. In Xilinx kernel we have hard-coded names as these are the only nodes that use it. For example RPU0vdev0vring0 and RPU1vdev0vring0. Hence we only check for vdev0vring and not a sscanf("%*s%i") to parse out the vring ID or other, cleaner solution. OK, but I think it is best if we use node->name[14] only if we explicitly check for a string of at least 14 characters. Using strstr, it could return the string "vdev0vring" which is less than 14 chars, leading to a bug.quoted
quoted
quoted
+ snprintf(name, sizeof(name), "vdev0vring%d",vring_id);quoted
+ /* Register vring */ + mem = rproc_mem_entry_init(dev, NULL, + (dma_addr_t)rmem-base,quoted
quoted
quoted
+ rmem->size, rmem-base,quoted
quoted
quoted
+zynqmp_r5_rproc_mem_alloc,quoted
+zynqmp_r5_rproc_mem_release,quoted
+ name); + dev_dbg(dev, "parsed %s at %llx\r\n", mem-name,quoted
quoted
quoted
+ mem->dma); + } else { + int idx; + + /* + * if TCM update address space for R5 and + * make xilinx platform mgmt call + */ + for (idx = 0; idx <ZYNQMP_R5_NUM_TCM_BANKS;quoted
quoted
quoted
idx++) {quoted
+ if (tcm_addr_to_pnode[idx][0] ==rmem-quoted
quoted
quoted
quoted
base) + break;There is something I don't quite understand. If the memory region tousequoted
quoted
quoted
is TCM, why would it be also described under reserved-memory? Reserved-memory is for normal memory being reserved, while TCM isnotquoted
quoted
quoted
normal memory. Am I missing something?[Ben Levinsky] I can change this in v9 as discussed. That is, have no TCMunder reserved mem. Instead have the banks as nodes in device tree with status="[enabled|disabled]" and if enabled, then try to add memories in parse_fw call.quoted
quoted
quoted
+ } + + if (idx != ZYNQMP_R5_NUM_TCM_BANKS) { + mem = handle_tcm_parsing(dev,rmem, node,quoted
quoted
quoted
idx);quoted
+ } else { + mem = rproc_mem_entry_init(dev,NULL,quoted
quoted
quoted
quoted
+ (dma_addr_t)rmem-base,quoted
quoted
quoted
+ rmem->size, rmem-base,quoted
quoted
quoted
+zynqmp_r5_rproc_mem_alloc,quoted
+zynqmp_r5_rproc_mem_release,quoted
+ node->name);This case looks identical to the vdev0vring above. Is the difference really just in the "name"? If so, can we merge the two cases into one? no, because the devm_ioremap_wc call has to be done before changingthe dma address to relative for TCM banks.quoted
quoted
[Ben Levinsky] ok will clean up to merge where able
quoted
quoted
quoted
quoted
+ } + + if (!mem) { + dev_err(dev, + "unable to init memory-region %s\n",quoted
quoted
quoted
quoted
+ node->name); + return -ENOMEM; + } + } + rproc_add_carveout(rproc, mem); + } + + return 0; +} + +static int zynqmp_r5_parse_fw(struct rproc *rproc, const structfirmwarequoted
quoted
*fw)quoted
+{ + int ret; + struct zynqmp_r5_pdata *pdata = rproc->priv; + struct device *dev = &pdata->dev; + + ret = parse_mem_regions(rproc); + if (ret) { + dev_err(dev, "parse_mem_regions failed %x\n", ret); + return ret; + } + + ret = rproc_elf_load_rsc_table(rproc, fw); + if (ret == -EINVAL) { + dev_info(dev, "no resource table found.\n"); + ret = 0;Why do we want to continue ignoring the error in this case?[Ben Levinsky] as there can be simple firmware loaded onto R5 that do nothave resource table. Resource table only needed for specific IPC case. OK, an in-code comment would be goodquoted
quoted
quoted
+ struct device *dev; + struct zynqmp_r5_mem *mem; + int ret; + struct property *prop; + const __be32 *cur; + u32 val; + int i; + + dev = &pdata->dev; + mem = devm_kzalloc(dev, sizeof(*mem), GFP_KERNEL); + if (!mem) + return -ENOMEM; + ret = of_address_to_resource(node, 0, &mem->res); + if (ret < 0) { + dev_err(dev, "failed to get resource of memory %s", + of_node_full_name(node)); + return -EINVAL; + } + + /* Get the power domain id */ + i = 0; + if (of_find_property(node, "pnode-id", NULL)) { + of_property_for_each_u32(node, "pnode-id", prop,cur, val)quoted
quoted
quoted
quoted
+ mem->pnode_id[i++] = val; + } + list_add_tail(&mem->node, &pdata->mems); + return 0; +} + +/** + * zynqmp_r5_release() - ZynqMP R5 device release function + * @dev: pointer to the device struct of ZynqMP R5 + * + * Function to release ZynqMP R5 device. + */ +static void zynqmp_r5_release(struct device *dev) +{ + struct zynqmp_r5_pdata *pdata; + struct rproc *rproc; + struct sk_buff *skb; + + pdata = dev_get_drvdata(dev); + rproc = pdata->rproc; + if (rproc) { + rproc_del(rproc); + rproc_free(rproc); + } + if (pdata->tx_chan) + mbox_free_channel(pdata->tx_chan); + if (pdata->rx_chan) + mbox_free_channel(pdata->rx_chan); + /* Discard all SKBs */ + while (!skb_queue_empty(&pdata->tx_mc_skbs)) { + skb = skb_dequeue(&pdata->tx_mc_skbs); + kfree_skb(skb); + } + + put_device(dev->parent); +} + +/** + * event_notified_idr_cb() - event notified idr callback + * @id: idr id + * @ptr: pointer to idr private data + * @data: data passed to idr_for_each callback + * + * Pass notification to remoteproc virtio + * + * Return: 0. having return is to satisfy the idr_for_each() function + * pointer input argument requirement. + **/ +static int event_notified_idr_cb(int id, void *ptr, void *data) +{ + struct rproc *rproc = data; + + (void)rproc_vq_interrupt(rproc, id); + return 0; +} + +/** + * handle_event_notified() - remoteproc notification work funciton + * @work: pointer to the work structure + * + * It checks each registered remoteproc notify IDs. + */ +static void handle_event_notified(struct work_struct *work) +{ + struct rproc *rproc; + struct zynqmp_r5_pdata *pdata; + + pdata = container_of(work, struct zynqmp_r5_pdata,mbox_work);quoted
quoted
quoted
quoted
+ + (void)mbox_send_message(pdata->rx_chan, NULL); + rproc = pdata->rproc; + /* + * We only use IPI for interrupt. The firmware side may or may + * not write the notifyid when it trigger IPI. + * And thus, we scan through all the registered notifyids. + */ + idr_for_each(&rproc->notifyids, event_notified_idr_cb, rproc);This looks expensive. Should we at least check whether the notifyid was written as first thing before doing the scan?[Ben Levinsky] this will be at most 2 vrings presently per firmware-loadandquoted
only done when the firmware is loaded so the latency so should not impact performace or user OKquoted
quoted
quoted
+ /* Get R5 power domain node */ + ret = of_property_read_u32(node, "pnode-id", &pdata-pnode_id);quoted
quoted
quoted
+ if (ret) { + dev_err(dev, "failed to get power node id.\n"); + goto error; + } + + /* TODO Check if R5 is running */ + + /* Set up R5 if not already setup */ + ret = pdata->is_r5_mode_set ? 0 : r5_set_mode(pdata); + if (ret) { + dev_err(dev, "failed to set R5 operation mode.\n"); + return ret; + }is_r5_mode_set is set by r5_set_mode(), which is only called here. So it looks like this check is important in cases where zynqmp_r5_probe() is called twice for the same R5 node. But I don't think that is supposed to happen?[Ben Levinsky] this is needed as there are cases where user can repeatedlyload different firmware so the check is needed in cases like this where rpu is already configured. It is also possible that a user might repeatedly load/unload the module OK
_______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel