Re: [PATCH] ethdev: fix wrong memset
From: Ananyev, Konstantin <hidden>
Date: 2017-01-23 13:06:11
-----Original Message----- From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com] Sent: Monday, January 23, 2017 12:53 PM To: Ananyev, Konstantin <redacted> Cc: Yigit, Ferruh <redacted>; dev@dpdk.org; Thomas Monjalon <redacted>; Horton, Remy [off-list ref] Subject: Re: [dpdk-dev] [PATCH] ethdev: fix wrong memset On Mon, Jan 23, 2017 at 12:44:11PM +0000, Ananyev, Konstantin wrote:quoted
quoted
On Mon, Jan 23, 2017 at 07:40:50PM +0800, Yuanhan Liu wrote:quoted
On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:quoted
On 1/23/2017 11:24 AM, Yuanhan Liu wrote:quoted
On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:quoted
quoted
quoted
quoted
quoted
quoted
quoted
lib/librte_ether/rte_ethdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 4790faf..61f44e2 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c@@ -225,7 +225,7 @@ struct rte_eth_dev * return NULL; } - memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data)); + memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));Not directly related to the this issue, but, after fix, this may have issues with secondary process. There were patches sent to fix this.I mean this one: http://dpdk.org/ml/archives/dev/2017-January/054422.htmld948f596fee2 ("ethdev: fix port data mismatched in multiple process model") should have fixed it.Think about case, where secondary process uses a virtual PMD, which does a rte_eth_dev_allocate() call, shouldn't this corrupt primary process device data?Yes, it may. However, I doubt that's the typical usage.But this is a use case, and broken now,I thought it was broken since the beginning?No, memset(&rte_eth_dev_data[port_id], ...) breaks it.Oh, you were talking about that particular case Remy's patch meant to fix.quoted
quoted
quoted
and fix is known.And there is already a fix?http://dpdk.org/ml/archives/dev/2017-January/054422.htmlYes, it should fix that issue.Well, few more thoughts: it may fix the crash issue Remy saw, but it looks like more a workaround to me. Basically, if primary and secondary shares a same port id, they should point to same device. Otherwise, primary process may use eth_dev->data for a device A, while the secondary process may use it for another device, as you said, it could be a vdev. In such case, there is no way we could continue safely. That said, the given patch avoids the total reset of eth_dev->data, while it continues reset the eth_dev->data->name, which is wrong. So it's not a proper fix. Again, I think it's more about the usage. If primary starts with a nic device A, while the secondary starts with a nic device B, there is no way they could work well (unless they use different port id).Why not? I think this is possible.Yes, it's possible: find another port id if that one is already taken by primary process (or even by secondary process: think that primary process might attatch a port later).quoted
They just need to be initialized properly, so each rte_eth_devices[port_id]->data, etc. point to the right place.My understanding is, as far as they use different port_id, it might be fine. Just not sure it's enough or not.
As I understand, the main problem is that rte_eth_devices[] is local, while rte_eth_dev_data points to the shared memory array. And rte_eth_dev_allocate() assumes that if rte_eth_devices[x] is free, then rte_eth_dev_data[port_id] is also free. Which is wrong in case when primary/secondary processes have different devices attached. Another problem is that inside rte_ethdev.c we manipulate rte_eth_dev_data[] contents without grabbing any lock. I think it was an attempt to fix that issue in 16.07 timeframe or so, but I don't remember what happened with that patch. Konstantin