Thread (100 messages) 100 messages, 3 authors, 2018-07-12

Re: [PATCH v2 6/7] net/mlx5: probe all port representors

From: Xueming(Steven) Li <hidden>
Date: 2018-06-27 17:30:37

-----Original Message-----
From: Adrien Mazarguil <redacted>
Sent: Wednesday, June 27, 2018 9:32 PM
To: Shahaf Shuler <redacted>
Cc: Xueming(Steven) Li <redacted>; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 6/7] net/mlx5: probe all port representors

On Sun, Jun 17, 2018 at 10:15:07AM +0000, Shahaf Shuler wrote:
quoted
Hi Adrien,

Saturday, June 16, 2018 11:58 AM, Xueming(Steven) Li:
quoted
Subject: RE: [dpdk-dev] [PATCH v2 6/7] net/mlx5: probe all port
representors
quoted
-----Original Message-----
From: dev <redacted> On Behalf Of Adrien Mazarguil
Sent: Thursday, June 14, 2018 4:35 PM
To: Shahaf Shuler <redacted>
Cc: dev@dpdk.org
Subject: [dpdk-dev] [PATCH v2 6/7] net/mlx5: probe all port
representors

Probe existing port representors in addition to their master
device and
associate them automatically.
quoted
To avoid name collision between Ethernet devices, their names use
the same convention as ixgbe and i40e PMDs, that is, instead of
only a PCI
address in DBDF notation:
quoted
- "net_{DBDF}_0" for master/switch devices.
This is breaking compatibility for application using the device names in order to attach them to the
application (e.g. OVS-DPDK).
quoted
Before this patch the naming scheme for non-representor port is "{DBDF}".

Can we preserve the compatibility and add appropriate suffix for the representor case?
There's one issue if representors are hot-plugged. The name of the master device, which happens to be
that of the switch domain, cannot be updated. The form "net_{DBDF}_0" seems expected for PMDs that
support representors (see ixgbe and i40e).

Now since representor hot-plugging is not supported yet, I guess we could postpone this problem by
keeping the old format in the meantime, however ideally, these applications should not rely on it. The
only safe assumption they can make is the uniqueness of any given name among ethdevs.

PCI bus addresses, if needed, should be retrieved by looking at the underlying bus object.

By the way, while thinking again about a past comment from Xueming [1], maybe it's finally time to
remove support for multiple Verbs ports on mlx5 after all. This should drop another unnecessary loop
and the need for the unused "port %u" suffix at all while naming the device.

So how about the following plan for v3:

- Adding a patch that drops support for multiple Verbs ports (note for
  Xueming, yes I changed my mind *again* :)

- If you really think this will break OVS (please confirm), then when no
  "representor" parameter is provided (regardless of the presence of any
  representors), name format will use the usual "{DBDF}" notation as you
  suggested.

- Otherwise as soon as a "representor" is found on the command line, the new
  format will be used, again regardless of the presence of any representors.
The port creation sequence of upcoming hot plug looks like this:
0000:81:00.1
0000:81:00.1,representor=0
0000:81:00.1,representor=1

So the PF attaching comes always w/o "representor" parameter.
- In both cases, representors if any, will be named according to the format
  specified in this patch.

[1]
https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2
018-
June%2F104015.html&data=02%7C01%7Cxuemingl%40mellanox.com%7Cad9a1b32e5e241e375d208d5dc32778b%7Ca652971
c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636657031666639542&sdata=w4oNeWXwKXS0%2BNSZsYQaneW%2BkFxvWIHZFHLoM
fLOxkg%3D&reserved=0

<snip>
quoted
quoted
quoted
 	for (i = 0; i < attr.orig_attr.phys_port_cnt; ++i) {
-		eth_list[i] = mlx5_dev_spawn_one(dpdk_dev, ibv_dev, vf,
-						 &attr, i + 1);
-		if (eth_list[i])
-			continue;
-		/* Save rte_errno and roll back in case of failure. */
-		ret = rte_errno;
-		while (i--) {
-			mlx5_dev_close(eth_list[i]);
-			if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-				rte_free(eth_list[i]->data->dev_private);
-			claim_zero(rte_eth_dev_release_port(eth_list[i]));
-		}
-		free(eth_list);
-		rte_errno = ret;
-		return NULL;
+		eth_list[n] = mlx5_dev_spawn_one(dpdk_dev, ibv_dev[j],
vf,
quoted
+						 &attr, i + 1,
+						 j ? eth_list[0] : NULL,
+						 j - 1);
The representor id is according to the sort made by qsort (based on device names).
A better way may be to set it according to the sysfs information, like you do in the mlx5_get_ifname
function.
quoted
What do you think?
I agree that the current approach sucks, hence the big fat warnings I left around (see discussion with
Xueming [2]). Problem is that the needed information is not yet known at this stage; there is no
private structure to rely on to use mlx5_get_ifname() directly.

I'd also rather see these assumptions go in any case. I'll attempt to improve things for v3 in
preparation of allowing representors to be probed on their own anytime, possibly even before the
master device.

[2]
https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2
018-
June%2F104059.html&data=02%7C01%7Cxuemingl%40mellanox.com%7Cad9a1b32e5e241e375d208d5dc32778b%7Ca652971
c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636657031666639542&sdata=5eOWb69duEB%2BkIW1ZGkv%2FLxkZfwErQOd%2FV7
nDpN2jOg%3D&reserved=0

<snip>
quoted
quoted
quoted
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index
997b04a33..0fe467140 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -161,6 +161,10 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int isolated:1; /* Whether isolated mode is enabled. */
+	unsigned int representor:1; /* Device is a port representor. */
Why we need above flag? Why can't we use RTE_ETH_DEV_REPRESENTOR from eth_dev->data->dev_flags.
Problem is that this flag can only be set once the ethdev is fully instantiated and can't be relied on
internally where needed (e.g. during clean up in error handling code). It's reported to applications
but not used internally.

As a device property, it's actually pretty similar to the VF bit or offloaded capabilities where
checking exposed information would be needlessly complex.

Now maybe it could be part of struct mlx5_dev_config as well. I initially assumed this object was only
for user-provided parameters but looks like it's not the case. I intend to move it there for v3.

--
Adrien Mazarguil
6WIND
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help