Thread (100 messages) 100 messages, 3 authors, 2018-07-12

Re: [PATCH v4 07/10] net/mlx5: probe all port representors

From: Shahaf Shuler <hidden>
Date: 2018-07-10 11:17:37

Tuesday, July 10, 2018 1:59 PM, Adrien Mazarguil:
Subject: Re: [PATCH v4 07/10] net/mlx5: probe all port representors

On Tue, Jul 10, 2018 at 10:13:25AM +0000, Shahaf Shuler wrote:
quoted
Tuesday, July 10, 2018 12:37 PM, Adrien Mazarguil:
quoted
Subject: Re: [PATCH v4 07/10] net/mlx5: probe all port representors

On Mon, Jul 09, 2018 at 11:57:29AM +0000, Shahaf Shuler wrote:
quoted
Hi Adrien,


Thursday, July 5, 2018 11:46 AM, Adrien Mazarguil:
quoted
Subject: [PATCH v4 07/10] net/mlx5: probe all port representors

Probe existing port representors in addition to their master
device and associate them automatically.

To avoid collision between Ethernet devices, they are named as
follows:
quoted
quoted
quoted
quoted
- "{DBDF}" for master/switch devices.
- "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
  representors.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <redacted>
Signed-off-by: Nelio Laranjeiro <redacted>
Reviewed-by: Xueming Li <redacted>
Cc: Xueming Li <redacted>
Cc: Shahaf Shuler <redacted>
--
v4 changes:

- Fixed domain ID release once the last port using it is closed. Closed
  devices are not necessarily detached, their presence is not a good
  indicator. Code was modified to check if they still use their domain
IDs
quoted
quoted
quoted
quoted
  before deciding to release it.
<snip>
quoted
quoted
@@ -883,6 +915,41 @@ mlx5_dev_spawn(struct rte_device
*dpdk_dev,
quoted
quoted
quoted
quoted
 	priv->nl_socket_rdma = mlx5_nl_init(0, NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(RTMGRP_LINK,
NETLINK_ROUTE);
 	priv->nl_sn = 0;
+	priv->representor = !!switch_info->representor;
+	priv->domain_id =
RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
quoted
quoted
quoted
quoted
+	priv->representor_id =
+		switch_info->representor ? switch_info->port_name
: -1;
quoted
quoted
quoted
quoted
+	/*
+	 * Look for sibling devices in order to reuse their switch
domain
quoted
quoted
quoted
quoted
+	 * if any, otherwise allocate one.
+	 */
+	i = mlx5_dev_to_port_id(dpdk_dev, NULL, 0);
+	if (i > 0) {
+		uint16_t port_id[i];
+
+		i = RTE_MIN(mlx5_dev_to_port_id(dpdk_dev,
port_id, i), i);
quoted
quoted
quoted
quoted
+		while (i--) {
+			const struct priv *opriv =
+				rte_eth_devices[port_id[i]].data-
quoted
dev_private;
+
+			if (!opriv ||
+			    opriv->domain_id ==
+
RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID)
quoted
quoted
quoted
quoted
+				continue;
+			priv->domain_id = opriv->domain_id;
It looks like for the second port it will use the domain_id of the
first port. Is
that what you intent?

Yes, it's on purpose. Master and representors of a given device must
share the same domain ID to let applications know they can create
flow rules to forward traffic between them all.
But this is not the case in Mellanox devices. On Mellanox devices each PF
along w/ its representors has a separate eswitch, and traffic cannot be
routed between the switches using flow rules.
quoted
For example if we have PF0 along w/ its representor REP0_0 and PF1 along
w/ its representor REP1_0 . PF0 and REP0_0 will belong to switch X and PF1
and REP1_0 will belong to switch domain Y. it is also being reflected on the
phys_switch_id.
quoted
We should have switch domain per PF.
Looks like I didn't understand your previous comment. I confirm there is no
such issue, one domain ID is allocated per PF/representors group, which are
identified by a common PCI bus address. It's fine because on mlx5, each
physical port exposes its own address, I assumed there was no need to
additionally compare phys_switch_id. Can this happen?
OK great. It is OK, the PF has only a single switch domain on which all its representors are connected. 
quoted
quoted
quoted
Note - I couldn't test it due to compilation errors:
/.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx
quoted
quoted
5
_nl.c: In function 'mlx5_nl_switch_info_cb':
quoted
/.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx
quoted
quoted
5
_
quoted
nl.c:843:8: error: 'IFLA_PHYS_PORT_NAME' undecl ared (first use in
this
function)
quoted
   case IFLA_PHYS_PORT_NAME:
        ^
/.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx
quoted
quoted
5
_
quoted
nl.c:843:8: note: each undeclared identifier is  reported only
once for each function it appears in
/.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx
quoted
quoted
5
_
quoted
nl.c:851:8: error: 'IFLA_PHYS_SWITCH_ID' undecl ared (first use in
this
function)
quoted
   case IFLA_PHYS_SWITCH_ID:
        ^

My system info:
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.3 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server"
HOME_URL="https://emea01.safelinks.protection.outlook.com/?url=https
quoted
quoted
%
3A%2F%2Fwww.redhat.com%2F&amp;data=02%7C01%7Cshahafs%40mellan
quoted
quoted
ox.com%7C661e7b51087b460817c008d5e648bf1e%7Ca652971c7d2e4d9ba6a4
quoted
quoted
d149256f461b%7C0%7C0%7C636668122474445351&amp;sdata=Lg8arhiYLvH5L
quoted
quoted
2hef8DVhS8A3fVJ%2B5IZkLIHmqCd%2FmY%3D&amp;reserved=0"
quoted
BUG_REPORT_URL="https://emea01.safelinks.protection.outlook.com/?url
quoted
quoted
=
https%3A%2F%2Fbugzilla.redhat.com%2F&amp;data=02%7C01%7Cshahafs%
quoted
quoted
40mellanox.com%7C661e7b51087b460817c008d5e648bf1e%7Ca652971c7d2e
quoted
quoted
4d9ba6a4d149256f461b%7C0%7C0%7C636668122474445351&amp;sdata=3Do
quoted
quoted
RKjxovM8tOgKLssC1mq2wwfhjpVUZSExXV4ywBEQ%3D&amp;reserved=0"
quoted
quoted
quoted
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.3"
Red Hat Enterprise Linux Server release 7.3 (Maipo) Red Hat
Enterprise Linux Server release 7.3 (Maipo)
OK, I'll redefine in v5 in case they are missing on the host system.

<snip>
quoted
quoted
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index
704046270..cc01310e0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -159,6 +159,7 @@ struct priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties.
*/
quoted
quoted
quoted
quoted
 	struct ibv_pd *pd; /* Protection Domain. */
+	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device
name. */
quoted
quoted
quoted

Why we need a dedicated entry for the ibdev_name? it is already
part of
priv->ctx->device->name.

Heh, same reason as the next line below, don't forget those damn
secondaries which can't dereference local pointers from the primary
process
:)
Right 😊.
quoted
quoted
quoted
 	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path
for
quoted
quoted
quoted
quoted
secondary */
<snip>
quoted
quoted
struct rte_eth_dev_info *info)
 	info->speed_capa = priv->link_speed_capa;
 	info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK;
 	mlx5_set_default_params(dev, info);
+	info->switch_info.name = dev->data->name;
+	info->switch_info.domain_id = priv->domain_id;
+	info->switch_info.port_id = priv->representor_id;
+	if (priv->representor) {
+		unsigned int i = mlx5_dev_to_port_id(dev->device,
NULL, 0);
quoted
quoted
quoted
quoted
+		uint16_t port_id[i];
+
+		i = RTE_MIN(mlx5_dev_to_port_id(dev->device,
port_id, i),
quoted
quoted
quoted
quoted
i);
+		while (i--) {
+			struct priv *opriv =
+				rte_eth_devices[port_id[i]].data-
quoted
dev_private;
+
+			if (!opriv ||
+			    opriv->representor ||
+			    opriv->domain_id != priv->domain_id)
+				continue;
+			/*
+			 * Override switch name with that of the
master
quoted
quoted
quoted
quoted
+			 * device.
+			 */
+			info->switch_info.name = opriv->dev_data-
name;
quoted
quoted
quoted
+			break;
According to this logic it means once the master device is closed,
all the
representors are no longer belong to the same switch (switch name of
each is different) which is not correct.

They still share the same domain ID, which is what actually matters.
The switch name is only provided to let applications identify the
master
(control) device in case it's needed.
quoted
According to your notes it is possible to close master w/o closing
the
representor.

This allows devices to be probed in any order on a needed basis, not
all at once. It's done on purpose to pave the way for hotplug support.
quoted
Why not just storing the master switch name when probing the
representor and to use it as is on the dev_info?

The switch name *must* be that of the master device. If the master
is not probed, there can't be a switch name. However there's no real
provision for this in the API, so I chose the most acceptable unique
name, which is the name of the local device. Would you prefer an empty
name instead?
quoted
The current approach is OK.
I was just suggesting to skip the loop iteration by saving the switch name on
the private structure.

This is unsafe, if the master device is never probed or somehow replaced by
a different device with no relationship, this information could be wrong.

Keep in mind these ethdev names are just identifiers. The only requirement
is that they must be unique, however anything can be written in there. If
some name is not taken, another device can use it.
quoted
quoted
Thing is, on mlx5 flow rules can be created directly between
representors without involving the master device. An empty switch
name may be misleading in this respect.

What do you suggest?
--
Adrien Mazarguil
6WIND
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help