Thread (56 messages, 8 authors), 2022-08-03

Re: [Patch v4 03/12] net: mana: Handle vport sharing between devices

From: Jason Gunthorpe <jgg@ziepe.ca>
Date: 2022-07-29 19:12:47
Also in: linux-hyperv, linux-rdma, lkml

On Fri, Jul 29, 2022 at 06:44:22PM +0000, Long Li wrote:
Subject: Re: [Patch v4 03/12] net: mana: Handle vport sharing between devices

On Thu, Jul 21, 2022 at 05:58:39PM +0000, Long Li wrote:
"vport" is a hardware resource that can be used either by an
Ethernet device or by an RDMA device, but not by both
at the same time. The "vport" is associated with a protection
domain and doorbell, and it is programmed in the hardware. Outgoing
traffic is enforced on this vport based on how it is programmed.
Sure, but how is it the user's problem to "get this configured right",
and what exactly is the user supposed to do?

I would expect the allocation of HW resources to be completely
transparent to the user. Why is it not?
In the hardware, RDMA RAW_QP shares the same hardware resource (in
this case, the vPort in the hardware table) with the Ethernet NIC. When an
RDMA user creates a RAW_QP, we can't just shut down the Ethernet. The
user is required to make sure the Ethernet is not in use when creating
this QP type.
You haven't answered my question - how is the user supposed to achieve this?
The user needs to configure the network interface so the kernel will not use it when the user creates a RAW QP on this port.

This can be done via system configuration, by not bringing this
interface online at system boot, or equivalently by running "ifconfig xxx
down" to take the interface down before creating a RAW QP on this
port.
That sounds horrible, why allow the user to even bind two drivers if
the two drivers can't be used together?
And now I also want to know why the ethernet device and rdma device can even
be loaded together if they cannot share the physical port?
Exclusivity is not a sharing model that any driver today implements.
This physical port limitation only applies to the RAW QP. For RC QP,
the hardware doesn't have this limitation. The user can create RC
QPs on a physical port up to the hardware limits independent of the
Ethernet usage on the same port.
.. and it is because you support sharing models in other cases :\
Scenario 1: The Ethernet loses its TCP connection.
1. User A runs a program listening on a TCP port, accepts an incoming
TCP connection, and is communicating with the remote peer over this
TCP connection.
2. User B creates an RDMA RAW_QP on the same port on the device.
3. As soon as the RAW_QP is created, the program in step 1 can't
send/receive data over this TCP connection. After some period of
inactivity, the TCP connection terminates.
It is a little more complicated than that, but yes, that could
possibly happen if the userspace captures the right traffic.
Please note that this may also pose a security risk. User B with a
RAW_QP can potentially hijack this TCP connection from the kernel by
crafting the correct Ethernet packets and sending them over this QP to trick
the remote peer into believing it's User A.
Any root user can do this with the netstack using e.g. tcpdump, bpf,
XDP, raw sockets, etc. This is why the capability is guarded by
CAP_NET_RAW. It is nothing unusual.
Scenario 2: The Ethernet port state changes after an RDMA RAW_QP is used on the port.
1. User runs "ifconfig ethx down" on the NIC, intending to take it offline.
2. User creates an RDMA RAW_QP on the same port on the device.
3. User destroys this RAW_QP.
4. The ethx device from step 1 reports carrier state in step 2; in many
Linux distributions this brings it online without user
interaction, and "ifconfig ethx" shows its state changed to "up".
This I'm not familiar with; it actually sounds like a bug that the
RAW_QPs interfere with the netdev carrier state.
the Mellanox NICs implement the RAW_QP. IMHO, it's better to have
the user explicitly decide whether to use Ethernet or RDMA RAW_QP on
a specific port.
It should all be carefully documented someplace.

Jason