Thread: 20 messages, 7 authors, 2025-03-04

Re: [PATCH net] net/smc: use the correct ndev to find pnetid by pnetid table

From: Halil Pasic <pasic@linux.ibm.com>
Date: 2025-02-10 13:53:33
Also in: linux-rdma, linux-s390, lkml

On Fri, 10 Jan 2025 13:43:44 +0800
Guangguan Wang [off-list ref] wrote:
> We want to use SMC in containers in a cloud environment, and we encounter a problem
> when using smc_pnet with commit 890a2cb4a966. In a container, there are several choices
> of container network, such as directly using the host network, virtual
> network IPVLAN, veth, etc. Different choices of container network have different
> netdev hierarchies. Examples of the netdev hierarchy are shown below. (eth0 and eth1 in the host
> below are the netdevs directly related to the physical device.)
>  _______________________________      ________________________________
> |   _________________           |     |   _________________           |
> |  |POD              |          |     |  |POD  __________  |          |
> |  |                 |          |     |  |    |upper_ndev| |          |
> |  | eth0_________   |          |     |  |eth0|__________| |          |
> |  |____|         |__|          |     |  |_______|_________|          |
> |       |         |             |     |          |lower netdev        |
> |       |         |             |     |        __|______              |
> |   eth1|base_ndev| eth0_______ |     |   eth1|         | eth0_______ |
> |       |         |    | RDMA  ||     |       |base_ndev|    | RDMA  ||
> | host  |_________|    |_______||     | host  |_________|    |_______||
> --------------------------------      --------------------------------
>  netdev hierarchy if directly          netdev hierarchy if using IPVLAN
>    using host network
>  _______________________________
> |   _____________________       |
> |  |POD        _________ |      |
> |  |          |base_ndev||      |
> |  |eth0(veth)|_________||      |
> |  |____________|________|      |
> |               |pairs          |
> |        _______|_              |
> |       |         | eth0_______ |
> |   veth|base_ndev|    | RDMA  ||
> |       |_________|    |_______||
> |        _________              |
> |   eth1|base_ndev|             |
> | host  |_________|             |
>  -------------------------------
>   netdev hierarchy if using veth

> For various reasons, eth1 in the host is not an RDMA-attached netdevice, so a pnetid
> is needed to map eth1 (in host) to the RDMA device so that the POD can do SMC-R.
> The eth1 (in host) is managed by the CNI plugin (such as Terway, a network
> management plugin for container environments), and in a cloud environment the
> eth (in host) can be dynamically inserted by CNI when a POD is created and dynamically
> removed by CNI when the POD is destroyed and no POD is related to the eth (in host)
> anymore.

I'm pretty clueless when it comes to the details of CNI but I think
I'm barely able to follow. Nevertheless if you have the feeling that
my extrapolations are wrong, please do point it out.

> It is hard for us to configure the pnetid on eth1 (in host). So we
> configure the pnetid on the netdevice that can be seen in the POD.

Hm, this sounds like you could set PNETID on eth1 (in host) for each of
the cases and everything would be cool (and would work), but because CNI
and the environment do not support it, or support it in a very
inconvenient way, you are looking for a workaround where PNETID is set
in the POD. Is that right? Or did I get something wrong?

> When doing SMC-R, both
> the container directly using the host network and the container using the veth network
> can successfully match the RDMA device, because the netdev configured with the pnetid is a
> base_ndev. But the container using IPVLAN cannot match the RDMA
> device and a 0x03030000 fallback happens, because the netdev configured with the pnetid is
> not a base_ndev. Additionally, configuring the pnetid on eth1 (in host) also does not
> work for matching the RDMA device when using the veth network and doing SMC-R in the POD.

That I guess answers my question from the first paragraph. Setting
PNETID on eth1 (host) would not be sufficient for veth. Right?
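
To make the lookup behaviour discussed above concrete, here is a minimal Python sketch. It is a toy model, not kernel code. It assumes a user-defined pnetid table keyed by netdev and compares a lookup that only consults the base device with one that first tries the device itself. All names (NetDev, base_ndev, pnetid_base_only, pnetid_self_then_base, the "pod:eth0" style keys, "PNET1") are hypothetical, and the second lookup is only one possible reading of the proposal, not the patch itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NetDev:
    """Toy stand-in for a net_device (hypothetical, not a kernel structure)."""
    name: str
    lower: Optional["NetDev"] = None  # device one level down the stack, if any


def base_ndev(dev: NetDev) -> NetDev:
    """Walk the lower-device links until the bottom of the stack is reached."""
    while dev.lower is not None:
        dev = dev.lower
    return dev


def pnetid_base_only(dev: NetDev, table: Dict[str, str]) -> Optional[str]:
    """Consult the table only for the base device."""
    return table.get(base_ndev(dev).name)


def pnetid_self_then_base(dev: NetDev, table: Dict[str, str]) -> Optional[str]:
    """Assumed alternative: try the device itself, then its base device."""
    hit = table.get(dev.name)
    return hit if hit is not None else table.get(base_ndev(dev).name)


# IPVLAN: eth0 in the POD is an upper device stacked on eth1 in the host.
host_eth1 = NetDev("host:eth1")
ipvlan_eth0 = NetDev("pod:eth0", lower=host_eth1)
# veth: eth0 in the POD has a peer in the host but no lower device.
veth_eth0 = NetDev("pod:eth0")

# pnetid configured on the netdev that is visible in the POD.
table = {"pod:eth0": "PNET1"}

print(pnetid_base_only(veth_eth0, table))         # PNET1
print(pnetid_base_only(ipvlan_eth0, table))       # None (would fall back)
print(pnetid_self_then_base(ipvlan_eth0, table))  # PNET1
```

In this model, an entry keyed on "host:eth1" instead would be found for the IPVLAN device but not for the veth device, which matches the last sentence of the quoted paragraph.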

Another silly question: is making the PNETID basically a part of the Pod
definition shifting PNETID from the realm of infrastructure (i.e.
configured by the cloud provider) to the realm of an application (i.e.
configured by the tenant)?

AFAIU veth (host) is bridged (or similar) to eth1 (host) and that is in
the host, and this is where we make sure that the requirements for SMC-R
are satisfied.

But veth (host) could be attached to eth3 which is on a network not
reachable via eth0 (host) or eth1 (host). In that case the pod could
still set PNETID on veth (POD). Or?

> My patch can resolve the problem we encountered and can also unify the pnetid setup
> for the different network choices listed above, assuming that configuring the pnetid is not limited
> to the base_ndev directly related to the physical device (indeed, the current
> implementation does not limit it yet).

I see some problems here, but I'm afraid we see different problems. For
me, not being able to set eth0 (veth/POD)'s PNETID from the host is a
problem. Please notice that with the current implementation users can
only control the PNETID if infrastructure does not do so in the first
place.
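
The precedence described here can be sketched as a toy Python function. This is only an illustration under stated assumptions: the infrastructure-provided PNETID and the user-table PNETID are each modelled as an optional string, and the function name effective_pnetid is hypothetical, not a kernel symbol.

```python
from typing import Optional


def effective_pnetid(infra_pnetid: Optional[str],
                     user_pnetid: Optional[str]) -> Optional[str]:
    """Return the PNETID that takes effect for a device.

    Assumption: a PNETID supplied by the infrastructure always wins, and the
    user-configured table entry is consulted only when none is supplied.
    """
    if infra_pnetid is not None:
        return infra_pnetid
    return user_pnetid


print(effective_pnetid("INFRA", "USER"))  # INFRA
print(effective_pnetid(None, "USER"))     # USER
```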


Can you please help me reason about this? I'm unfortunately lacking
Kubernetes skills here, and it is difficult for me to think along.

Regards,
Halil