Thread: 4 messages, 3 authors, 2021-05-20

RE: [PATCHv6 1/1] nvme-tcp: Add option to set the physical interface to be used when connecting over TCP sockets.

From: Belanger, Martin <hidden>
Date: 2021-05-20 18:49:51

On 5/17/21 11:16 AM, Martin Belanger wrote:
From: Martin Belanger <redacted>

Addressed Sagi's review from PATCHv5.
This commentary belongs after the '---' separator.
In our application, we need a way to force TCP connections to go out a
specific IP interface instead of letting Linux select the interface
based on the routing tables. This patch adds the option 'host-iface'
to allow specifying the interface to use. Note that corresponding
changes to the nvme-cli utility will follow.

When the option host-iface is specified, the driver uses the specified
interface to set the option SO_BINDTODEVICE on the TCP socket before
connecting.
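For readers less familiar with the socket option, a rough userspace analogue of "set SO_BINDTODEVICE, then connect" might look like the C sketch below. It is not the patch's code (the driver sets the option on its in-kernel socket); the helper names pin_to_iface() and tcp_connect_iface() are hypothetical, and the destination is assumed to be an IPv4 literal. Depending on the kernel version, setting SO_BINDTODEVICE from userspace may require CAP_NET_RAW.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: pin fd to a named interface (the host_iface idea).
 * A NULL or empty name means "not requested", so routing picks the
 * interface as usual. Returns 0 on success, -1 with errno set. */
static int pin_to_iface(int fd, const char *iface)
{
    if (iface == NULL || iface[0] == '\0')
        return 0;
    return setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
                      iface, (socklen_t)strlen(iface));
}

/* Hypothetical helper: connect to an IPv4 literal ip:port, optionally
 * through iface. The option must be set before connect(), because the
 * route (and so the egress interface) is resolved at connect time.
 * Returns a connected fd, or -1 with errno set. */
static int tcp_connect_iface(const char *ip, unsigned short port,
                             const char *iface)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (pin_to_iface(fd, iface) < 0 ||
        connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        int saved = errno; /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

As an illustration, tcp_connect_iface(target_ip, 4420, "enp0s8") (4420 being the default NVMe/TCP port, target_ip a placeholder) would restrict the route lookup to enp0s8 instead of whichever interface the routing table would otherwise prefer.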

This new option is needed in addition to the existing host-traddr for
the following reasons:

Specifying an IP interface by its associated IP address is less
intuitive than specifying the actual interface name and, in some
cases, simply doesn't work. That's because the association between
interfaces and IP addresses is not predictable. IP addresses can be
changed, or can change on their own over time (e.g., via DHCP). Interface
names are predictable [1] and will persist over time. Consider the
following configuration.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ...
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 100.0.0.100/24 scope global lo
        valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
     link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
     inet 100.0.0.100/24 scope global enp0s3
        valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
     link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
     inet 100.0.0.100/24 scope global enp0s8
        valid_lft forever preferred_lft forever

The above is a VM that I configured with the same IP address
(100.0.0.100) on all interfaces. Doing a reverse lookup to identify
the unique interface associated with 100.0.0.100 does not work here.
And this is why the option host_iface is required. I understand that
the above config does not represent a standard host system, but I'm
using this to prove a point: "We can never know how users will
configure their systems". By the way, the above configuration is
perfectly valid on Linux.
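To make the ambiguity concrete, an address-to-interface reverse lookup could be sketched with getifaddrs(3) as below. This is an illustrative IPv4-only sketch, and the helper name ifaces_with_ipv4() is hypothetical. Instead of stopping at the first hit, it counts every interface that carries the given address.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: print and count every interface holding the IPv4
 * address `ip`. A count above 1 means the reverse lookup has no unique
 * answer. Returns the number of matches, or -1 on error. */
static int ifaces_with_ipv4(const char *ip)
{
    struct in_addr want;
    struct ifaddrs *list, *ifa;
    int matches = 0;

    if (inet_pton(AF_INET, ip, &want) != 1)
        return -1;
    if (getifaddrs(&list) < 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Skip entries without an address and non-IPv4 entries. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            printf("%s\n", ifa->ifa_name);
            matches++;
        }
    }
    freeifaddrs(list);
    return matches;
}
```

On the VM configured as shown, ifaces_with_ipv4("100.0.0.100") would be expected to report lo, enp0s3 and enp0s8 (in whatever order getifaddrs returns them), leaving no single interface to choose.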

The current TCP implementation for host_traddr performs a
bind()-before-connect(). This is a common construct to set the source
IP address on a TCP socket before connecting. This has no effect on
how Linux selects the interface for the connection. That's because
Linux uses the Weak End System model as described in RFC1122 [2]. On
the other hand, setting the Source IP Address has benefits and should
be supported by linux-nvme. In fact, setting the Source IP Address is
a mandatory FedGov requirement (e.g. connection to a RADIUS/TACACS+
server).
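The bind()-before-connect() construct can be sketched in userspace C as follows. It illustrates the technique rather than the driver's code, and the helper name socket_bound_to_src() is hypothetical. Binding to port 0 lets the kernel pick an ephemeral source port, and getaddrinfo() with AI_NUMERICHOST accepts both IPv4 and IPv6 literals.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket whose source address is
 * src_ip (the host_traddr idea), ready for the caller to connect().
 * Under the weak end system model this fixes the source IP only; the
 * egress interface is still chosen by the routing table.
 * Returns a bound, unconnected fd, or -1 on failure. */
static int socket_bound_to_src(const char *src_ip)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 literal */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    /* Port "0": let the kernel choose the ephemeral source port. */
    if (getaddrinfo(src_ip, "0", &hints, &res) != 0)
        return -1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 && bind(fd, res->ai_addr, res->ai_addrlen) < 0) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The caller would then connect() the returned socket as usual. Binding, say, 192.168.56.102 fixes the source address of the connection but does not by itself pin it to an interface; that is the part SO_BINDTODEVICE adds.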
Consider the following configuration.

$ ip addr list dev enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
     link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
     inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8
        valid_lft 426sec preferred_lft 426sec
     inet 192.168.56.102/24 scope global secondary enp0s8
        valid_lft forever preferred_lft forever
     inet 192.168.56.103/24 scope global secondary enp0s8
        valid_lft forever preferred_lft forever
     inet 192.168.56.104/24 scope global secondary enp0s8
        valid_lft forever preferred_lft forever

Here we can see that several addresses are associated with interface
enp0s8. By default, Linux selects the primary address,
192.168.56.101, as the source address when connecting over interface
enp0s8. Some users, however, want the ability to specify a different
source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option
host_traddr can be used as-is to perform this function.

In conclusion, I believe that we need 2 options for TCP connections.
One that can be used to specify an interface (host-iface). And one
that can be used to set the source address (host-traddr). Users should
be allowed to use one or the other, or both, or none. Of course, the
documentation for host_traddr will need some clarification. It should
state that when used for TCP connections, this option only sets the
source address. And the documentation for host_iface should say that
this option is only available for TCP connections.

References:
[1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
[2] https://tools.ietf.org/html/rfc1122

Tested both IPv4 and IPv6 connections.
Also this.

Can you send the nvme-cli bits as well?

Hi Sagi,

Just checking whether there is anything else I can do to help with this patch.

The corresponding nvme-cli changes can be inspected on GitHub at the following link:
https://github.com/martin-belanger/nvme-cli/commit/628aca9d66ddffaa78bdaa46668ecdc3d000a017

Note that I will only submit the nvme-cli changes after this nvme-tcp patch has been approved.

Thanks,
Martin

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme