Thread: 3 messages, 3 authors, 2021-11-18

Re: Sysfs paths to NVME devices

From: Martin Wilck <hidden>
Date: 2021-11-18 16:31:48

On Thu, 2021-11-18 at 07:30 -0800, Keith Busch wrote:
On Thu, Nov 18, 2021 at 03:28:26PM +0100, Jean Delvare wrote:
quoted
Hi all,

I have a few questions related to the sysfs paths to NVME devices. We
have a dracut module using udevadm on block devices (under
/sys/dev/block) to figure out which drivers should be included in the
initrd, and noticed that it does not always work for NVME devices. Upon
investigation, it was discovered that the link in /sys/dev/block leads
to either a physical NVME device (e.g.
/sys/devices/pci0000:00/0000:00:06.0/nvme/nvme0/nvme0n1) or a virtual
NVME device (e.g.
/sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1). The latter
case is problematic because virtual NVME devices do not have a driver
attached to them.

First of all I would like to understand what is the deciding factor
inside the kernel to go for virtual devices or physical devices. At
first I thought it was related to CONFIG_NVME_MULTIPATH, but in fact
all our systems have that option enabled, still some have virtual
devices and others have physical devices.
If you're using nvme native multipathing, and your namespace reports
that it is multipath capable (ID_NS.NMIC), then the driver will set up
the virtual device for the visible block device.

If your namespace isn't multipath capable, you will only get the
physical device. That's just for pci, though; fabrics targets always
link to a virtual nvme subsystem device.
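
To illustrate the distinction described above, here is a minimal sketch that reports whether a block device resolves to a virtual nvme-subsystem device or to a physical one. The function name and the SYSFS_ROOT override are hypothetical and exist only so the check can be tried against a test tree; the path pattern is taken from the examples quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch only: classify where /sys/block/<name> points.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
nvme_blockdev_kind() {
    blk_name=$1
    sysfs_root=${SYSFS_ROOT:-/sys}

    # Bail out if the block device does not exist at all.
    [ -e "$sysfs_root/block/$blk_name" ] || return 1

    # Resolve the symlink to the real device directory.
    blk_target=$(readlink -f "$sysfs_root/block/$blk_name") || return 1

    case "$blk_target" in
        */devices/virtual/nvme-subsystem/*) echo virtual ;;
        *) echo physical ;;
    esac
}
```

Calling `nvme_blockdev_kind nvme0n1` on a live system would print either `virtual` or `physical`, depending on which of the two layouts the kernel set up for that namespace.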

In the case you have a multipath namespace, the driver will also create
"hidden" block devices for each controller path that it found. As an
example, if you have multipath nvme /sys/block/nvme0n1, there should be
a /sys/block/nvme0c0n1, which should link to a physical device in sysfs
for pci.
 
For the multipath case, you obtain the hidden path devices for a given
namespace like this:

NSID=1

ls -d /sys/block/nvme0n${NSID}/device/nvme*/nvme*c*n${NSID}
/sys/block/nvme0n1/device/nvme0/nvme0c0n1 
/sys/block/nvme0n1/device/nvme1/nvme0c1n1 
...
(/sys/block/nvme0n1/device is a symlink to the nvme-subsystem device,
and /sys/block/nvme0n1/device/nvme* are symlinks to the
respective fabrics controllers).

Because there are multiple symlinks involved, you can't use these
relationships in udev rules easily, as udev can only match attributes
of the device itself and its parents.
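
A script is not limited the way udev rules are, so it can follow those symlinks itself. The following is a rough sketch, not a tested dracut change: it enumerates the hidden path devices with the same glob as above, resolves each one, and walks up the parent directories looking for "driver" symlinks. The function names and the SYSFS_ROOT override are hypothetical. For fabrics controllers, the walk may well find no driver at all.

```shell
#!/bin/sh
# Sketch only: print the names of drivers bound to a sysfs device
# directory or any of its ancestors.
sysfs_drivers() {
    dev_dir=$(readlink -f "$1") || return 1
    while [ -n "$dev_dir" ] && [ "$dev_dir" != "/" ]; do
        if [ -L "$dev_dir/driver" ]; then
            basename "$(readlink -f "$dev_dir/driver")"
        fi
        dev_dir=$(dirname "$dev_dir")
    done
}

# For a multipath namespace (e.g. nvme0n1), print one
# "<path device> <driver>" line per driver found.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
nvme_path_drivers() {
    ns_head=$1
    ns_suffix=${ns_head##*n}          # "1" for nvme0n1
    sysfs_root=${SYSFS_ROOT:-/sys}

    for path_dev in "$sysfs_root/block/$ns_head"/device/nvme*/nvme*c*n"$ns_suffix"; do
        [ -e "$path_dev" ] || continue   # glob did not match anything
        for drv in $(sysfs_drivers "$path_dev"); do
            echo "$(basename "$path_dev") $drv"
        done
    done
}
```

For example, `nvme_path_drivers nvme0n1` would list each hidden path device of that namespace next to any driver found above it in the device hierarchy.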
quoted
Secondly, I would like to know if there's a chance to have a consistent
behavior where the paths would be the same on all systems, so that
user-space only has to deal with one naming scheme instead of two. It
would be nice not to have to deal with exceptions in dracut and udev.
I don't think there's much hope.
quoted
Lastly, in the case of virtual NVME device paths (which I suspect can't
be avoided in multipath scenarios), could you suggest a reliable way to
figure out which drivers are being used? Multipath existed before NVME
so I suppose there's a way to do it already, maybe the NVME subsystem
needs to be adjusted to do it the same way other subsystems (SCSI) do
it?
For the fabrics case, deriving the necessary drivers for the initramfs
is non-trivial. You would need to look at the "transport" and "address"
sysfs attributes in the sysfs directory of the controller, e.g.
/sys/block/nvme0n1/device/nvme0, map these to FC ports or NICs, and
figure out the drivers for those.
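
As a starting point for the lookup described above, this sketch prints the "transport" and "address" attributes of every controller behind a namespace. It covers only the first step; mapping the address to an FC port or NIC and finding that device's driver is left out. The function names and the SYSFS_ROOT override are hypothetical. The transport-to-module table is an assumption based on the usual upstream module names and may not match every kernel configuration.

```shell
#!/bin/sh
# Sketch only: print "<controller> <transport> <address>" for each
# controller of a namespace such as nvme0n1.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
nvme_ctrl_transports() {
    ns_head=$1
    sysfs_root=${SYSFS_ROOT:-/sys}

    for ctrl_dir in "$sysfs_root/block/$ns_head"/device/nvme*; do
        # Entries without a "transport" attribute are not controllers.
        [ -r "$ctrl_dir/transport" ] || continue
        ctrl_transport=$(cat "$ctrl_dir/transport")
        ctrl_address=$(cat "$ctrl_dir/address" 2>/dev/null)
        echo "$(basename "$ctrl_dir") $ctrl_transport $ctrl_address"
    done
}

# Assumed mapping from transport name to the NVMe transport module.
# The NIC or HBA driver underneath still has to be found separately.
nvme_transport_module() {
    case "$1" in
        pci)  echo nvme ;;
        tcp)  echo nvme-tcp ;;
        rdma) echo nvme-rdma ;;
        fc)   echo nvme-fc ;;
        loop) echo nvme-loop ;;
        *)    return 1 ;;
    esac
}
```

Running `nvme_ctrl_transports nvme0n1` gives one line per controller; the first field can then be fed to whatever logic resolves the underlying network or FC device.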

The situation is roughly similar to iSCSI, where there's also no easy
mapping from SCSI devices to drivers. For iSCSI, the by-path devices
are [...]

There won't be any by-path device links for NVMe multipath like for
SCSI though, because the path devices are hidden by the kernel and thus
no symlink targets would exist. It should be possible to create a
utility similar to udev's "path_id" builtin with support for NVMe
though.

Regards
Martin
