Thread (14 messages) 14 messages, 6 authors, 2021-06-16

Re: [PATCH] nbd: provide a way for userspace processes to identify device backends

From: Ming Lei <hidden>
Date: 2021-05-18 09:24:34
Also in: lkml
Subsystem: block layer, network block device (nbd), the rest · Maintainers: Jens Axboe, Josef Bacik, Linus Torvalds

On Tue, May 18, 2021 at 01:22:19PM +0530, Prasanna Kalever wrote:
On Tue, May 18, 2021 at 6:00 AM Ming Lei [off-list ref] wrote:
quoted
Hello Prasanna,

On Thu, Apr 29, 2021 at 03:58:28PM +0530, Prasanna Kumar Kalever wrote:
quoted
Problem:
On reconfigure of device, there is no way to defend if the backend
storage is matching with the initial backend storage.

Say, if an initial connect request for backend "pool1/image1" got
mapped to /dev/nbd0 and the userspace process is terminated. A next
reconfigure request within NBD_ATTR_DEAD_CONN_TIMEOUT is allowed to
use /dev/nbd0 for a different backend "pool1/image2"

For example, an operation like below could be dangerous:
Hello Ming,
quoted
Can you explain a bit why it is dangerous?
Yes, sure. Please check the below comments inline,
quoted
quoted
$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
On Map the rbd-nbd attempting to send NBD_CMD_CONNECT, for backend
'rbd-pool/ext4-image'. Post which kernel will allocate a new device
say /dev/nbd0 for backend file rbd-pool/ext4-image (format:
<pool>/<backendfile>)
quoted
quoted
$ sudo pkill -9 rbd-nbd
Assume normally or abnormally the userspace process (rbd-nbd here) is
terminated, but then as per the settings the device /dev/nbd0 is not
returned immediately, the kernel will wait for the
NBD_ATTR_DEAD_CONN_TIMEOUT to expire.

At this point two things could be possible:
1. if there is a reconfigure request from userspace within the timeout
then the kernel might reassign the same device /dev/nbd0.
2. if the timeout has expired, then the device will be relieved.
quoted
quoted
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"
On attach the rbd-nbd attempt to send NBD_CMD_RECONFIGURE, after which
the kernel will assign '--device /dev/nbd0' to specified backend.

But there is a chance that userspace processes might accidentally send
NBD_CMD_RECONFIGURE claiming for /dev/nbd0 for a different backend
(rbd-pool/xfs-image instead of original rbd-pool/ext4-image).
Currently, there is no mechanism to verify if the backend provided
later with attach(NBD_CMD_RECONFIGURE) is matching with the one
provided originally with map(NBD_CMD_CONNECT) before granting for a
attach or reconfigure.

For example in the above-explained scenario:
Assume EXT4 on rbd-pool/ext4-image was mounted, after attach (Note:
device /dev/nbd0 is reconfigured to a different backend file) XFS on
rbd-pool/xfs-image would get corrupted. If there was an application
using /dev/nbd0 directly (as a raw block volume), it wouldn't be happy
either.
OK, got it. If I understand correctly, what you need is to not allow
reconfigure if the nbd disk is mounted, right?
quoted
quoted
Solution:
Provide a way for userspace processes to keep some metadata to identify
between the device and the backend, so that when a reconfigure request is
made, we can compare and avoid such dangerous operations.

With this solution, as part of the initial connect request, backend
path can be stored in the sysfs per device config, so that on a reconfigure
request it's easy to check if the backend path matches with the initial
connect backend path.

Please note, ioctl interface to nbd will not have these changes, as there
won't be any reconfigure.
BTW, loop has similar issue, and patch of 'block: add a sequence number to disks'
is added for addressing this issue, what do you think of that generic
approach wrt. this nbd's issue? such as used the exposed sysfs sequence number
for addressing this issue?

https://lore.kernel.org/linux-block/YH81n34d2G3C4Re+@gardel-login/#r (local)
If I understand the changes and the background of the fix correctly, I
think with that fix author is trying to monotonically increase the seq
number and add it to the disk on every single device map/attach and
expose it through the sysfs, which will help the userspace processes
further to correlate events for particular and specific devices that
reuse the same loop device.

Coming back to my changes:
I think here with this fix, we are trying to solve a different
problem. The fix with this patch accepts a cookie or a backend string
(could be file-path or whatever id userspace choose to provide) from
userspace at the time of map and stores it in the sysfs
/sys/block/nbdX/backend path and persists it until unmap is issued on
the device (meaning that identity stays throughout the life cycle of
that device, no matter how many detach and attaches happen).
Your solution needs change from userspace side, so it isn't flexible.
If there
is a detach request in between (not unmap) then on the next attach
(reconfigure request to reuse the same device) the stored
cookie/UUID/backend-string will stand as a reference to verify if the
newly passed backend is matching with the actual backend passed at map
time to avoid any misconfigurations by accident and to safely proceed
with attach.
We can avoid reconfigure if the nbd disk is opened exclusively, such as
mount, please see if the following patch can solve your probelm:
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 4ff71b579cfc..045532a68d1e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -2063,6 +2063,11 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 		return -EINVAL;
 	}
 
+	/* avoid changing device under exclusive owner */
+	ret = bd_prepare_to_claim(nbd->disk->part0, nbd_genl_reconfigure);
+	if (ret)
+		goto out_no_abort_claim;
+
 	mutex_lock(&nbd->config_lock);
 	config = nbd->config;
 	if (!test_bit(NBD_RT_BOUND, &config->runtime_flags) ||
@@ -2141,6 +2146,8 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 		}
 	}
 out:
+	bd_abort_claiming(nbd->disk->part0, nbd_genl_reconfigure);
+out_no_abort_claim:
 	mutex_unlock(&nbd->config_lock);
 	nbd_config_put(nbd);
 	nbd_put(nbd);

Thanks,
Ming
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help