Re: [PATCH] block: don't set GD_NEED_PART_SCAN if scan partition failed

From: Jan Kara <jack@suse.cz>
Date: 2023-03-23 10:52:56
Also in: linux-block, lkml

On Thu 23-03-23 00:08:51, Ming Lei wrote:

On Wed, Mar 22, 2023 at 02:07:09PM +0100, Jan Kara wrote:

quoted

On Wed 22-03-23 19:34:30, Ming Lei wrote:

quoted

On Wed, Mar 22, 2023 at 10:47:07AM +0100, Jan Kara wrote:

quoted

On Wed 22-03-23 15:58:35, Ming Lei wrote:

quoted

On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote:

quoted

From: Yu Kuai <redacted>

Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still
set, and partition scan will be proceed again when blkdev_get_by_dev()
is called. However, this will cause a problem that re-assemble partitioned
raid device will creat partition for underlying disk.

Test procedure:

mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
sgdisk -n 0:0:+100MiB /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sda /dev/sdb

Test result: underlying disk partition and raid partition can be
observed at the same time

Note that this can still happen in come corner cases that
GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid
device.

Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again")
Signed-off-by: Yu Kuai <redacted>

The issue still can't be avoided completely, such as, after rebooting,
/dev/sda1 & /dev/md0p1 can be observed at the same time. And this one
should be underlying partitions scanned before re-assembling raid, I
guess it may not be easy to avoid.

So this was always happening (before my patches, after my patches, and now
after Yu's patches) and kernel does not have enough information to know
that sda will become part of md0 device in the future. But mdadm actually
deals with this as far as I remember and deletes partitions for all devices
it is assembling the array from (and quick tracing experiment I did
supports this).

I am testing on Fedora 37, so mdadm v4.2 doesn't delete underlying
partitions before re-assemble.

Strange, I'm on openSUSE Leap 15.4 and mdadm v4.1 deletes these partitions
(at least I can see mdadm do BLKPG_DEL_PARTITION ioctls). And checking
mdadm sources I can see calls to remove_partitions() from start_array()
function in Assemble.c so I'm not sure why this is not working for you...

I added dump_stack() in delete_partition() for partition 1, not observe
stack trace during booting.

quoted

Also given mdadm or related userspace has to change for avoiding
to scan underlying partitions, just wondering why not let userspace
to tell kernel not do it explicitly?

Well, those userspace changes are long deployed, now you would introduce
new API that needs to proliferate again. Not very nice. Also how would that
exactly work? I mean once mdadm has underlying device open, the current
logic makes sure we do not create partitions anymore. But there's no way
how mdadm could possibly prevent creation of partitions for devices it
doesn't know about yet so it still has to delete existing partitions...

I meant if mdadm has to change to delete existed partitions, why not add
one ioctl to disable partition scan for this disk when deleting
partitions/re-assemble, and re-enable scan after stopping array.

But looks it isn't so, since you mentioned that remove_partitions is
supposed to be called before starting array, however I didn't observe this
behavior.

Yeah, not sure what's happening on your system.

I am worrying if the current approach may cause regression, one concern is
that ioctl(BLKRRPART) needs exclusive open now, such as:

1) mount /dev/vdb1 /mnt

2) ioctl(BLKRRPART) may fail after removing /dev/vdb3

Well, but we always had some variant of:

        if (disk->open_partitions)
                return -EBUSY;

in disk_scan_partitions(). So as long as any partition on the disk is used,
EBUSY is the correct return value from BLKRRPART.

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help