Re: [PATCH v12 13/13] dm: add power-of-2 target for zoned devices with non power-of-2 zone sizes
From: Mike Snitzer <hidden>
Date: 2022-09-02 21:07:19
Also in: dm-devel, linux-nvme, lkml
On Fri, Sep 02 2022 at 4:55P -0400, Mike Snitzer [off-list ref] wrote:
On Tue, Aug 23 2022 at 8:18P -0400, Pankaj Raghav [off-list ref] wrote:
Only zoned devices with a power-of-2 (po2) number of sectors per zone (zone size) were supported in Linux, but support for non-power-of-2 (npo2) zone sizes has now been added to the block layer.

Filesystems such as F2FS and btrfs support zoned devices under the assumption of a po2 zone size. Before adding native support for npo2 zone sizes, it was suggested to create a dm target that makes an npo2 zone size device appear as a po2 zone size target, so that filesystems can initially work without any explicit changes by using this target.

The design of this target is very simple: remap the device zone size to the zone capacity and change the zone size to be the nearest power of 2 value.

For example, a device with a zone size/capacity of 3M will have an equivalent target layout as follows:

Device layout :-
zone capacity = 3M
zone size = 3M

|--------------|-------------|
0              3M            6M

Target layout :-
zone capacity = 3M
zone size = 4M

|--------------|---|--------------|---|
0              3M  4M             7M  8M

The area between the target's zone capacity and zone size will be emulated in the target. Read IOs that fall in the emulated gap area will return a 0-filled bio, and all other IOs in that area will result in an error. If a read IO spans the emulated area boundary, the IO is split at that boundary. All other IO operations that span the emulated area boundary will result in an error.

The target can be easily created as follows:
dmsetup create <label> --table '0 <size_sects> po2zone /dev/nvme<id>'

Note that the target does not support partial mapping of the underlying device.

Signed-off-by: Pankaj Raghav <redacted>
Suggested-by: Johannes Thumshirn <redacted>
Suggested-by: Damien Le Moal <redacted>
Suggested-by: Hannes Reinecke <hare@suse.de>

This target needs more review from those who Suggested-by it.

And the header and docs need to address:
1) why is a partial mapping of the underlying device disallowed?
2) why is it assumed all IO is read-only?
(talk to me and others like we don't know the inherent limitations of this class of zoned hw)

On a code level:
1) are you certain you're properly failing all writes?
   - are writes allowed to the "zone capacity area" but _not_ allowed to the "emulated zone area"? (if yes, _please document_).
2) yes, you absolutely need to implement the .status target_type hook (for both STATUS and TABLE).
3) really not loving the nested return (of DM_MAPIO_SUBMITTED or DM_MAPIO_REMAPPED) from methods called from dm_po2z_map(). Would prefer to not have to do a depth-first search to see where and when dm_po2z_map() returns a DM_MAPIO_XXX unless there is a solid justification for it. To me it just obfuscates the DM interface a bit too much.

Otherwise, pretty clean code and nothing weird going on.

I look forward to seeing your next (final?) revision of this patchset.
Thinking further.. I'm left confused about just what the heck this target is assuming.

E.g.: it feels like exposing a read-only end of the zone is very bi-polar... yet there is no hint to the upper layer that it shouldn't write to that read-only end (the "emulated zone").. but there has to be some zoned magic assumed? And I'm just naive?

Mike
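
For readers following the thread, the remapping described in the quoted patch header can be sketched in plain userspace C. This is only an illustration of the arithmetic and of the per-sector policy as the header states it (remap inside the zone capacity, zero-fill reads in the emulated gap, fail everything else in the gap). It is not the code from the patch: every name below (po2z_layout, po2z_action, po2z_tgt_to_dev, and so on) is hypothetical, the target zone size is assumed to be the device zone size rounded up to a power of 2 (3M -> 4M, as in the example), and bios that span the boundary are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the code from the patch. */

enum po2z_action {
	PO2Z_REMAP,     /* inside the zone capacity: forward to the device */
	PO2Z_ZERO_FILL, /* read in the emulated gap: return zeroes */
	PO2Z_ERROR,     /* any other IO in the emulated gap: fail it */
};

struct po2z_layout {
	uint64_t dev_zone_sectors; /* device zone size == capacity (npo2) */
	uint64_t tgt_zone_sectors; /* emulated zone size (po2) */
};

/* Round v up to a power of 2 (assumed rounding direction, see lead-in). */
static uint64_t po2z_roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v != 0 && v <= (UINT64_C(1) << 63));
	while (p < v)
		p <<= 1;
	return p;
}

static struct po2z_layout po2z_layout_init(uint64_t dev_zone_sectors)
{
	struct po2z_layout l = {
		.dev_zone_sectors = dev_zone_sectors,
		.tgt_zone_sectors = po2z_roundup_pow2(dev_zone_sectors),
	};
	return l;
}

/* Decide what happens to a single-sector IO at a target sector. */
static enum po2z_action po2z_action(const struct po2z_layout *l,
				    uint64_t tgt_sector, bool is_read)
{
	/* Offset within the po2 zone; the mask works because the size is po2. */
	uint64_t offset = tgt_sector & (l->tgt_zone_sectors - 1);

	if (offset < l->dev_zone_sectors)
		return PO2Z_REMAP;
	return is_read ? PO2Z_ZERO_FILL : PO2Z_ERROR;
}

/* Translate a target sector to a device sector (zone capacity area only). */
static uint64_t po2z_tgt_to_dev(const struct po2z_layout *l,
				uint64_t tgt_sector)
{
	uint64_t zone_no = tgt_sector / l->tgt_zone_sectors;
	uint64_t offset = tgt_sector & (l->tgt_zone_sectors - 1);

	assert(offset < l->dev_zone_sectors);
	return zone_no * l->dev_zone_sectors + offset;
}
```

With 512-byte sectors, the 3M zone from the example is 6144 sectors and the emulated 4M zone is 8192 sectors. Under this sketch, target sector 8192 (the start of the second emulated zone) maps to device sector 6144, and target sectors 6144 through 8191 fall in the emulated gap, where a read is zero-filled and a write fails.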