Thread: 35 messages, 5 authors, 2022-09-05

Re: [PATCH v12 13/13] dm: add power-of-2 target for zoned devices with non power-of-2 zone sizes

From: Mike Snitzer <hidden>
Date: 2022-09-02 21:07:19
Also in: dm-devel, linux-nvme, lkml

On Fri, Sep 02 2022 at  4:55P -0400,
Mike Snitzer [off-list ref] wrote:
On Tue, Aug 23 2022 at  8:18P -0400,
Pankaj Raghav [off-list ref] wrote:
Only zoned devices with a power-of-2 (po2) number of sectors per zone
(zone size) were supported in Linux, but support for non-power-of-2
(npo2) zone sizes has now been added to the block layer.

Filesystems such as F2FS and btrfs support zoned devices only under the
po2 zone size assumption. Before adding native support for npo2 zone
sizes, it was suggested to create a dm target that makes an npo2 zone
size device appear as a po2 zone size target, so that filesystems can
initially work on top of it without any explicit changes.

The design of this target is very simple: remap the device zone size to
the zone capacity and change the zone size to be the nearest power of 2
value.

For example, a device with a zone size/capacity of 3M will have an equivalent
target layout as follows:

Device layout :-
zone capacity = 3M
zone size = 3M

|--------------|-------------|
0             3M            6M

Target layout :-
zone capacity = 3M
zone size = 4M

|--------------|---|--------------|---|
0             3M  4M             7M  8M

The area between the target's zone capacity and zone size is emulated
in the target.
A read IO that falls entirely in the emulated gap area returns a
zero-filled bio; any other IO in that area results in an error.
If a read IO spans the emulated area boundary, it is split at the
boundary. Any other IO operation that spans the emulated area boundary
results in an error.
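
The remapping rules above can be modelled in a few lines. This is only an
illustrative Python sketch, not the code from the patch: the names
next_pow2() and plan_io() are hypothetical, sizes are in 512-byte sectors,
"nearest power of 2" is assumed to mean rounding up (3M -> 4M, as in the
layout shown), and an IO is assumed not to cross a target zone boundary.

```python
def next_pow2(n):
    """Smallest power of two >= n (assumed rounding rule, for n >= 1)."""
    return 1 << (n - 1).bit_length()


def plan_io(op, start, nr, dev_zone):
    """Model how one IO on the target is handled.

    op       -- "read" or any other operation name
    start    -- first target sector of the IO
    nr       -- number of sectors in the IO
    dev_zone -- device zone size (== zone capacity) in sectors

    Returns a list of steps:
      ("remap", device_sector, n)  forward n sectors to the device
      ("zero", n)                  complete n sectors as zero-filled data
      ("error", n)                 fail the IO
    """
    po2 = next_pow2(dev_zone)
    zone_no, off = divmod(start, po2)
    end = off + nr
    if end > po2:
        raise ValueError("IO crosses a target zone boundary; not modelled")

    # Start of this zone on the device, where zones are packed back to back.
    base = zone_no * dev_zone

    if off >= dev_zone:
        # Entirely inside the emulated gap.
        return [("zero", nr)] if op == "read" else [("error", nr)]
    if end <= dev_zone:
        # Entirely inside the zone capacity: plain remap.
        return [("remap", base + off, nr)]
    if op != "read":
        # Non-read IO spanning the capacity/gap boundary.
        return [("error", nr)]
    # Read spanning the boundary: split into a device part and a zero part.
    in_cap = dev_zone - off
    return [("remap", base + off, in_cap), ("zero", nr - in_cap)]


ZONE_3M = 3 * 1024 * 1024 // 512   # 6144 sectors

# Start of target zone 1 (4M) maps to start of device zone 1 (3M).
print(plan_io("read", next_pow2(ZONE_3M), 8, ZONE_3M))
# A read straddling the 3M capacity boundary of zone 0 is split.
print(plan_io("read", ZONE_3M - 4, 8, ZONE_3M))
```

In this model, a write that lies entirely below the zone capacity is
remapped like a read, while a write that touches the emulated gap fails.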

The target can be easily created as follows:
dmsetup create <label> --table '0 <size_sects> po2zone /dev/nvme<id>'

Note that the target does not support partial mapping of the underlying
device.

Signed-off-by: Pankaj Raghav <redacted>
Suggested-by: Johannes Thumshirn <redacted>
Suggested-by: Damien Le Moal <redacted>
Suggested-by: Hannes Reinecke <hare@suse.de>

This target needs more review from those who Suggested-by it.

And the header and docs need to address:

1) why is a partial mapping of the underlying device disallowed?
2) why is it assumed all IO is read-only? (talk to me and others like
   we don't know the inherent limitations of this class of zoned hw)

On a code level:
1) are you certain you're properly failing all writes?
   - are writes allowed to the "zone capacity area" but _not_
     allowed to the "emulated zone area"? (if yes, _please document_). 
2) yes, you absolutely need to implement the .status target_type hook
   (for both STATUS and TABLE).
3) really not loving the nested return (of DM_MAPIO_SUBMITTED or
   DM_MAPIO_REMAPPED) from methods called from dm_po2z_map().  Would
   prefer to not have to do a depth-first search to see where and when
   dm_po2z_map() returns a DM_MAPIO_XXX unless there is a solid
   justification for it.  To me it just obfuscates the DM interface a
   bit too much. 

Otherwise, pretty clean code and nothing weird going on.

I look forward to seeing your next (final?) revision of this patchset.

Thinking further... I'm left confused about just what the heck this
target is assuming.

E.g.: it feels like exposing a read-only end of the zone is very
bi-polar... yet there is no hint to the upper layer that it shouldn't
write to that read-only end (the "emulated zone")... but there has to be
some zoned magic assumed?  And I'm just naive?

Mike