Thread (17 messages) 17 messages, 5 authors, 2016-08-16

Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

From: Shaun Tancheff <hidden>
Date: 2016-08-10 03:58:45
Also in: linux-scsi, lkml

On Tue, Aug 9, 2016 at 3:09 AM, Damien Le Moal [off-list ref] wro=
te:
quoted
On Aug 9, 2016, at 15:47, Hannes Reinecke [off-list ref] wrote:
[trim]
quoted
quoted
Since disk type =3D=3D 0 for everything that isn't HM so I would prefer=
 the
quoted
quoted
sysfs 'zoned' file just report if the drive is HA or HM.
Okay. So let's put in the 'zoned' attribute the device type:
'host-managed', 'host-aware', or 'device managed'.
I hacked your patches and simply put a "0" or "1" in the sysfs zoned file=
.
Any drive that has ZBC/ZAC command support gets a "1", "0" for everything
else. This means that drive managed models are not exposed as zoned block
devices. For HM vs HA differentiation, an application can look at the
device type file since it is already present.

We could indeed set the "zoned" file to the device type, but HM drives an=
d
regular drives will both have "0" in it, so no differentiation possible.
The other choice could be the "zoned" bits defined by ZBC, but these
do not define a value for host managed drives, and the drive managed valu=
e
being not "0" could be confusing too. So I settled for a simple 0/1 boole=
an.

This seems good to me.
quoted
quoted
quoted
2) Add ioctls for zone management:
Report zones (get information from RB tree), reset zone (simple wrappe=
r
quoted
quoted
quoted
to ioctl for block discard), open zone, close zone and finish zone. Th=
at
quoted
quoted
quoted
will allow mkfs like tools to get zone information without having to p=
arse
quoted
quoted
quoted
thousands of sysfs files (and can also be integrated in libzbc block b=
ackend
quoted
quoted
quoted
driver for a unified interface with the direct SG_IO path for kernels =
without
quoted
quoted
quoted
the ZBC code enabled).
I can add finish zone ... but I really can't think of a use for it, mys=
elf.
quoted
quoted
Which is not the point. The standard defines this, so clearly someone
found it a reasonable addendum. So let's add this for completeness.
Agreed.
Done: I hacked Shaun ioctl code and added finish zone too. The
difference with Shaun initial code is that the ioctl are propagated down =
to
the driver (__blkdev_driver_ioctl -> sd_ioctl) so that there is no need f=
or
BIO request definition for the zone operations. So a lot less code added.
The purpose of the BIO flags is not to enable the ioctls so much as
the other way round. Creating BIO op's is to enable issuing ZBC
commands from device mapper targets and file systems without some
heinous ioctl hacks.
Making the resulting block layer interfaces available via ioctls is just a
reasonable way to exercise the code ... or that was my intent.
The ioctls do not mimic exactly the ZBC standard. For instance, there is =
no
reporting options for report zones, nor is the "all" bit supported for op=
en,
close or finish zone commands. But the information provided on zones is c=
omplete
and maps to the standard definitions.
For the reporting options I have planned to reuse the stream_id in
struct bio when that is formalized. There are certainly other places in
struct bio to stuff a few extra bits ...

As far as the all bit ... this is being handled by all the zone action
commands. Just pass a sector of ~0ul and it's handled in sd.c by
sd_setup_zone_action_cmnd().

Apologies as apparently my documentation here is lacking :-(
I also added a reset_wp ioctl for completeness, but this one simply calls
blkdev_issue_discard internally, so it is in fact equivalent to the BLKDI=
SCARD
ioctl call with a range exactly aligned on a zone.
I'm confused as my patch set includes a Reset WP (BLKRESETZONE) that
creates a REQ_OP_ZONE_RESET .. same as open and close. My
expectation being that BLKDISCARD doesn't really need yet another alias.

[trim]
Did that too. The blk_zone struct is now exactly 64B. I removed the per z=
one

Thanks .. being a cache line is harder to whinge about...
spinlock and replaced it with a flag so that zones can still be locked
independently using wait_on_bit_lock & friends (I added the functions
blk_zone_lock, blk_zone_trylock and blk_zone_unlock to do that). This per=
 zone
locking comes in handy to implement the ioctls code without stalling the =
command
queue for read, writes and discard on different zones.

I however kept the zone length in the structure. The reason for doing so =
is that
this allows supporting drives with non-constant zone sizes, albeit with a=
 more
limited interface since in such case the device chunk_sectors is not set =
(that
is how a user or application can detect that the zones are not constant s=
ize).
For these drives, the block layer may happily merge BIOs across zone boun=
daries
and the discard code will not split and align calls on the zones. But upp=
er layers
(an FS or a device mapper) can still do all this by themselves if they wa=
nt/can
support non-constant zone sizes.

The only exception is drives like the Seagate one with only the last zone=
 of a
different size. This case is handled exactly as if all zones are the same=
 size
simply because any operation on the last smaller zone will naturally alig=
n as
the checks of operation size against the drive capacity will do the right=
 things.
The ioctls work for all cases (drive with constant zone size or not). Thi=
s is again
to allow supporting eventual weird drives at application level. I integra=
ted all
these ioctl into libzbc block device backend driver and everything is fin=
e. Can't
tell the difference with direct-to-drive SG_IO accesses. But unlike these=
, the zone
ioctls keep the zone information RB-tree cache up to date.
quoted
I will be updating my patchset accordingly.
I need to cleanup my code and rebase on top of 4.8-rc1. Let me do this an=
d I will send
everything for review. If you have any comment on the above, please let m=
e know and
I will be happy to incorporate changes.

Best regards.


------------------------
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
Damien.LeMoal@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.hgst.com

Western Digital Corporation (and its subsidiaries) E-mail Confidentiality=
 Notice & Disclaimer:
This e-mail and any files transmitted with it may contain confidential or=
 legally privileged information of WDC and/or its affiliates, and are inten=
ded solely for the use of the individual or entity to which they are addres=
sed. If you are not the intended recipient, any disclosure, copying, distri=
bution or any action taken or omitted to be taken in reliance on it, is pro=
hibited. If you have received this e-mail in error, please notify the sende=
r immediately and delete the e-mail in its entirety from your system.


--=20
Shaun Tancheff
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help