Thread (19 messages) 19 messages, 3 authors, 2021-10-05

Re: [PATCH 0/5] btrfs-progs: use direct-IO for zoned device

From: Naohiro Aota <Naohiro.Aota@wdc.com>
Date: 2021-09-29 02:24:26

On Mon, Sep 27, 2021 at 11:51:39PM +0200, David Sterba wrote:
On Mon, Sep 27, 2021 at 01:15:49PM +0900, Naohiro Aota wrote:
quoted
As discussed in the Zoned Storage page [1],  the kernel page cache does not
guarantee that cached dirty pages will be flushed to a block device in
sequential sector order. Thus, we must use O_DIRECT for writing to a zoned
device to ensure the write ordering.

[1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions

As a writng buffer is embedded in some other struct (e.g., "char data[]" in
struct extent_buffer), it is difficult to allocate the struct so that the
writng buffer is aligned.

This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite,
which allocates an aligned bounce buffer, copy the buffer contents, and
proceeds the IO. And, it now opens a zoned device with O_DIRECT.

Since the allocation and copying are costly, it is better to do them only
when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine
the file is opened with O_DIRECT or not every time doing an IO.

As zoned device forces to use zoned btrfs, I decided to use the zoned flag
to determine if it is direct-IO or not. This can cause a false-positive (to
use the bounce buffer when a file is *not* opened with O_DIRECT) in case of
emulated zoned mode on a non-zoned device or a regular file. Considering
the emulated zoned mode is mostly for debugging or testing, I believe this
is acceptable.

Patch 1 is a preparation not to set an emulated zone_size value when not
needed.

Patches 2 and 3 wraps pread/pwrite with newly introduced function
btrfs_pread/btrfs_pwrite.

Patches 4 deals with the zoned flag while reading the initial trees.

Patch 5 finally opens a zoned device with O_DIRECT.

Naohiro Aota (5):
  btrfs-progs: mkfs: do not set zone size on non-zoned mode
  btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
  btrfs-progs: introduce btrfs_pread wrapper for pread
  btrfs-progs: temporally set zoned flag for initial tree reading
  btrfs-progs: use direct-IO for zoned device
I was doing some btrfs-convert changes and found that it crashed, rough
bisection points to this series. With the last patch applied, convert
fails with the following ASAN error:
It looks like eb->fs_info == NULL at this point. In case of
btrfs-convert, we can assume it is non-zoned because we do not support
the converting on a zoned device (we can't create ext*, reiserfs on a
zoned device anyway).

But, I also found a similar issue occurs with "mkfs.btrfs -f -d raid5
-m raid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3" in
read_extent_from_disk(). Let me check which is better to ensure the
fs_info is set or to check if it's NULL.
...
Create initial btrfs filesystem
Create ext2 image file
AddressSanitizer:DEADLYSIGNAL
=================================================================
==18432==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x000000496627 bp 0x7ffe5299e4d0 sp 0x7ffe5299e4b0 T0)
==18432==The signal is caused by a READ memory access.
==18432==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
    #0 0x496627 in write_extent_to_disk kernel-shared/extent_io.c:815
    #1 0x470080 in write_and_map_eb kernel-shared/disk-io.c:525
    #2 0x411af9 in migrate_one_reserved_range convert/main.c:402
    #3 0x411fa5 in migrate_reserved_ranges convert/main.c:459
    #4 0x414088 in create_image convert/main.c:878
    #5 0x416d70 in do_convert convert/main.c:1269
    #6 0x41a294 in main convert/main.c:1993
    #7 0x7f7ef7c2753f in __libc_start_call_main (/lib64/libc.so.6+0x2d53f)
    #8 0x7f7ef7c275eb in __libc_start_main_alias_1 (/lib64/libc.so.6+0x2d5eb)
    #9 0x40ed04 in _start (.../btrfs-progs/btrfs-convert+0x40ed04)

kernel-shared/extent_io.c:815:

 811 int write_extent_to_disk(struct extent_buffer *eb)
 812 {
 813         int ret;
 814         ret = btrfs_pwrite(eb->fd, eb->data, eb->len, eb->dev_bytenr,
 815                            eb->fs_info->zoned);
 816         if (ret < 0)
 817                 goto out;
 818         if (ret != eb->len) {
 819                 ret = -EIO;
 820                 goto out;
 821         }
 822         ret = 0;
 823 out:
 824         return ret;
 825 }
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help