Re: [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads
From: Luis Chamberlain <mcgrof@kernel.org>
Date: 2023-09-18 17:51:37
Also in:
linux-fsdevel, linux-patches, linux-xfs
On Mon, Sep 18, 2023 at 12:49:36PM +1000, Dave Chinner wrote:
On Sun, Sep 17, 2023 at 06:13:40PM -0700, Luis Chamberlain wrote:quoted
On Mon, Sep 18, 2023 at 10:59:07AM +1000, Dave Chinner wrote:quoted
On Mon, Sep 18, 2023 at 12:14:48AM +0100, Matthew Wilcox wrote:quoted
On Mon, Sep 18, 2023 at 08:38:37AM +1000, Dave Chinner wrote:quoted
On Fri, Sep 15, 2023 at 02:32:44PM -0700, Luis Chamberlain wrote:quoted
LBS devices. This in turn allows filesystems which support bs > 4k to be enabled on a 4k PAGE_SIZE world on LBS block devices. This alows LBS device then to take advantage of the recenlty posted work today to enable LBS support for filesystems [0].Why do we need LBS devices to support bs > ps in XFS?It's the other way round -- we need the support in the page cache to reject sub-block-size folios (which is in the other patches) before we can sensibly talk about enabling any filesystems on top of LBS devices. Even XFS, or for that matter ext2 which support 16k block sizes on CONFIG_PAGE_SIZE_16K (or 64K) kernels need that support first.Well, yes, I know that. But the statement above implies that we can't use bs > ps filesytems without LBS support on 4kB PAGE_SIZE systems. If it's meant to mean the exact opposite, then it is extremely poorly worded....Let's ignore the above statement of a second just to clarify the goal here. The patches posted by Pankaj enable bs > ps even on 4k sector drives.Yes. They also enable XFS to support bs > ps on devices up to 32kB sector sizes, too. All the sector size does is define the minimum filesystem block size that can be supported by the filesystem on that device and apart from that we just don't care what the sector size on the underlying device is.
Ah yes, but on a system with 4k page size, my point is that you still can't use a 32k sector size drive.
quoted
This patch series by definition is suggesting that an LBS device is one where the minimum sector size is > ps, in practice I'm talking about devices where the logical block size is > 4k, or where the physical block size can be > 4k.Which XFS with bs > ps just doesn't care about. As long as the logical sector size is a power of 2 between 512 bytes and 32kB, it will just work.quoted
There are two situations in the NVMe world where this can happen. One is where the LBA format used is > 4k, the other is where the npwg & awupf | nawupf is > 4k. The first makes the logicalFWIW, I have no idea what these acronyms actually mean....
Sorry, I've described them them, here now: https://kernelnewbies.org/KernelProjects/large-block-size#LBS_NVMe_support What matters though is in practice it is a combination of two values which today also allows the nvme driver to have a higher than 4k physical block size.
quoted
block size > 4k, the later allows the physical block size to be > 4k. The first *needs* an aops which groks > ps on the block device cache. The second let's you remain backward compatible with 4k sector size, but if you want to use a larger sector size you can too, but that also requires a block device cache which groks > ps. When using > ps for logical block size of physical block size is what I am suggesting we should call LBS devices.Sure. LBS means sector size > page size for the block device. That much has always been clear. But telling me that again doesn't explain what LBS support has to do with the filesystem implementation. mkfs.xfs doesn't require LBS support to make a new XFS filesystem with a 32kB sector size and 64kB filessytem block size. For the mounted filesystem that supports bs > ps, it also doesn't care about the device sector size is smaller than what mkfs told it to us. e.g. this is how run 4kB sector size filesystem testing on 512 byte sector devices today.... What I'm asking is why LBS support even mentions filesystems?
It's just that without this patchset you can mkfs an filesystem with larger bs > ps, but will only be able to mount them if the sector size matches the page size. This patchset allows them to be mounted and used where sector size > ps. The main issue there is just the block device cache aops and the current size limitaiton on set_blocksize().
It's not necessary for filesystems to support bs > ps for LBS to be implemented, and it's not necessary for LBS to be supported for filesytsems to implement bs > ps. Conflating them seems a recipe for confusion....
I see, so we should just limit the scope to enabling LBS devices on the block device cache?
quoted
quoted
iomap supports bufferheads as a transitional thing (e.g. for gfs2).But not for the block device cache.That's why I'm suggesting that you implement support for bufferheads through the existing iomap infrastructure instead of trying to dynamically switch aops structures....quoted
quoted
Hence I suspect that a better solution is to always use iomap and the same aops, but just switch from iomap page state to buffer heads in the bdev mappingNot sure this means, any chance I can trouble you to clarify a bit more?bdev_use_buggerheads() { /* * set the bufferhead bit atomically with invalidation emptying the * page cache to prevent repopulation races. */ filemap_invalidate_lock() invalidate_bdev() if (turn_on_bufferheads) set_bit(bdev->state, BDEV_USE_BUFFERHEADS); else clear_bit(bdev->state, BDEV_USE_BUFFERHEADS); filemap_invalidate_unlock() } bdev_iomap_begin() { ..... if (test_bit(bdev->state, BDEV_USE_BUFFERHEADS)) iomap->flags |= IOMAP_F_BUFFER_HEAD; } /* * If an indexing switch happened while processing the iomap, make * sure to get the iomap marked stale to force a new mapping to be * looked up. */ bdev_iomap_valid() { bool need_bhs = iomap->flags & IOMAP_F_BUFFER_HEAD; bool use_bhs = test_bit(bdev->state, BDEV_USE_BUGGERHEADS); return need_bhs == use_bhs; }
Will give this a shot, thanks! Luis