Thread: 27 messages, 8 authors, 2025-10-22

Re: O_DIRECT vs BLK_FEAT_STABLE_WRITES, was Re: [PATCH] btrfs: never trust the bio from direct IO

From: Johannes Thumshirn <hidden>
Date: 2025-10-21 11:30:54
Also in: linux-btrfs, linux-fsdevel, linux-mm, linux-xfs

On 10/21/25 10:15 AM, Qu Wenruo wrote:
On 2025/10/21 18:18, Christoph Hellwig wrote:
quoted
On Tue, Oct 21, 2025 at 01:47:03PM +1030, Qu Wenruo wrote:
quoted
A little off-topic, but would you mind sharing the performance drop with PI
enabled on XFS?
If the bandwidth of the SSDs gets close to or exceeds the DRAM bandwidth,
buffered I/O can be 50% or less of the direct I/O performance.
In my case, the DRAM is way faster than the SSD (tens of GiB/s vs less
than 5GiB/s).
quoted
quoted
With this patch I'm able to enable direct IO for inodes with checksums.
I thought it would easily improve the performance, but the truth is, it's
not that different from the buffered IO fallback.
That's because you still copy data.
Enabling the extra copy for direct IO only costs around 15 to 20% of the
performance, but that's in the no-csum case.

So far the calculation matches your estimation, but...
quoted
quoted
So I'm starting to wonder whether the checksum itself is causing the
miserable performance numbers.
Only indirectly by touching all the cachelines.  But once you copy you
touch them again.  Especially if not done in small chunks.
As long as I enable checksum verification, even direct IO with the bounce
page is no better than the buffered IO fallback: both run at around 10%
(at 10%, not reduced by 10%) of the direct IO speed (whether bouncing or
not).

Maybe I need to check if the proper hardware accelerated CRC32 is
utilized...

You could also hack in a NULL-csum for testing. Something that writes a 
fixed value every time. This would then rule out all the cost of the 
csum generation and only test the affected IO paths.
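
A minimal userspace sketch of that idea, not btrfs code: the checksum is
selected through a function pointer, so a test build can swap the real
CRC32C for a stub that returns a fixed value and never touches the data.
All names here (csum_fn, csum_crc32c, csum_null, pick_csum, csum_block,
NULL_CSUM_VALUE) are hypothetical, and the bitwise CRC32C only stands in
for whatever accelerated implementation the kernel would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fixed value returned by the null checksum (an assumption). */
#define NULL_CSUM_VALUE 0xdeadbeefu

/* Reflected CRC32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78u

/* Hypothetical checksum callback: hash len bytes of data. */
typedef uint32_t (*csum_fn)(const uint8_t *data, size_t len);

/* Plain bitwise CRC32C; slow, but it reads every byte like a real csum. */
static uint32_t csum_crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32C_POLY : (crc >> 1);
    }
    return ~crc;
}

/* Null checksum: ignores the data, so no cachelines are touched here. */
static uint32_t csum_null(const uint8_t *data, size_t len)
{
    (void)data;
    (void)len;
    return NULL_CSUM_VALUE;
}

/* Select the implementation; use_null would be a debug-only switch. */
static csum_fn pick_csum(int use_null)
{
    return use_null ? csum_null : csum_crc32c;
}

/* The IO path calls this and never needs to know which csum is active. */
static uint32_t csum_block(csum_fn fn, const uint8_t *data, size_t len)
{
    assert(fn != NULL);
    return fn(data, len);
}
```

If throughput with the stub recovers to roughly the no-csum numbers, the
cost is in the checksum computation; if it stays near the checksummed
numbers, the cost is somewhere in the IO path around it.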