Thread: 11 messages, 6 authors, 2021-08-06

Re: Canvassing for network filesystem write size vs page size

From: Trond Myklebust <hidden>
Date: 2021-08-05 17:43:25
Also in: ceph-devel, linux-fsdevel, linux-mm, linux-nfs, lkml

On Thu, 2021-08-05 at 10:27 -0700, Linus Torvalds wrote:
On Thu, Aug 5, 2021 at 9:36 AM David Howells [off-list ref] wrote:

Some network filesystems, however, currently keep track of which byte
ranges are modified within a dirty page (AFS does; NFS seems to also)
and only write out the modified data.

NFS definitely does. I haven't used NFS in two decades, but I worked
on some of the code (read: I made nfs use the page cache both for
reading and writing) back in my Transmeta days, because NFSv2 was the
default filesystem setup back then.

See fs/nfs/write.c, although I have to admit that I don't recognize
that code any more.

It's fairly important to be able to do streaming writes without having
to read the old contents for some loads. And read-modify-write cycles
are death for performance, so you really want to coalesce writes until
you have the whole page.

That said, I suspect it's also *very* filesystem-specific, to the
point where it might not be worth trying to do in some generic
manner.

In particular, NFS had things like interesting credential issues, so
if you have multiple concurrent writers that used different 'struct
file *' to write to the file, you can't just mix the writes. You have
to sync the writes from one writer before you start the writes for the
next one, because one might succeed and the other not.

So you can't just treat it as some random "page cache with dirty byte
extents". You really have to be careful about credentials, timeouts,
etc, and the pending writes have to keep a fair amount of state
around.

At least that was the case two decades ago.
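
The flush-before-mix rule described above can be illustrated with a
small userspace model. This is only a sketch, not the code in
fs/nfs/write.c: `struct pending_write`, `page_write()` and the integer
`owner` (standing in for a 'struct file *' and its credentials) are
hypothetical names, and the model assumes at most one pending byte
range per page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page record of one not-yet-sent write. */
struct pending_write {
	bool   active;
	int    owner;	/* stands in for the writer's 'struct file *' */
	size_t start;	/* first dirty byte in the page */
	size_t end;	/* one past the last dirty byte */
};

/*
 * Record a write of [start, end) by 'owner'.
 * Returns true if the previous pending write had to be flushed first,
 * either because it belongs to a different writer or because the new
 * range is not contiguous with it.
 */
static bool page_write(struct pending_write *p, int owner,
		       size_t start, size_t end)
{
	bool flushed = false;

	if (p->active &&
	    (p->owner != owner || start > p->end || end < p->start)) {
		/* A real client would send the old request here. */
		p->active = false;
		flushed = true;
	}

	if (!p->active) {
		p->active = true;
		p->owner = owner;
		p->start = start;
		p->end = end;
	} else {
		/* Same writer, touching range: coalesce. */
		if (start < p->start)
			p->start = start;
		if (end > p->end)
			p->end = end;
	}
	return flushed;
}
```

Two back-to-back writes from the same owner merge into one range, while
a write from a second owner forces the first range out before the new
one is recorded, so a failure for one writer cannot be confused with
the other's data.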

[ goes off and looks. See "nfs_write_begin()" and friends in
fs/nfs/file.c for some of the examples of these things, although it
looks like the code is less aggressive about avoiding the
read-modify-write case than I thought I remembered, and only does it
for write-only opens ]

All correct. However, there is also the issue that even if we have done
a read-modify-write, we can't always extend the write to cover the
entire page.

If you look at nfs_can_extend_write(), you'll note that we don't extend
the page data if the file is range locked, if the attributes have not
been revalidated, or if the page cache contents are suspected to be
invalid for some other reason.
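
As a rough illustration of that decision, here is a simplified
userspace model. It is not the kernel's nfs_can_extend_write(): the
`struct write_ctx` flags, `can_extend_write()` and `plan_write()` are
hypothetical names that only encode the conditions listed above, and
the real function checks more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the decision depends on. */
struct write_ctx {
	bool page_uptodate;	/* page fully populated, e.g. after RMW */
	bool attrs_revalidated;	/* cached attributes are known fresh */
	bool cache_suspect;	/* page cache may be stale */
	bool range_locked;	/* a byte-range lock is held on the file */
};

struct byte_range {
	size_t offset;
	size_t len;
};

/* Extending is only safe when nothing casts doubt on the page. */
static bool can_extend_write(const struct write_ctx *c)
{
	return c->page_uptodate && c->attrs_revalidated &&
	       !c->cache_suspect && !c->range_locked;
}

/* Pick what to send: the whole page if allowed, else just the dirty bytes. */
static struct byte_range plan_write(const struct write_ctx *c,
				    struct byte_range dirty,
				    size_t page_size)
{
	if (can_extend_write(c))
		return (struct byte_range){ 0, page_size };
	return dirty;
}
```

With every flag favourable, a 50-byte dirty range is widened to the
full page; setting `range_locked` alone is enough to fall back to
sending just those 50 bytes.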

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
