Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

From: Greg KH <gregkh@linuxfoundation.org>
Date: 2021-02-12 14:12:13
Also in: lkml

Possibly related (same subject, not in this thread)

2021-02-15 · Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated · Luis Henriques <hidden>
2021-02-15 · Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated · Amir Goldstein <amir73il@gmail.com>
2021-02-15 · Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated · Luis Henriques <hidden>
2021-02-15 · Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated · Luis Henriques <hidden>
2021-02-15 · Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated · Greg KH <gregkh@linuxfoundation.org>

On Fri, Feb 12, 2021 at 12:41:48PM +0000, Luis Henriques wrote:

quoted hunk ↗ jump to hunk

Greg KH [off-list ref] writes:

quoted

On Fri, Feb 12, 2021 at 12:05:14PM +0000, Luis Henriques wrote:

quoted

Greg KH [off-list ref] writes:

quoted

On Fri, Feb 12, 2021 at 10:22:16AM +0200, Amir Goldstein wrote:

quoted

On Fri, Feb 12, 2021 at 9:49 AM Greg KH [off-list ref] wrote:

quoted

On Fri, Feb 12, 2021 at 12:44:00PM +0800, Nicolas Boichat wrote:

quoted

Filesystems such as procfs and sysfs generate their content at
runtime. This implies the file sizes do not usually match the
amount of data that can be read from the file, and that seeking
may not work as intended.

This will be useful to disallow copy_file_range with input files
from such filesystems.

Signed-off-by: Nicolas Boichat <redacted>
---
I first thought of adding a new field to struct file_operations,
but that doesn't quite scale as every single file creation
operation would need to be modified.

Even so, you missed a load of filesystems in the kernel with this patch
series, what makes the ones you did mark here different from the
"internal" filesystems that you did not?

This feels wrong, why is userspace suddenly breaking?  What changed in
the kernel that caused this?  Procfs has been around for a _very_ long
time :)

That would be because of (v5.3):

5dae222a5ff0 vfs: allow copy_file_range to copy across devices

The intention of this change (series) was to allow server side copy
for nfs and cifs via copy_file_range().
This is mostly work by Dave Chinner that I picked up following requests
from the NFS folks.

But the above change also includes this generic change:

-       /* this could be relaxed once a method supports cross-fs copies */
-       if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
-               return -EXDEV;
-

The change of behavior was documented in the commit message.
It was also documented in:

88e75e2c5 copy_file_range.2: Kernel v5.3 updates

I think our rationale for the generic change was:
"Why not? What could go wrong? (TM)"
I am not sure if any workload really gained something from this
kernel cross-fs CFR.

Why not put that check back?

quoted

In retrospect, I think it would have been safer to allow cross-fs CFR
only to the filesystems that implement ->{copy,remap}_file_range()...

Why not make this change?  That seems easier and should fix this for
everyone, right?

quoted

Our option now are:
- Restore the cross-fs restriction into generic_copy_file_range()

Yes.

Restoring this restriction will actually change the current cephfs CFR
behaviour.  Since that commit we have allowed doing remote copies between
different filesystems within the same ceph cluster.  See commit
6fd4e6348352 ("ceph: allow object copies across different filesystems in
the same cluster").

Although I'm not aware of any current users for this scenario, the
performance impact can actually be huge as it's the difference between
asking the OSDs for copying a file and doing a full read+write on the
client side.

Regression in performance is ok if it fixes a regression for things that
used to work just fine in the past :)

First rule, make it work.

Sure, I just wanted to point out that *maybe* there are other options than
simply reverting that commit :-)

Something like the patch below (completely untested!) should revert to the
old behaviour in filesystems that don't implement the CFR syscall.

Cheers,
-- 
Luis

diff --git a/fs/read_write.c b/fs/read_write.c
index 75f764b43418..bf5dccc43cc9 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c

@@ -1406,8 +1406,11 @@ static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
 						       file_out, pos_out,
 						       len, flags);
 
-	return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
-				       flags);
+	if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
+		return -EXDEV;
+	else
+		generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
+					flags);
 }
 
 /*

That would make much more sense to me.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help