Thread (16 messages) 16 messages, 9 authors, 2021-02-18

Re: [PATCH v2] vfs: prevent copy_file_range to copy across devices

From: Steve French <smfrench@gmail.com>
Date: 2021-02-16 21:16:19
Also in: ceph-devel, linux-fsdevel, linux-nfs, lkml

Possibly related (same subject, not in this thread)

On Tue, Feb 16, 2021 at 1:40 PM Amir Goldstein [off-list ref] wrote:
On Tue, Feb 16, 2021 at 9:31 PM Steve French [off-list ref] wrote:
quoted
On Tue, Feb 16, 2021 at 1:29 PM Anna Schumaker
[off-list ref] wrote:
quoted
On Tue, Feb 16, 2021 at 2:22 PM Amir Goldstein [off-list ref] wrote:
quoted
On Tue, Feb 16, 2021 at 8:54 PM Luis Henriques [off-list ref] wrote:
quoted
Amir Goldstein [off-list ref] writes:
quoted
On Tue, Feb 16, 2021 at 6:41 PM Luis Henriques [off-list ref] wrote:
quoted
Amir Goldstein [off-list ref] writes:
quoted
quoted
Ugh.  And I guess overlayfs may have a similar problem.
Not exactly.
Generally speaking, overlayfs should call vfs_copy_file_range()
with the flags it got from layer above, so if called from nfsd it
will allow cross fs copy and when called from syscall it won't.

There are some corner cases where overlayfs could benefit from
COPY_FILE_SPLICE (e.g. copy from lower file to upper file), but
let's leave those for now. Just leave overlayfs code as is.
Got it, thanks for clarifying.
quoted
quoted
quoted
This is easy to solve with a flag COPY_FILE_SPLICE (or something) that
is internal to kernel users.

FWIW, you may want to look at the loop in ovl_copy_up_data()
for improvements to nfsd_copy_file_range().

We can move the check out to copy_file_range syscall:

        if (flags != 0)
                return -EINVAL;

Leave the fallback from all filesystems and check for the
COPY_FILE_SPLICE flag inside generic_copy_file_range().
Ok, the diff bellow is just to make sure I understood your suggestion.

The patch will also need to:

 - change nfs and overlayfs calls to vfs_copy_file_range() so that they
   use the new flag.

 - check flags in generic_copy_file_checks() to make sure only valid flags
   are used (COPY_FILE_SPLICE at the moment).

Also, where should this flag be defined?  include/uapi/linux/fs.h?
Grep for REMAP_FILE_
Same header file, same Documentation rst file.
quoted
Cheers,
--
Luis
diff --git a/fs/read_write.c b/fs/read_write.c
index 75f764b43418..341d315d2a96 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1383,6 +1383,13 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
                                struct file *file_out, loff_t pos_out,
                                size_t len, unsigned int flags)
 {
+       if (!(flags & COPY_FILE_SPLICE)) {
+               if (!file_out->f_op->copy_file_range)
+                       return -EOPNOTSUPP;
+               else if (file_out->f_op->copy_file_range !=
+                        file_in->f_op->copy_file_range)
+                       return -EXDEV;
+       }
That looks strange, because you are duplicating the logic in
do_copy_file_range(). Maybe better:

if (WARN_ON_ONCE(flags & ~COPY_FILE_SPLICE))
        return -EINVAL;
if (flags & COPY_FILE_SPLICE)
       return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
                                 len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
My initial reasoning for duplicating the logic in do_copy_file_range() was
to allow the generic_copy_file_range() callers to be left unmodified and
allow the filesystems to default to this implementation.

With this change, I guess that the calls to generic_copy_file_range() from
the different filesystems can be dropped, as in my initial patch, as they
will always get -EINVAL.  The other option would be to set the
COPY_FILE_SPLICE flag in those calls, but that would get us back to the
problem we're trying to solve.
I don't understand the problem.

What exactly is wrong with the code I suggested?
Why should any filesystem be changed?

Maybe I am missing something.
Ok, I have to do a full brain reboot and start all over.

Before that, I picked the code you suggested and tested it.  I've mounted
a cephfs filesystem and used xfs_io to execute a 'copy_range' command
using /sys/kernel/debug/sched_features as source.  The result was a
0-sized file in cephfs.  And the reason is thevfs_copy_file_range()
early exit in:

        if (len == 0)
                return 0;

'len' is set in generic_copy_file_checks().
Good point.. I guess we will need to do all the checks earlier in
generic_copy_file_checks() including the logic of:

        if (file_in->f_op->remap_file_range &&
            file_inode(file_in)->i_sb == file_inode(file_out)->i_sb)

quoted
This means that we're not solving the original problem anymore (probably
since v1 of this patch, haven't checked).

Also, re-reading Trond's emails, I read: "... also disallowing the copy
from, say, an XFS formatted partition to an ext4 partition".  Isn't that
*exactly* what we're trying to do here?  I.e. _prevent_ these copies from
happening so that tracefs files can't be CFR'ed?
We want to address the report which means calls coming from
copy_file_range() syscall.

Trond's use case is vfs_copy_file_range() coming from nfsd.
When he writes about copy from XFS to ext4, he means an
NFS client is issuing server side copy (on same or different NFS mounts)
and the NFS server is executing nfsd_copy_file_range() on a source
file that happens to be on XFS and destination happens to be on ext4.
NFS also supports a server-to-server copy where the destination server
mounts the source server and reads the data to be copied. Please don't
break that either :)
As long as the copy is via nfsd_copy_file_range() and not from the syscall
it should not regress.
quoted
This is a case we will eventually need to support for cifs (SMB3) as well.
samba already does server side copy very well without needing any support
from the kernel.

nfsd also doesn't *need* to use vfs_copy_file_range() it can use kernel APIs
like the loop in ovl_copy_up_data(). But it does, so we should not regress it.

samba/nfsd can try to use copy_file_range() and it will work if the
source/target
fs support it. Otherwise, the server can perfectly well do the copy via other
available interfaces, just like userspace copy tools.
I was thinking about cifsd ("ksmbd") the kernel server from
Namjae/Sergey etc. which is making excellent progress.

-- 
Thanks,

Steve
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help