Re: [PATCH v2] vfs: prevent copy_file_range to copy across devices

From: Luis Henriques <hidden>
Date: 2021-02-16 12:03:29
Also in: ceph-devel, linux-cifs, linux-nfs, lkml

"gregkh@linuxfoundation.org" [off-list ref] writes:

On Tue, Feb 16, 2021 at 11:17:34AM +0000, Luis Henriques wrote:

quoted

Amir Goldstein [off-list ref] writes:

quoted

On Mon, Feb 15, 2021 at 8:57 PM Trond Myklebust [off-list ref] wrote:

quoted

On Mon, 2021-02-15 at 19:24 +0200, Amir Goldstein wrote:

quoted

On Mon, Feb 15, 2021 at 6:53 PM Trond Myklebust <
trondmy@hammerspace.com> wrote:

quoted

On Mon, 2021-02-15 at 18:34 +0200, Amir Goldstein wrote:

quoted

On Mon, Feb 15, 2021 at 5:42 PM Luis Henriques <
lhenriques@suse.de>
wrote:

quoted

Nicolas Boichat reported an issue when trying to use the
copy_file_range
syscall on a tracefs file.  It failed silently because the file
content is
generated on-the-fly (reporting a size of zero) and
copy_file_range
needs
to know in advance how much data is present.

This commit restores the cross-fs restrictions that existed
prior
to
5dae222a5ff0 ("vfs: allow copy_file_range to copy across
devices")
and
removes generic_copy_file_range() calls from ceph, cifs, fuse,
and
nfs.

Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
devices")
Link:
https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ (local)
Cc: Nicolas Boichat <redacted>
Signed-off-by: Luis Henriques <redacted>

Code looks ok.
You may add:

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

I agree with Trond that the first paragraph of the commit message
could
be improved.
The purpose of this change is to fix the change of behavior that
caused the regression.

Before v5.3, behavior was -EXDEV and userspace could fallback to
read.
After v5.3, behavior is zero size copy.

It does not matter so much what makes sense for CFR to do in this
case (generic cross-fs copy).  What matters is that nobody asked
for
this change and that it caused problems.

No. I'm saying that this patch should be NACKed unless there is a
real
explanation for why we give crap about this tracefs corner case and
why
it can't be fixed.

There are plenty of reasons why copy offload across filesystems
makes
sense, and particularly when you're doing NAS. Clone just doesn't
cut
it when it comes to disaster recovery (whereas backup to a
different
storage unit does). If the client has to do the copy, then you're
effectively doubling the load on the server, and you're adding
potentially unnecessary network traffic (or at the very least you
are
doubling that traffic).

I don't understand the use case you are describing.

Which filesystem types are you talking about for source and target
of copy_file_range()?

To be clear, the original change was done to support NFS/CIFS server-
side
copy and those should not be affected by this change.

That is incorrect:

ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file
*dst,
 u64 dst_pos, u64 count)
{

 /*
 * Limit copy to 4MB to prevent indefinitely blocking an nfsd
 * thread and client rpc slot. The choice of 4MB is somewhat
 * arbitrary. We might instead base this on r/wsize, or make it
 * tunable, or use a time instead of a byte limit, or implement
 * asynchronous copy. In theory a client could also recognize a
 * limit like this and pipeline multiple COPY requests.
 */
 count = min_t(u64, count, 1 << 22);
 return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
}

You are now explicitly changing the behaviour of knfsd when the source
and destination filesystem differ.

For one thing, you are disallowing the NFSv4.2 copy offload use case of
copying from a local filesystem to a remote NFS server. However you are
also disallowing the copy from, say, an XFS formatted partition to an
ext4 partition.

Got it.

Ugh.  And I guess overlayfs may have a similar problem.

quoted

This is easy to solve with a flag COPY_FILE_SPLICE (or something) that
is internal to kernel users.

FWIW, you may want to look at the loop in ovl_copy_up_data()
for improvements to nfsd_copy_file_range().

We can move the check out to copy_file_range syscall:

        if (flags != 0)
                return -EINVAL;

Leave the fallback from all filesystems and check for the
COPY_FILE_SPLICE flag inside generic_copy_file_range().

Ok, the diff bellow is just to make sure I understood your suggestion.

The patch will also need to:

 - change nfs and overlayfs calls to vfs_copy_file_range() so that they
   use the new flag.

 - check flags in generic_copy_file_checks() to make sure only valid flags
   are used (COPY_FILE_SPLICE at the moment).

Also, where should this flag be defined?  include/uapi/linux/fs.h?

Why would userspace want/need this flag?

In fact, my question sort of implied yours :-)

What I wanted to know was whether we would like to allow userspace to
_explicitly_ revert to the current behaviour (i.e. use the flag to allow
cross-fs copies) or to continue to return -EINVAL to userspace if flags
are != 0 (in which case this check would need to move to the syscall
definition).

Cheers,
-- 
Luis

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help