Thread (10 messages) 10 messages, 2 authors, 2023-03-17

Re: [PATCH 1/2] nfsd: don't replace page in rq_pages if it's a continuation of last page

From: Jeff Layton <jlayton@kernel.org>
Date: 2023-03-17 15:00:13

On Fri, 2023-03-17 at 14:16 +0000, Chuck Lever III wrote:
quoted
On Mar 17, 2023, at 9:06 AM, Jeff Layton [off-list ref] wrote:

On Fri, 2023-03-17 at 06:56 -0400, Jeff Layton wrote:
quoted
The splice read calls nfsd_splice_actor to put the pages containing file
data into the svc_rqst->rq_pages array. It's possible however to get a
splice result that only has a partial page at the end, if (e.g.) the
filesystem hands back a short read that doesn't cover the whole page.

nfsd_splice_actor will plop the partial page into its rq_pages array and
return. Then later, when nfsd_splice_actor is called again, the
remainder of the page may end up being filled out. At this point,
nfsd_splice_actor will put the page into the array _again_ corrupting
the reply. If this is done enough times, rq_next_page will overrun the
array and corrupt the trailing fields -- the rq_respages and
rq_next_page pointers themselves.

If we've already added the page to the array in the last pass, don't add
it to the array a second time when dealing with a splice continuation.
This was originally handled properly in nfsd_splice_actor, but commit
91e23b1c3982 removed the check for it, and started universally replacing
pages.

Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()")
Reported-by: Dario Lesca <redacted>
Tested-by: David Critch <redacted>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/vfs.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 502e1b7742db..3709ef57d96e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -941,8 +941,11 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
	struct page *last_page;

	last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
-	for (page += offset / PAGE_SIZE; page <= last_page; page++)
-		svc_rqst_replace_page(rqstp, page);
+	for (page += offset / PAGE_SIZE; page <= last_page; page++) {
+		/* Only replace page if we haven't already done so */
Note that I think that this was probably the real rationale for the pp[-
1] check that 91e23b1c3982 removed. Given that, maybe we should flesh
this comment out a bit more for posterity?

		/*
		 * When we're splicing from a pipe, it's possible that
		 * we'll get an incomplete page that may be updated on
		 * a later call. Only splice it into rq_pages once.
		 */
The "real" bug here is that the API contract for pipe splicing
isn't well defined, so I agree that it's very likely the pp[-1]
check was because a splice can call the actor repeatedly for the
same page. No one could remember why that check was there.
The whole splice API is a minefield.
To be clear, if the passed-in page matches the current page in
the rqst, we're "extending the current page" rather than avoiding
replacement... maybe:

	/*
	 * Skip page replacement when extending the contents
	 * of the current page.
	 */
Sure, sounds good.
In the patch description, would you mention that this case
arises if the READ request is not page-aligned?
Does it though? I'm not sure that page alignment has that much to do
with it. I imagine you can hit this even with aligned I/Os.

My guess is the bigger issue is when your storage is doing sub-page-size
I/Os under the hood. We end up filling up part of a page from storage
and the kernel submits what it has to the pipe and then the next bit
comes in and the page is updated for the next actor call.
If you resend this patch, please Cc: viro@ . Thanks for chasing
this down!
Will do.
quoted
quoted
+		if (page != *(rqstp->rq_next_page - 1))
+			svc_rqst_replace_page(rqstp, page);
+	}
	if (rqstp->rq_res.page_len == 0)	// first call
		rqstp->rq_res.page_base = offset % PAGE_SIZE;
	rqstp->rq_res.page_len += sd->len;
-- 
Jeff Layton [off-list ref]
--
Chuck Lever
-- 
Jeff Layton [off-list ref]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help