Thread (39 messages) 39 messages, 6 authors, 2023-06-23

Re: [PATCH net-next v3 01/18] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)

From: Paolo Abeni <pabeni@redhat.com>
Date: 2023-06-23 09:38:43
Also in: linux-mm, lkml

On Fri, 2023-06-23 at 10:06 +0100, David Howells wrote:
Paolo Abeni [off-list ref] wrote:
quoted
IMHO this function uses a bit too much labels and would be more easy to
read, e.g. moving the above chunk of code in conditional branch.
Maybe.  I was trying to put the fast path up at the top without the slow path
bits in it, but I can put the "insufficient_space" bit there.
I *think* you could move the insufficient_space in a separate helped,
that should achieve your goal with fewer labels and hopefully no
additional complexity.
quoted
Even without such change, I think the above 'goto try_again;'
introduces an unneeded conditional, as at this point we know 'fragsz <=
fsize'.
Good point.
quoted
quoted
+		cache->pfmemalloc = folio_is_pfmemalloc(spare);
+		if (cache->folio)
+			goto reload;
I think there is some problem with the above.

If cache->folio is != NULL, and cache->folio was not pfmemalloc-ed
while the spare one is, it looks like the wrong policy will be used.
And should be even worse if folio was pfmemalloc-ed while spare is not.

I think moving 'cache->pfmemalloc' initialization...
quoted
+	}
+
... here should fix the above.
Yeah.  We might have raced with someone else or been moved to another cpu and
there might now be a folio we can allocate from.
quoted
quoted
+	/* Reset page count bias and offset to start of new frag */
+	cache->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
+	offset = folio_size(folio);
+	goto try_again;
What if fragsz > PAGE_SIZE, we are consistently unable to allocate an
high order page, but order-0, pfmemalloc-ed page allocation is
successful? It looks like this could become an unbounded loop?
It shouldn't.  It should go:

	try_again:
		if (fragsz > offset)
			goto insufficient_space;
	insufficient_space:
		/* See if we can refurbish the current folio. */
		...
I think the critical path is with pfmemalloc-ed pages:

		if (unlikely(cache->pfmemalloc)) {
			__folio_put(folio);
			goto get_new_folio;
		}

just before the following.
		fsize = folio_size(folio);
		if (unlikely(fragsz > fsize))
			goto frag_too_big;
	frag_too_big:
		...
		return NULL;

Though for safety's sake, it would make sense to put in a size check in the
case we fail to allocate a larger-order folio.
quoted
quoted
 		do {
 			struct page *page = pages[i++];
 			size_t part = min_t(size_t, PAGE_SIZE - off, len);
-
-			ret = -EIO;
-			if (WARN_ON_ONCE(!sendpage_ok(page)))
+			bool put = false;
+
+			if (PageSlab(page)) {
I'm a bit concerned from the above. If I read correctly, tcp 0-copy
Well, splice()-to-tcp will; MSG_ZEROCOPY is unaffected.
Ah right! I got lost in some 'if' branch.
quoted
will go through that for every page, even if the expected use-case is
always !PageSlub(page). compound_head() could be costly if the head
page is not hot on cache and I'm not sure if that could be the case for
tcp 0-copy. The bottom line is that I fear a possible regression here.
I can put the PageSlab() check inside the sendpage_ok() so the page flag is
only checked once.  
Perhaps I'm lost again, but AFAICS:

__PAGEFLAG(Slab, slab, PF_NO_TAIL)

// ...
#define __PAGEFLAG(uname, lname, policy)			\
	TESTPAGEFLAG(uname, lname, policy)			\
// ...

#define TESTPAGEFLAG(uname, lname, policy)				\
static __always_inline bool folio_test_##lname(struct folio *folio)	\
{ return test_bit(PG_##lname, folio_flags(folio, FOLIO_##policy));}	\
static __always_inline int Page##uname(struct page *page)               \
{ return test_bit(PG_##lname, &policy(page, 0)->flags); }
// ... 'policy' is PF_NO_TAIL here

#define PF_NO_TAIL(page, enforce) ({                                    \
                VM_BUG_ON_PGFLAGS(enforce && PageTail(page), page);	\
                PF_POISONED_CHECK(compound_head(page)); })

It looks at compound_head in the end ?!?
But PageSlab() doesn't check the headpage, only the page
it is given.  sendpage_ok() is more the problem as it also calls
page_count().  I could drop the check.
Once the head page is hot on cache due to the previous check, it should
be cheap?

Cheers,

Paolo
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help