Thread (47 messages) 47 messages, 5 authors, 2024-08-21

Re: [PATCH net-next v13 04/14] mm: page_frag: add '_va' suffix to page_frag API

From: Yunsheng Lin <hidden>
Date: 2024-08-21 12:30:54
Also in: bpf, intel-wired-lan, kvm, linux-kselftest, linux-mediatek, linux-mm, linux-nfs, linux-nvme, lkml, virtualization

On 2024/8/21 0:02, Alexander Duyck wrote:
On Tue, Aug 20, 2024 at 6:07 AM Yunsheng Lin [off-list ref] wrote:
quoted
On 2024/8/19 23:54, Alexander Duyck wrote:

...
quoted
quoted
quoted
quoted
"There are three types of API as proposed in this patchset instead of
two types of API:
1. page_frag_alloc_va() returns [va].
2. page_frag_alloc_pg() returns [page, offset].
3. page_frag_alloc() returns [va] & [page, offset].

You seemed to miss that we need a third naming for the type 3 API.
Do you see type 3 API as a valid API? if yes, what naming are you
suggesting for it? if no, why it is not a valid API?"
I didn't. I just don't see the point in pushing out the existing API
to support that. In reality 2 and 3 are redundant. You probably only
need 3. Like I mentioned earlier you can essentially just pass a
If the caller just expect [page, offset], do you expect the caller also
type 3 API, which return both [va] and [page, offset]?

I am not sure if I understand why you think 2 and 3 are redundant here?
If you think 2 and 3 are redundant here, aren't 1 and 3 also redundant
as the similar agrument?
The big difference is the need to return page and offset. Basically to
support returning page and offset you need to pass at least one value
as a pointer so you can store the return there.

The reason why 3 is just a redundant form of 2 is that you will
normally just be converting from a va to a page and offset so the va
should already be easily accessible.
I am assuming that by 'easily accessible', you meant the 'va' can be
calculated as below, right?

va = encoded_page_address(encoded_va) +
                (page_frag_cache_page_size(encoded_va) - remaining);

I guess it is easily accessible, but it is not without some overhead
to calculate the 'va' here.
It is just the encoded_page_address + offset that you have to
calculate anyway. So the only bit you actually have to do is 2
instructions, one to mask the encoded_va and then the addition of the
offset that you provided to the page. As it stands those instruction
can easily be slipped in while you are working on converting the va to
a page.
Well, with your suggestions against other optimizations like avoiding
a checking in fast patch and avoid calling virt_to_page(), the overhead
is kind of added up.

And I am really surprised by your above suggestion about deciding the
API for users according to the internal implementation detail here. As
the overhead of calculating 'va' is really depending on the layout of
'struct page_frag_cache' here, what if we change the implementation and
the overhead of calculating 'va' becomes bigger? Do we expect to change
the API for the callers when we change the internal implementation of
page_frag_cache?
quoted
quoted
quoted
quoted
page_frag via pointer to the function. With that you could also look
at just returning a virtual address as well if you insist on having
something that returns all of the above. No point in having 2 and 3 be
seperate functions.
Let's be more specific about what are your suggestion here: which way
is the prefer way to return the virtual address. It seems there are two
options:

1. Return the virtual address by function returning as below:
void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);

2. Return the virtual address by double pointer as below:
int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
                        void **va);
I was thinking more of option 1. Basically this is a superset of
page_frag_alloc_va that is also returning the page and offset via a
page frag. However instead of bio_vec I would be good with "struct
page_frag *" being the value passed to the function to play the role
of container. Basically the big difference between 1 and 2/3 if I am
not mistaken is the fact that for 1 you pass the size, whereas with
2/3 you are peeling off the page frag from the larger page frag cache
Let's be clear here: The callers just expecting [page, offset] also need
to call type 3 API, which return both [va] and [page, offset]? and it
is ok to ignore the overhead of calculating the 'va' for those kinds
of callers just because we don't want to do the renaming for a existing
API and can't come up with good naming for that?
quoted
after the fact via a commit type action.
Just be clear here, there is no commit type action for some subtype of
type 2/3 API.

For example, for type 2 API in this patchset, it has below subtypes:

subtype 1: it does not need a commit type action, it just return
           [page, offset] instead of page_frag_alloc_va() returning [va],
           and it does not return the allocated fragsz back to the caller
           as page_frag_alloc_va() does not too:
struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
                                unsigned int *offset, unsigned int fragsz,
                                gfp_t gfp)

subtype 2: it does need a commit type action, and @fragsz is returned to
           the caller and caller used that to commit how much fragsz to
           commit.
struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
                                        unsigned int *offset,
                                        unsigned int *fragsz, gfp_t gfp)

Do you see subtype 1 as valid API? If no, why?
Not really, it is just a wrapper for page_frag_alloc that is
converting the virtual address to a page and offset. They are the same
data and don't justify the need for two functions. It kind of explains
I am supposing you meant something like below:
struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
				unsigned int *offset, unsigned int fragsz,
				gfp_t gfp)
{
	struct page *page;
	void *va;

	va = page_frag_alloc_va(nc, fragsz, gfp);
	if (!va)
		return NULL;

	page = virt_to_head_page(va);
	*offset = va - page_to_virt(page);

	return page;
}

If yes, I really think you are caring about maintainability too much by
trading off too much performance here by not only recalculating the offset
here, but also sometimes calling virt_to_head_page() unnecessarily.

If no, please share the pseudo code in your mind.
one of the complaints I had about this code. Supposedly it was
refactoring and combining several different callers into one, but what
it is actually doing is fracturing the code path into 3 different
variants based on little if any actual difference as it is doing
unnecessary optimization.
I am supposing the 3 different variants meant the below, right?
1. page_frag_alloc_va() returns [va].
2. page_frag_alloc_pg() returns [page, offset].
3. page_frag_alloc() returns [va] & [page, offset].

And there is others 3 different variants for prepare API too:
4. page_frag_alloc_va_prepare() returns [va].
5. page_frag_alloc_pg_prepare() returns [page, offset].
6. page_frag_alloc_prepare() returns [va] & [page, offset].

Side note: I just found the '4. page_frag_alloc_va_prepare()' API is
not used/called currently and can be removed in next revision for this
patchset.

It seems what you really want is 3 & 2 to be a wrapper for 1, and
5 & 6 to be a wrapper for 4?

If yes, too much performance is traded off here as my understanding.
Does't the introducing of __page_frag_cache_reload() already enable the
balance between performance and maintainability as much as possible in
patch 8?
quoted
If yes, do you also expect the caller to use "struct page_frag *" as the
container? If yes, what is the caller expected to do with the size field in
"struct page_frag *" from API perspective? Just ignore it?
It should be populated. You passed a fragsz, so you should populate
the output fragsz so you can get the truesize in the case of network
packets. The removal of the page_frag from the other callers is making
it much harder to review your code anyway. If we keep the page_frag
there it should reduce the amount of change needed when you replace
page_frag with the page_frag_cache.
I am not starting to use page_frag as the container yet, but the above
part is something that I am probably agreed with.
Honestly this is eating up too much of my time. As I said before this
patch set is too big and it is trying to squeeze in more than it
really should for a single patch set to be reviewable. Going forward
please split up the patch set as I had suggested before and address my
comments. Ideally you would have your first patch just be some
refactor and cleanup to get the "offset" pointer moving in the
direction you want. With that we can at least get half of this set
digested before we start chewing into all this refactor for the
replacement of page_frag with the page_frag_cache.
I don't really think breaking this patchset into more patchsets without
a newcase is helping to speed up the process here, it might slow down
the process instead, as the different idea about the refactoring and
new API naming is not going to disappear by breaking the patchset, and
the breaking may make the discussion harder without a bigger picture
and context.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help