Thread (136 messages) 136 messages, 11 authors, 2019-08-30

Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support

From: Bruce Richardson <hidden>
Date: 2019-07-26 08:41:57
Also in: bpf, intel-wired-lan

On Thu, Jul 25, 2019 at 10:30:51AM -0700, Jonathan Lemon wrote:

On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:
quoted
quoted
-----Original Message-----
From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
Sent: Thursday, July 25, 2019 4:39 PM
To: Laatz, Kevin <redacted>
Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
Bjorn [off-list ref]; Karlsson, Magnus
[off-list ref]; jakub.kicinski@netronome.com;
saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
Richardson, Bruce [off-list ref]; Loftus, Ciara
[off-list ref]; bpf@vger.kernel.org; intel-wired-
lan@lists.osuosl.org
Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
support



On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
quoted
This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be
chunk size aligned (max is PAGE_SIZE). This limits where we can place
chunks within the umem as well as limiting the packet sizes that are
supported.
quoted
The changes in this patch set removes these restrictions, allowing XDP
to be more flexible in where it can place a chunk within a umem. By
relaxing where the chunks can be placed, it allows us to use an
arbitrary buffer size and place that wherever we have a free address
in the umem. These changes add the ability to support arbitrary frame
sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing
frameworks that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-
copy.
quoted
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we need also need
to update how we pass around addresses. Currently, the addresses can
simply be masked to 2k to get back to the original address. This
becomes less trivial when using frame sizes that are not a 'power of
2' size. This patch set modifies the Rx/Tx descriptor format to use
the upper 16-bits of the addr field for an offset value, leaving the
lower 48-bits for the address (this leaves us with 256 Terabytes,
which should be enough!). We only need to use the upper 16-bits to store
the offset when running in unaligned mode.
quoted
Rather than adding the offset (headroom etc) to the address, we will
store it in the upper 16-bits of the address field. This way, we can
easily add the offset to the address where we need it, using some bit
manipulation and addition, and we can also easily get the original
address wherever we need it (for example in i40e_zca_fr-- ee) by
simply masking to get the lower 48-bits of the address field.
I wonder if it would be better to break backwards compatibility here and
say that a handle is going to change from [addr] to [base | offset], or
even [index | offset], where address = (index * chunk size) + offset, and
then use accessor macros to manipulate the queue entries.

This way, the XDP hotpath can adjust the handle with simple arithmetic,
bypassing the "if (unaligned)", check, as it changes the offset directly.

Using a chunk index instead of a base address is safer, otherwise it is
too easy to corrupt things.
--
The trouble with using a chunk index is that it assumes that all chunks are
contiguous, which is not always going to be the case. For example, for
userspace apps the easiest way to get memory that is IOVA/physically
contiguous is to use hugepages, but even then we still need to skip space
when crossing a 2MB barrier.

Specifically in this example case, with a 3k buffer size and 2MB hugepages,
we'd get 666 buffers on a single page, but then waste a few KB before
starting the 667th buffer at byte 0 on the second 2MB page.
This is the setup used in DPDK, for instance, when we allocate memory for
use in buffer pools.

Therefore, I think it's better to just have the kernel sanity checking the
request for safety and leave userspace the freedom to decide where in its
memory area it wants to place the buffers.
That makes sense, thanks.  My concern is that this makes it difficult for
the kernel to validate the buffer start address, but perhaps that isn't a
concern.
I don't think that's a problem right now, since the start address is still
an offset into the umem, and so must be between 0 and size (or size -
chunk_size, more specifically). Are there other checks that should be
performed on it, do you think?
I still believe it would simplify things if the format was [base | offset],
any opinion on that?
Can you perhaps clarify what you mean here. Are you suggesting globally
changing the address format to always use this, irrespective of chunk
alignment, or something else?

Regards,
/Bruce
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help