Thread (52 messages) 52 messages, 11 authors, 2023-07-31

Re: [dpdk-dev] [RFC] mempool: implement index-based per core cache

From: Honnappa Nagarahalli <hidden>
Date: 2021-11-08 15:29:26

<snip>
Current mempool per core cache implementation is based on pointer
For most architectures, each pointer consumes 64b
Replace it with index-based implementation, where in each buffer is
addressed by (pool address + index)
I like Dharmik's suggestion very much. CPU cache is a
critical and limited resource.

DPDK has a tendency of using pointers where indexes could be used
instead. I suppose pointers provide the additional flexibility of
mixing entries from different memory pools, e.g. multiple mbuf pools.
Agreed, thank you!
I don't think it is going to work:
On 64-bit systems the difference between the pool address and its elem
address could be bigger than 4GB.
Are you talking about a case where the memory pool size is more
than 4GB?
That is one possible scenario.
That could be solved by making the index an element index instead of a
pointer offset: address = (pool address + index * element size).
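For illustration, a minimal sketch of the element-index addressing described above. The helper names and the 32-bit index type are assumptions made for this example, not DPDK APIs, and the sketch assumes all elements are laid out contiguously at a uniform stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not DPDK APIs. */

/* Element index -> object address: pool address + index * element size. */
static inline void *
elt_idx_to_ptr(void *pool_base, uint32_t idx, size_t elt_size)
{
	return (char *)pool_base + (size_t)idx * elt_size;
}

/* Object address -> element index. Assumes obj sits on an element boundary. */
static inline uint32_t
elt_ptr_to_idx(const void *pool_base, const void *obj, size_t elt_size)
{
	size_t off = (size_t)((const char *)obj - (const char *)pool_base);

	assert(off % elt_size == 0);
	return (uint32_t)(off / elt_size);
}
```

Note that the element size is a runtime value here, so converting an address back to an index costs a division.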
Or instead of scaling the index with the element size, which is only
known at runtime, the index could be more efficiently scaled by a
compile time constant such as RTE_MEMPOOL_ALIGN (=
RTE_CACHE_LINE_SIZE). With a cache line size of 64 bytes, that would
allow indexing into mempools up to 256 GB in size.
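The 256 GB figure follows from a 32-bit index times a 64-byte unit: 2^32 * 64 B = 2^38 B. A minimal sketch of that scaling, assuming a 32-bit index and a 64-byte cache line; the SKETCH_ macros and function names are made up for this example and only stand in for RTE_CACHE_LINE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch; stand-ins for RTE_CACHE_LINE_SIZE. */
#define SKETCH_CACHE_LINE_SIZE  64u
#define SKETCH_CACHE_LINE_SHIFT 6u

/* Object address -> index in cache line units (a shift, no division). */
static inline uint32_t
cl_ptr_to_idx(const void *base, const void *obj)
{
	uintptr_t off = (uintptr_t)obj - (uintptr_t)base;

	assert((off & (SKETCH_CACHE_LINE_SIZE - 1)) == 0);
	assert((off >> SKETCH_CACHE_LINE_SHIFT) <= UINT32_MAX);
	return (uint32_t)(off >> SKETCH_CACHE_LINE_SHIFT);
}

/* Index in cache line units -> object address. */
static inline void *
cl_idx_to_ptr(void *base, uint32_t idx)
{
	return (char *)base + ((uintptr_t)idx << SKETCH_CACHE_LINE_SHIFT);
}

/* Bytes addressable with a 32-bit index: 2^32 * 64 = 256 GB. */
static inline uint64_t
cl_idx_max_span(void)
{
	return (UINT64_C(1) << 32) << SKETCH_CACHE_LINE_SHIFT;
}
```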
Looking at this snippet [1] from rte_mempool_op_populate_helper(),
there is an 'offset' added to keep objects from crossing page
boundaries.
If my understanding is correct, using the index of the element instead
of a pointer offset will pose a challenge for some of the corner cases.
[1]
       for (i = 0; i < max_objs; i++) {
               /* avoid objects to cross page boundaries */
               if (check_obj_bounds(va + off, pg_sz, total_elt_sz) < 0) {
                       off += RTE_PTR_ALIGN_CEIL(va + off, pg_sz) - (va + off);
                       if (flags & RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ)
                               off += total_elt_sz -
                                       (((uintptr_t)(va + off - 1) %
                                               total_elt_sz) + 1);
               }
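To make the corner case concrete, here is a simplified model of the page-boundary skip, not the DPDK code itself. It assumes the memory area starts page aligned, that the element fits in a page, and it ignores the RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ branch; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (not the DPDK implementation): byte offset of object n
 * when objects of elt_sz bytes are packed so that none crosses a page
 * boundary. Assumes a page-aligned start and elt_sz <= pg_sz.
 */
static size_t
obj_offset_with_page_skip(size_t n, size_t pg_sz, size_t elt_sz)
{
	size_t off = 0;
	size_t i;

	assert(elt_sz > 0 && elt_sz <= pg_sz);
	for (i = 0; ; i++) {
		/* Would [off, off + elt_sz) straddle a page boundary? */
		if (off / pg_sz != (off + elt_sz - 1) / pg_sz)
			off = (off / pg_sz + 1) * pg_sz; /* skip to next page */
		if (i == n)
			return off;
		off += elt_sz;
	}
}
```

With 4096-byte pages and 1536-byte elements, object 2 lands at offset 4096 rather than 2 * 1536 = 3072, so (pool address + index * element size) would point at the wrong place.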
OK. As an alternative to scaling the index with the cache line size,
you can scale it with sizeof(uintptr_t) to be able to address 32 GB or
16 GB mempools on 64-bit and 32-bit architectures, respectively. Both
x86 and ARM CPUs have instructions to access memory with an added
offset multiplied by 4 or 8. So that should be high performance.
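A minimal sketch of the sizeof(uintptr_t) scaling, again assuming a 32-bit index. The function names are hypothetical, and the pool base is assumed to be aligned to sizeof(uintptr_t). Indexing a uintptr_t array is the kind of access a compiler can typically map to a scaled-index addressing mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not DPDK APIs. */

/* Index in uintptr_t units -> address; base must be uintptr_t aligned. */
static inline void *
word_idx_to_ptr(void *base, uint32_t idx)
{
	/* Pointer arithmetic on uintptr_t * scales idx by sizeof(uintptr_t). */
	return (uintptr_t *)base + idx;
}

/* Address -> index in uintptr_t units. */
static inline uint32_t
word_ptr_to_idx(const void *base, const void *obj)
{
	uintptr_t off = (uintptr_t)obj - (uintptr_t)base;

	assert(off % sizeof(uintptr_t) == 0);
	return (uint32_t)(off / sizeof(uintptr_t));
}

/* Bytes addressable with a 32-bit index for a given unit size. */
static inline uint64_t
word_idx_max_span(uint64_t unit_size)
{
	return (UINT64_C(1) << 32) * unit_size;
}
```

With an 8-byte unit the span is 2^32 * 8 B = 32 GB; with a 4-byte unit it is 16 GB, matching the figures above.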

Yes, agreed this can be done.
Cache line size can also be used when ‘MEMPOOL_F_NO_CACHE_ALIGN’
is not enabled.
On a side note, I wanted to better understand the need for having the
'MEMPOOL_F_NO_CACHE_ALIGN' option.
The description of this field is misleading, and should be corrected.
The correct description would be: Don't need to align objs on cache lines.
It is useful for mempools containing very small objects, to conserve memory.
I think we can assume that mbuf pools are created with the
'MEMPOOL_F_NO_CACHE_ALIGN' flag set. With this we can use offset
calculated with cache line size as the unit.
You mean MEMPOOL_F_NO_CACHE_ALIGN flag not set. ;-)
Yes 😊
I agree. And since the flag is a hint only, it can be ignored if the mempool
library is scaling the index with the cache line size.
I do not think we should ignore the flag, for the reason you mention below.
However, a mempool may contain other objects than mbufs, and those objects
may be small, so ignoring the MEMPOOL_F_NO_CACHE_ALIGN flag may cost a
lot of memory for such mempools.
We could use different methods. If MEMPOOL_F_NO_CACHE_ALIGN is set, use 'sizeof(uintptr_t)' as the unit; if not set, use the cache line size as the unit.
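A rough sketch of that selection, under stated assumptions: the SKETCH_ macros are local stand-ins (the flag value and the 64-byte cache line are chosen for this example), the function names are hypothetical, and the index is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for this sketch; values are assumptions. */
#define SKETCH_F_NO_CACHE_ALIGN 0x0002u /* stands in for MEMPOOL_F_NO_CACHE_ALIGN */
#define SKETCH_CACHE_LINE_SIZE  64u     /* stands in for RTE_CACHE_LINE_SIZE */

/* Pick the index unit from the pool flags. */
static inline size_t
sketch_index_unit(unsigned int flags)
{
	return (flags & SKETCH_F_NO_CACHE_ALIGN) ?
		sizeof(uintptr_t) : SKETCH_CACHE_LINE_SIZE;
}

/* Object address -> index in the unit selected by the flags. */
static inline uint32_t
sketch_ptr_to_index(const void *base, const void *obj, unsigned int flags)
{
	uintptr_t off = (uintptr_t)obj - (uintptr_t)base;
	size_t unit = sketch_index_unit(flags);

	assert(off % unit == 0);
	return (uint32_t)(off / unit);
}

/* Index in the unit selected by the flags -> object address. */
static inline void *
sketch_index_to_ptr(void *base, uint32_t idx, unsigned int flags)
{
	return (char *)base + (uintptr_t)idx * sketch_index_unit(flags);
}
```

The unit is now a per-pool runtime value, so the compile-time-constant advantage discussed earlier would only hold if the two cases are handled on separate code paths.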
<snip>