Thread (155 messages) 155 messages, 17 authors, 2017-07-21

Re: [RFC 0/8] mbuf: structure reorganization

From: Bruce Richardson <hidden>
Date: 2017-02-21 15:18:07

On Tue, Feb 21, 2017 at 04:04:40PM +0100, Olivier MATZ wrote:
Hi Morten,

On Tue, 21 Feb 2017 15:20:23 +0100, Morten Brørup
[off-list ref] wrote:
quoted
Comments at the end.

quoted
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
Sent: Thursday, February 16, 2017 5:14 PM
To: Bruce Richardson
Cc: Ananyev, Konstantin; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

On Thu, 16 Feb 2017 15:46:19 +0000, Bruce Richardson
[off-list ref] wrote:  
quoted
On Thu, Feb 16, 2017 at 02:48:07PM +0100, Olivier Matz wrote:  
quoted
Hi Konstantin,

Thanks for the feedback.
Comments inline.


On Mon, 6 Feb 2017 18:41:27 +0000, "Ananyev, Konstantin"
[off-list ref] wrote:  
quoted
Hi Olivier,
Looks good in general, some comments from me below.
Thanks
Konstantin
 
quoted
The main changes are:
- reorder structure to increase vector performance on some
non-ia platforms.
- add a 64bits timestamp field in the 1st cache line  
Wonder why it deserves to be in first cache line?
How it differs from seqn below (pure SW stuff right now).  
In case the timestamp is set from a NIC value, it is set in the
Rx path. So that's why I think it deserve to be located in the
1st cache line.

As you said, the seqn is a pure sw stuff right: it is set in a
lib, not in a PMD rx path.
 
quoted
quoted
- m->next, m->nb_segs, and m->refcnt are always initialized
for mbufs in the pool, avoiding the need of setting
m->next  
(located  
quoted
quoted
quoted
quoted
in the 2nd cache line) in the Rx path for mono-segment
packets.
- change port and nb_segs to 16 bits  
Not that I am completely against it, but changing nb_segs to
16 bits seems like an overkill to me.
I think we can keep and extra 8bits for something more useful
in future.  
In my case, I use the m->next field to chain more than 256
segments for L4 socket buffers. It also updates nb_seg that can
overflow. It's not a big issue since at the end, nb_seg is
decremented for each segment. On the other hand, if I enable
some sanity checks on mbufs, it complains because the number of
segments is not equal to nb_seg.

There is also another use case with fragmentation as discussed
recently: http://dpdk.org/dev/patchwork/patch/19819/

Of course, dealing with a long mbuf list is not that efficient,
but the application can maintain another structure to
accelerate the access to the middle/end of the list.

Finally, we have other ideas to get additional 8 bits if
required  
in  
quoted
quoted
the future, so I don't think it's really a problem.

 
quoted
 
quoted
- move seqn in the 2nd cache line

Things discussed but not done in the patchset:
- move refcnt and nb_segs to the 2nd cache line: many
drivers sets them in the Rx path, so it could introduce a
performance regression, or  
I wonder can refcnt only be moved into the 2-nd cacheline?
As I understand thanks to other change (from above)
m->refcnt  
will  
quoted
quoted
quoted
already be initialized, so RX code don't need to touch it.
Though yes, it still would require changes in all PMDs.  
Yes, I agree, some fields could be moved in the 2nd cache line
once all PMDs stop to write them in RX path. I propose to issue
some guidelines to PMD maintainers at the same time the
patchset is pushed. Then we can consider changing it in a
future version, in case we need more room in the 1st mbuf cache
line. 
If we are changing things, we should really do all that now,
rather than storing up future breaks to mbuf. Worst case, we
should plan for it immediately after the release where we make
these changes. Have  
two  
quoted
releases that break mbuf immediately after each other - and
flagged  
as  
quoted
such, but keep it stable thereafter. I don't like having technical
debt on mbuf just after we supposedly "fix" it.  
I think there is no need to do this change now. And I don't feel
good with the idea of having a patchset that updates all the PMDs
to remove the access to a field because it moved to the 2nd cache
line (especially thinking about vector PMDs).

That's why I think the plan could be:
- push an updated version of this patchset quickly
- advertise to PMD maintainers "you don't need to set the m->next,
  m->refcnt, and m->nb_segs in the RX path, please update your
drivers"
- later, if we need more room in the 1st cache line of the mbuf, we
  can move refcnt and nb_seg, probably without impacting the
  performance.


Olivier  
I suppose you mean that PMDs don't need to /initialize/ m->next,
m->refcnt and m->nb_segs.

Forgive my ignorance, and this is wild speculation, but: Would a PMD
not need to set m->next and m->nb_segs if it receives a jumbogram
larger than an mbuf packet buffer? And if this is a realistic use
case, these fields actually do belong in the 1st cache line. PMD
developers please chime in.
Nothing to add to Bruce's answer :)
quoted
And I tend to agree with Bruce about making all these mbuf changes in
one go, rather than postponing some of them to later. Especially
because the postponement also closes and reopens the whole discussion
and decision process! (Not initializing a few fields in a PMD cannot
require a lot of work by the PMD developers. Moving the fields to the
2nd cache line will in the worst case degrade the performance of the
non-updated PMDs.)

A two step process makes good sense for the developers of DPDK, but
both steps should be taken within the same release, so they are
transparent to the users of DPDK.
I don't think this is doable, knowing the submission dead line is in
less than 2 weeks. On my side, honestly, I don't want to dive in the
code of into all PMDs. I feel this would be more risky than letting
the PMD maintainers update their own PMD code.
I, sadly, have to agree here. I think undertaking rework of all PMDs is
a huge job, that probably needs to be shared among the PMD authors.

Regards,
/Bruce
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help