Thread (14 messages) 14 messages, 3 authors, 2015-10-20

Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function

From: Ananyev, Konstantin <hidden>
Date: 2015-10-19 18:57:17

-----Original Message-----
From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
Sent: Monday, October 19, 2015 5:30 PM
To: Ananyev, Konstantin; Richardson, Bruce; dev@dpdk.org
Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function



On 15/10/15 16:43, Ananyev, Konstantin wrote:
quoted
quoted
-----Original Message-----
From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
Sent: Thursday, October 15, 2015 11:33 AM
To: Ananyev, Konstantin; Richardson, Bruce; dev@dpdk.org
Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function



On 15/10/15 00:23, Ananyev, Konstantin wrote:
quoted
quoted
-----Original Message-----
From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
Sent: Wednesday, October 14, 2015 5:11 PM
To: Ananyev, Konstantin; Richardson, Bruce; dev@dpdk.org
Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function



On 28/09/15 00:19, Ananyev, Konstantin wrote:
quoted
quoted
-----Original Message-----
From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
Sent: Friday, September 25, 2015 7:29 PM
To: Richardson, Bruce; dev@dpdk.org
Cc: Ananyev, Konstantin
Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function

On 07/09/15 07:41, Richardson, Bruce wrote:
quoted
quoted
-----Original Message-----
From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
Sent: Monday, September 7, 2015 3:15 PM
To: Richardson, Bruce; dev@dpdk.org
Cc: Ananyev, Konstantin
Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive
function



On 07/09/15 13:57, Richardson, Bruce wrote:
quoted
quoted
-----Original Message-----
From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
Sent: Monday, September 7, 2015 1:26 PM
To: dev@dpdk.org
Cc: Ananyev, Konstantin; Richardson, Bruce
Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD
receive function

Hi,

I just realized I've missed the "[PATCH]" tag from the subject. Did
anyone had time to review this?
Hi Zoltan,

the big thing that concerns me with this is the addition of new
instructions for each packet in the fast path. Ideally, this
prefetching would be better handled in the application itself, as for
some apps, e.g. those using pipelining, the core doing the RX from the
NIC may not touch the packet data at all, and the prefetches will
instead cause a performance slowdown.
quoted
Is it possible to get the same performance increase - or something
close to it - by making changes in OVS?
OVS already does a prefetch when it's processing the previous packet, but
apparently it's not early enough. At least for my test scenario, where I'm
forwarding UDP packets with the least possible overhead. I guess in tests
where OVS does more complex processing it should be fine.
I'll try to move the prefetch earlier in OVS codebase, but I'm not sure if
it'll help.
I would suggest trying to prefetch more than one packet ahead. Prefetching 4 or
8 ahead might work better, depending on the processing being done.
I've moved the prefetch earlier, and it seems to work:

https://patchwork.ozlabs.org/patch/519017/

However it raises the question: should we remove header prefetch from
all the other drivers, and the CONFIG_RTE_PMD_PACKET_PREFETCH option?
My vote would be for that.
Konstantin
After some further thinking I would rather support the
rte_packet_prefetch() macro (prefetch the header in the driver, and
configure it through CONFIG_RTE_PMD_PACKET_PREFETCH)

- the prefetch can happen earlier, so if an application needs the header
right away, this is the fastest
- if the application doesn't need header prefetch, it can turn it off
compile time. Same as if we wouldn't have this option.
- if the application has mixed needs (sometimes it needs the header
right away, sometimes it doesn't), it can turn it off and do what it
needs. Same as if we wouldn't have this option.

A harder decision would be whether it should be turned on or off by
default. Currently it's on, and half of the receive functions don't use it.
Yep,  it is sort of a mess right now.
Another question if we'd like to keep it and standardise it:
at what moment to prefetch: as soon as we realize that HW is done with that buffer,
or as late inside rx_burst() as possible?
I suppose there is no clear answer for that.
I think if the application needs the driver start the prefetching, it
does it because it's already too late when rte_eth_rx_burst() returns.
So I think it's best to do it as soon as we learn that the HW is done.
Probably, but it might be situations when it would be more plausible to do it later too.
Could you elaborate?
If the application wants prefetch to start earlier than could be done by
itself (right after rte_eth_rx_burst() returns), earlier is better. So
it will have the best chances to have the data in cache.
If it would need the data later, then it could do the prefetch by itself.
quoted
Again it might depend on particular HW and your memory access pattern.
quoted
quoted
That's why my thought was to just get rid of it.
Though if it would be implemented in some standardized way and disabled by default -
that's probably ok too.
BTW, couldn't that be implemented just via rte_ethdev rx callbacks mechanism?
Then we can have the code one place and don't need compile option at all -
could be ebabled/disabled dynamically on a per nic basis.
Or would it be too high overhead for that?
I think if we go that way, it's better to pass an option to
rte_eth_rx_burst() and tell if you want the header prefetched by the
driver. That would be the most flexible.
That would mean ABI breakage for rx_burst()...
Probably then better a new parameter in rx_conf or something.
Though it still means that each PMD has to implement it on its own.
That would be the case in any way, as we are are talking about
prefetching in the receive function.
quoted
And again there might be PMDs that would just ignore that parameter.
If the PMD has a good reason to do that (e.g. prefetch has undesirable
effects), I think it's fine.
quoted
While for callback it would be all in one and known place.
Who and when would call that callback? If the application after
rte_eth_rx_burst() returned, it wouldn't have too much use, it could
just call prefetch by itself.
I am talking about the callbacks called by rx_burst() itself:

static inline uint16_t
rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
                 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
{
        struct rte_eth_dev *dev;

        dev = &rte_eth_devices[port_id];

        int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
                        rx_pkts, nb_pkts);

#ifdef RTE_ETHDEV_RXTX_CALLBACKS
        struct rte_eth_rxtx_callback *cb = dev->post_rx_burst_cbs[queue_id];

        if (unlikely(cb != NULL)) {
                do {
                        nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
                                                nb_pkts, cb->param);
                        cb = cb->next;
                } while (cb != NULL);
#endif

        return nb_rx;
}

Konstantin
quoted
But again, I didn't measure it, so not sure what will be an impact of the callback itself.
Might be it not worth it.
Konstantin
quoted
quoted
Konstantin


quoted
quoted
quoted
quoted
/Bruce
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help