Re: [PATCH 0/4] skb paged fragment destructors

From: Ian Campbell <hidden>
Date: 2011-12-09 13:47:09

On Wed, 2011-12-07 at 13:35 +0000, Ian Campbell wrote:

On Tue, 2011-12-06 at 13:24 +0000, Eric Dumazet wrote:

quoted

Le mardi 06 décembre 2011 à 11:57 +0000, Ian Campbell a écrit :

quoted

On Wed, 2011-11-09 at 15:01 +0000, Ian Campbell wrote:

quoted

      * split linear data allocation and shinfo allocation into two. I
        suspect this will have its own performance implications? On the
        positive side skb_shared_info could come from its own fixed size
        pool/cache which might have some benefits

I played with this to see how it would look. Illustrative patch below. 

I figure that lots of small frames is the interesting workload for a
change such as this but I don't know if iperf is necessarily the best
benchmark for measuring that.
Before changing things I got:
        iperf -c qarun -m -t 60 -u -b 10000M -l 64
        ------------------------------------------------------------
        Client connecting to qarun, UDP port 5001
        Sending 64 byte datagrams
        UDP buffer size:   224 KByte (default)
        ------------------------------------------------------------
        [  3] local 10.80.225.63 port 45857 connected with 10.80.224.22 port 5001
        [ ID] Interval       Transfer     Bandwidth
        [  3]  0.0-60.0 sec    844 MBytes    118 Mbits/sec
        [  3] Sent 13820376 datagrams
        [  3] Server Report:
        [  3]  0.0-60.0 sec    844 MBytes    118 Mbits/sec  0.005 ms    0/13820375 (0%)
        [  3]  0.0-60.0 sec  1 datagrams received out-of-order
whereas with the patch:
        # iperf -c qarun -m -t 60 -u -b 10000M -l 64
        ------------------------------------------------------------
        Client connecting to qarun, UDP port 5001
        Sending 64 byte datagrams
        UDP buffer size:   224 KByte (default)
        ------------------------------------------------------------
        [  3] local 10.80.225.63 port 42504 connected with 10.80.224.22 port 5001
        [ ID] Interval       Transfer     Bandwidth
        [  3]  0.0-60.0 sec    833 MBytes    116 Mbits/sec
        [  3] Sent 13645857 datagrams
        [  3] Server Report:
        [  3]  0.0-60.0 sec    833 MBytes    116 Mbits/sec  0.005 ms    0/13645856 (0%)
        [  3]  0.0-60.0 sec  1 datagrams received out-of-order

With 1200 byte datagrams I get basically identical throughput.

(nb: none of the skb destructor stuff was present in either case)

Sorry, but the real problem is that if skb producer and consumer are not
on same CPU, each skb will now hit SLUB slowpath three times instead of
two.

Some workloads are : One cpu fully handling IRQ from device, dispatching
skbs to consumers on other cpus.

So something like a multithreaded apache benchmark would be interesting?

quoted

Plus skb->truesize is wrong after your patch.
Not sure if cloning is correct either...

Me neither, the patch was just one which happened to run well enough to
run some numbers on. I'll be sure to double-check/correct that stuff if
I persist with the approach.

quoted

Anyway, do we _really_ need 16 frags per skb, I dont know....

MAX_SKB_FRAGS is:
        /* To allow 64K frame to be packed as single skb without frag_list. Since
         * GRO uses frags we allocate at least 16 regardless of page size.
         */
        #if (65536/PAGE_SIZE + 2) < 16
        #define MAX_SKB_FRAGS 16UL
        #else
        #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
        #endif
        
So I think actually it turns out to be 18 on systems with 4k pages. If
we reduced that to be 16 that would save 48 bytes on amd64 which pulls
the shinfo size down to 440 and then

        NET_SKB_PAD (64) + 1500 + 14 + 440 = 2018

which does fit in half a page.

Using 16 still means that a 64k frame fits precisely in frags on a 4096
page size system. I don't know what the extra 2 was for (it predates
git), perhaps just to allow for some slop due to misalignment in the
first page of data?

Tracked it down in TGLX's historic tree as:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=80223d5186f73bf42a7e260c66c9cb9f7d8ec9cf

Having found the title I did a bit of searching around for discussion
but all I could find was other folks not having any idea where the + 2
comes from either...

http://kerneltrap.org/mailarchive/linux-netdev/2008/4/21/1529914 sort of
wonders if the + 2 might be including a head which crosses a page
boundary. I'm not sure it is sane/useful to account for that in
MAX_SKB_FRAGS. If we had a concept like MAX_SKB_PAGES then it would
perhaps make sense to have + 2 there, but AFAICT drivers etc are already
accounting for this appropriately by adding a further + 2 (or sometimes
+ 1) to MAX_SKB_FRAGS.

Ian.

quoted

This gives problems when/if skb must be linearized and we hit
PAGE_ALLOC_COSTLY_ORDER

Alternatively, we could use order-1 or order-2 pages on x86 to get
8192/16384 bytes frags. (fallback to order-0 pages in case of allocation
failures)

Or fallback to separate allocation of shinfo?

Ian.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help