Re: [3.15-rc3] Bisected: xen-netback mangles packets between two guests on a bridge since merge of "TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy" series.
From: Sander Eikelenboom <hidden>
Date: 2014-05-06 17:33:52
Also in: xen-devel
Tuesday, May 6, 2014, 7:10:27 PM, you wrote:
> On 05/05/14 11:19, Sander Eikelenboom wrote:
>> Hi Zoltan,
>>
>> This weekend I tried some more things, the summary:
>>
>> 1) It's a PITA to isolate your patches that went into 3.15 (to rule out any
>>    other changes) and apply them to 3.14.2, which is tested and worked ok.
>>    Could you put up a git tree somewhere and rebase your patch series on
>>    3.14.2 for testing?
> I've managed to repro the case in house, now I'll start to add more debug
> logging.
Yippie :-)
>> 2) Does the test suite you are using also have tests verifying that the
>>    content of packets isn't altered?
> Not directly. But the applications running on top of them probably do so.
> Also, my tests are running on top of a 3.10 kernel, where the netback
> changes are backported.
It would probably be a nice test, and one that is easily overlooked, although
it should be the primary concern... :-)

<big snip>
>> Ad 4a) Assumption that "An upstream guest shouldn't be able to send 18 slots":
>>
>>  - xen-netfront does this slot check in "xennet_start_xmit":
>>
>>        slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>>                xennet_count_skb_frag_slots(skb);
>>        if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>>                net_alert_ratelimited(
>>                        "xennet: skb rides the rocket: %d slots\n", slots);
>>                goto drop;
>>        }
>>
>>  - The "MAX_SKB_FRAGS + 1" was changed due to:
>>    http://www.gossamer-threads.com/lists/xen/devel/266980
>>    but it doesn't seem to be the proper solution.
>>
>>  - So your assumption doesn't hold: MAX_SKB_FRAGS == 17, so up to
>>    MAX_SKB_FRAGS + 1 = 18 slots can come through.
>>
>>  - On 3.15-rc4 I have now started to see this warning being triggered and
>>    packets dropped; I don't see this on 3.14.2:
>>
>>    [  118.526583] xen_netfront: xennet: skb rides the rocket: 19 slots | skb_shinfo(skb)->nr_frags: 3, len: 186, offset: 4070, skb->len: 62330, skb->data_len: 62144, skb->truesize: 63424, np->tx.sring->rsp_prod: 21434, np->tx.rsp_cons: 21434 DIV_ROUND_UP(offset + len, PAGE_SIZE): 2
>
> Now I get it: compound pages on frags can cause more slots in certain cases.
> Hm, I don't know how we should handle this on the netfront side.
I don't know if something similar to what netback does would be possible: do a
check before dequeuing the skb, and do the required slot calculation in the
capped, pessimistic way (unless Paul comes up with something more clever)?