Thread: 37 messages, 6 authors, 2014-05-16

Re: [3.15-rc3] Bisected: xen-netback mangles packets between two guests on a bridge since merge of "TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy" series.

From: Zoltan Kiss <hidden>
Date: 2014-05-09 21:03:02

Hi,

Sorry for the long silence on this issue, I was busy trying to figure 
out what went wrong. Fun facts:

- commenting out the __pskb_pull_tail call in tx_submit, which
unconditionally pulls the linear area up to 128 bytes, seems to solve
the problem
- I could reproduce the problem only when the sending guest ran a 64 bit
kernel, but then even with 3.2. With a 32 bit sending guest it works
fine. More precisely, I think it boils down to the actual config; I used
the XenServer Dom0 config files, see them here:
https://github.com/xenserver/linux-3.x.pg/blob/master/master/kernel-configuration
- with a 64 bit Debian 7 kernel as the sender it also works, so I guess
it's not about 32/64 bit, but something in the config
- the receiving guest, where wget ran, doesn't matter.
- the "more than MAX_SKB_FRAGS slots" thing was a red herring. A typical 
skb layout (on the sender's xenvif_start_xmit) which gets corrupted:
linear area: 66 bytes
0. frag: 52 bytes
1. frag: 1200 bytes
- so I guess the problem is when that pull_tail pulls the whole first 
frag into the linear area
- a corrupt packet on the receiver side looks like the following:
   - linear buffer: 128 bytes, content is OK
   - the content of the frag area is shifted back 4096 bytes in the
TCP stream. So instead of the Nth byte it starts with the (N-4096)th byte
   - the length is the same as on the sender side, I've checked by 
looking at the IP id fields
   - otherwise the stream content looks ok (I used a continuously 
incrementing pattern)
   - the next packet starts at the right place
- the pulling itself doesn't cause the corruption, I've printed out the 
first frag after that, and it still looks OK
- ftrace_printk("%*ph") seems to have problems when the pointer points
to a grant mapped page. I have the impression that it only dereferences
the pointer when I read the trace buffer, at which point the mapping
and the content are long gone.
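
To make the arithmetic behind the typical layout above concrete, here is a toy model of the pull (a sketch only, not the kernel code; the function name pull_tail and the list-based skb model are made up for illustration). With a 66 byte linear area and a 128 byte target, 62 bytes are needed, so the 52 byte first frag is consumed entirely and the second frag gives up its first 10 bytes:

```python
# Toy model of pulling data from page frags into the linear area of an skb.
# Sizes come from the layout reported above; pull_tail and the list-based
# skb model are illustrative assumptions, not kernel API.

def pull_tail(linear_len, frags, target):
    """Grow the linear area to `target` bytes by consuming leading frags.

    Returns (new_linear_len, remaining_frags). A fully consumed frag is
    dropped from the list, a partially consumed frag is shortened.
    """
    total = linear_len + sum(frags)
    # Never pull more than the packet holds, never pull a negative amount.
    need = max(0, min(target, total) - linear_len)
    remaining = list(frags)
    pulled = 0
    while need > 0 and remaining:
        take = min(need, remaining[0])
        pulled += take
        need -= take
        if take == remaining[0]:
            remaining.pop(0)       # whole frag moved into the linear area
        else:
            remaining[0] -= take   # frag keeps its tail only
    return linear_len + pulled, remaining


if __name__ == "__main__":
    linear, frags = pull_tail(66, [52, 1200], 128)
    print(linear, frags)
```

The resulting 128 byte linear area matches what shows up on the receiver side; the interesting case is that the frag list changes shape (one frag disappears) rather than just shrinking.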

I'll continue to look into this next week

Zoli