Thread (47 messages, 6 authors, 2010-09-29)

RE: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel

From: Xin, Xiaohui <hidden>
Date: 2010-09-15 01:51:31
Also in: kvm, lkml

From: Shirley Ma [mailto:mashirle@us.ibm.com]
Sent: Tuesday, September 14, 2010 11:05 PM
To: Avi Kivity
Cc: David Miller; arnd@arndb.de; mst@redhat.com; Xin, Xiaohui; netdev@vger.kernel.org;
kvm@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel

On Tue, 2010-09-14 at 11:12 +0200, Avi Kivity wrote:
+            base = (unsigned long)from->iov_base + offset1;
+            size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+            num_pages = get_user_pages_fast(base, size, 0, &page[i]);
+            if ((num_pages != size) ||
+                (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
+                    /* put_page is in skb free */
+                    return -EFAULT;
What keeps the user from writing to these pages in its address space
after the write call returns?

A write() return of success means:

      "I wrote what you gave to me"

not

      "I wrote what you gave to me, oh and BTW don't touch these pages for a while."

In fact "a while" isn't even defined in any way, as there is no way
for the write() invoker to know when the networking card is done with
those pages.
That's what io_submit() is for. Then io_getevents() tells you what
"a while" actually was.
This macvtap zero copy uses iov buffers from the vhost ring, which are
allocated by the guest kernel. In the host kernel, vhost calls macvtap's
sendmsg, and macvtap's sendmsg calls get_user_pages_fast() to pin these
buffers' pages for zero copy.

The patch relies on how vhost handles these buffers. I need to look at
the vhost code (qemu) first to address the questions here.

Thanks
Shirley
I think what David said is what we had considered before for the mp device.
We cannot know exactly when the DMA operation has finished transmitting the TX buffer,
but we do know it has finished by the time the TX buffer is freed. So we only notify vhost
about the write when the TX buffer is freed. However, that deadline may be too late for good performance.

Thanks
Xiaohui 