Re: 8% performance improved by change tap interact with kernel stack
From: Stephen Hemminger <stephen@networkplumber.org>
Date: 2014-01-28 16:58:34
Also in: kvm
On Tue, 28 Jan 2014 12:33:25 +0200 "Michael S. Tsirkin" [off-list ref] wrote:
> On Tue, Jan 28, 2014 at 06:19:02PM +0800, Qin Chuanyu wrote:
> > On 2014/1/28 17:41, Michael S. Tsirkin wrote:
> > > > > I think it's okay - IIUC this way we are processing xmit directly
> > > > > instead of going through softirq.
> > > > > Was meaning to try this - I'm glad you are looking into this.
> > > > >
> > > > > Could you please check latency results?
> > > >
> > > > netperf UDP_RR 512
> > > > test model: VM->host->host
> > > >
> > > > modified before : 11108
> > > > modified after : 11480
> > > >
> > > > 3% gained by this patch
> > >
> > > Nice.
> > > What about CPU utilization?
> > > It's trivially easy to speed up networking by
> > > burning up a lot of CPU so we must make sure it's
> > > not doing that.
> > > And I think we should see some tests with TCP as well, and
> > > try several message sizes.
> >
> > Yes, by burning up more CPU we could get better performance easily.
> > So I have bond vhost thread and interrupt of nic on CPU1 while testing.
> >
> > modified before, the idle of CPU1 is 0%-1% while testing.
> > and after modify, the idle of CPU1 is 2%-3% while testing
> >
> > TCP also could gain from this, but pps is less than UDP, so I think
> > the improvement would be not so obviously.
>
> Still need to test this doesn't regress but overall looks convincing to me.
> Could you send a patch, accompanied by testing results for
> throughput latency and cpu utilization for tcp and udp
> with various message sizes?
>
> Thanks!

There are a couple of potential problems with this. The primary one is that
now you are violating the explicit assumptions about when netif_receive_skb()
can be called, and because of that it may break things all over the place.

 *
 *	netif_receive_skb() is the main receive data processing function.
 *	It always succeeds. The buffer may be dropped during processing
 *	for congestion control or by the protocol layers.
 *
 *	This function may only be called from softirq context and interrupts
 *	should be enabled.

At a minimum, softirq (BH) and preempt must be disabled.

Another potential problem is that since a softirq is not used, kernel stack
usage may be much larger.

Maybe a better way would be implementing some form of NAPI in the TUN device?
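
To make the two points above concrete (the receive path has to run with BH
disabled, and a NAPI-style poll hands packets to the stack in bounded batches
instead of from the writer's context), here is a minimal user-space sketch.
It is a toy model, not kernel code and not a proposed patch: toy_napi,
toy_receive(), toy_enqueue(), toy_poll() and toy_bh_disabled are hypothetical
names invented for illustration, and the counter only stands in for the state
that local_bh_disable()/local_bh_enable() would manage in the kernel.

```c
/* User-space toy model; every name below is hypothetical, not a kernel API. */
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 64 /* power of two, so index wrap-around stays consistent */

struct toy_napi {
    int ring[TOY_RING_SIZE]; /* stand-ins for queued packets */
    size_t head;             /* next slot to poll */
    size_t tail;             /* next free slot */
    int delivered;           /* packets handed to the "stack" */
};

static int toy_bh_disabled; /* models the BH-disable nesting depth */

/* Models the receive entry point: only legal while BH is "disabled". */
static int toy_receive(struct toy_napi *n, int pkt)
{
    (void)pkt;
    if (toy_bh_disabled == 0)
        return -1; /* context rule violated, nothing is delivered */
    n->delivered++;
    return 0;
}

/* Writer path: only queue the packet; do not enter the stack from here. */
static int toy_enqueue(struct toy_napi *n, int pkt)
{
    if (n->tail - n->head == TOY_RING_SIZE)
        return -1; /* ring full, the caller would drop the packet */
    n->ring[n->tail++ % TOY_RING_SIZE] = pkt;
    return 0;
}

/* Poll path: deliver at most `budget` packets with BH "disabled". */
static int toy_poll(struct toy_napi *n, int budget)
{
    int done = 0;

    toy_bh_disabled++;
    while (done < budget && n->head != n->tail) {
        int rc = toy_receive(n, n->ring[n->head++ % TOY_RING_SIZE]);

        assert(rc == 0); /* cannot fail here: BH is "disabled" */
        (void)rc;
        done++;
    }
    toy_bh_disabled--;
    return done; /* done < budget means the ring has been drained */
}
```

In this model the writer only calls toy_enqueue(), so its stack never grows
by the depth of the receive path, and toy_poll() is the single place where
the context rule is established before toy_receive() runs.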