Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2020-07-20 09:27:51
Also in: kvm, lkml, virtualization
Subsystem: the rest, virtio host (vhost)
Maintainers: Linus Torvalds, "Michael S. Tsirkin", Jason Wang
On Thu, Jul 16, 2020 at 07:16:27PM +0200, Eugenio Perez Martin wrote:
On Fri, Jul 10, 2020 at 7:58 AM Michael S. Tsirkin wrote:
On Fri, Jul 10, 2020 at 07:39:26AM +0200, Eugenio Perez Martin wrote:
How about playing with the batch size? Make it a mod parameter instead of the hard coded 64, and measure for all values 1 to 64 ...

Right, according to the test result, 64 seems to be too aggressive in the case of TX.

Got it, thanks both!

In particular I wonder whether with batch size 1 we get same performance as without batching (would indicate 64 is too aggressive) or not (would indicate one of the code changes affects performance in an unexpected way).

-- MST

Hi!

Varying batch_size as drivers/vhost/net.c:VHOST_NET_BATCH,
sorry this is not what I meant. I mean something like this:
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0b509be8d7b1..b94680e5721d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1279,6 +1279,10 @@ static void handle_rx_net(struct vhost_work *work)
 	handle_rx(net);
 }
 
+MODULE_PARM_DESC(batch_num, "Number of batched descriptors. (offset from 64)");
+module_param(batch_num, int, 0644);
+static int batch_num = 0;
+
 static int vhost_net_open(struct inode *inode, struct file *f)
 {
 	struct vhost_net *n;
@@ -1333,7 +1337,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
-		       UIO_MAXIOV + VHOST_NET_BATCH,
+		       UIO_MAXIOV + VHOST_NET_BATCH + batch_num,
 		       VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
 		       NULL);
then you can try tweaking batching and playing with mod parameter without recompiling. VHOST_NET_BATCH affects lots of other things.
and testing the pps as previous mail says. This means that we have either only vhost_net batching (in base testing, like previously to apply this patch) or both batching sizes the same.

I've checked that vhost process (and pktgen) goes 100% cpu also.

For tx: Batching decrements always the performance, in all cases. Not sure why bufapi made things better the last time.

Batching makes improvements until 64 bufs, I see increments of pps but like 1%.

For rx: Batching always improves performance. It seems that if we batch little, bufapi decreases performance, but beyond 64, bufapi is much better. The bufapi version keeps improving until I set a batching of 1024. So I guess it is super good to have a bunch of buffers to receive.

Since with this test I cannot disable event_idx or things like that, what would be the next step for testing?

Thanks!

--
Results:
# Buf size: 1,16,32,64,128,256,512

# Tx
# ===
# Base
2293304.308,3396057.769,3540860.615,3636056.077,3332950.846,3694276.154,3689820
# Batch
2286723.857,3307191.643,3400346.571,3452527.786,3460766.857,3431042.5,3440722.286
# Batch + Bufapi
2257970.769,3151268.385,3260150.538,3379383.846,3424028.846,3433384.308,3385635.231,3406554.538

# Rx
# ==
# pktgen results (pps)
1223275,1668868,1728794,1769261,1808574,1837252,1846436
1456924,1797901,1831234,1868746,1877508,1931598,1936402
1368923,1719716,1794373,1865170,1884803,1916021,1975160

# Testpmd pps results
1222698.143,1670604,1731040.6,1769218,1811206,1839308.75,1848478.75
1450140.5,1799985.75,1834089.75,1871290,1880005.5,1934147.25,1939034
1370621,1721858,1796287.75,1866618.5,1885466.5,1918670.75,1976173.5,1988760.75,1978316

pktgen was run again for rx with 1024 and 2048 buf size, giving 1988760.75 and 1978316 pps. Testpmd goes the same way.
Don't really understand what does this data mean.
Which number of descs is batched for each run?

-- MST
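
To make the "offset from 64" wording of the proposed batch_num parameter concrete, here is a minimal userspace sketch of the arithmetic in the diff above. It is not kernel code. The two constants are assumed to match the tree the patch targets (UIO_MAXIOV = 1024, VHOST_NET_BATCH = 64), and both helper names are hypothetical. It also assumes, as the parameter description suggests, that the descriptor batch is whatever the limit passed to vhost_dev_init() adds on top of UIO_MAXIOV.

```c
#include <stdio.h>

/* Assumed values; check them against the kernel tree under test. */
#define UIO_MAXIOV      1024
#define VHOST_NET_BATCH 64

/* Hypothetical helper: the limit the patched vhost_net_open() would
 * pass to vhost_dev_init() for a given batch_num offset. */
static int vhost_net_iov_limit(int batch_num)
{
	return UIO_MAXIOV + VHOST_NET_BATCH + batch_num;
}

/* Hypothetical helper: the batch size implied by "offset from 64". */
static int effective_batch(int batch_num)
{
	return VHOST_NET_BATCH + batch_num;
}

/* Print the mapping for a few offsets, e.g. -63 for a batch of 1. */
static void print_batch_table(void)
{
	static const int offsets[] = { -63, -48, -32, 0, 64 };
	size_t i;

	for (i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++)
		printf("batch_num=%d batch=%d iov_limit=%d\n",
		       offsets[i], effective_batch(offsets[i]),
		       vhost_net_iov_limit(offsets[i]));
}
```

Under these assumptions a batch of 1 corresponds to batch_num = -63. Since the parameter is declared with mode 0644, it would normally be adjustable at runtime through /sys/module/vhost_net/parameters/batch_num. Because the value is read in vhost_net_open(), a change would presumably apply only to devices opened afterwards.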