RE: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

From: wangyunjian <hidden>
Date: 2020-12-17 02:39:53

-----Original Message-----
From: Michael S. Tsirkin [mailto:mst@redhat.com]
Sent: Wednesday, December 16, 2020 5:23 PM
To: wangyunjian <redacted>
Cc: netdev@vger.kernel.org; jasowang@redhat.com;
willemdebruijn.kernel@gmail.com; virtualization@lists.linux-foundation.org;
Lilijun (Jerry) [off-list ref]; chenchanghu
[off-list ref]; xudingke [off-list ref]; huangbin (J)
[off-list ref]
Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

On Wed, Dec 16, 2020 at 04:20:37PM +0800, wangyunjian wrote:

quoted

From: Yunjian Wang <redacted>

Currently we break the loop and wake up the vhost_worker when sendmsg
fails. When the worker wakes up again, we'll meet the same error. This
will cause high CPU load. To fix this issue, we can skip this
description by ignoring the error. When we exceeds sndbuf, the return
value of sendmsg is -EAGAIN. In the case we don't skip the description
and don't drop packet.

Question: with this patch, what happens if sendmsg is interrupted by a signal?

The descriptors are consumed as normal. However, the packet is discarded.
Could you explain the specific scenario?

quoted

Signed-off-by: Yunjian Wang <redacted>
---
 drivers/vhost/net.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index

c8784dfafdd7..3d33f3183abe 100644

--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c

@@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net,

struct socket *sock)

quoted

 				msg.msg_flags &= ~MSG_MORE;
 		}

-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
-		if (unlikely(err < 0)) {
+		if (unlikely(err == -EAGAIN)) {
 			vhost_discard_vq_desc(vq, 1);
 			vhost_net_enable_vq(net, vq);
 			break;
-		}
-		if (err != len)
-			pr_debug("Truncated TX packet: len %d != %zd\n",
-				 err, len);
+		} else if (unlikely(err != len))
+			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err,
+len);
 done:
 		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
 		vq->heads[nvq->done_idx].len = 0;

@@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net

*net, struct socket *sock)

quoted

 			msg.msg_flags &= ~MSG_MORE;
 		}

-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
 			if (zcopy_used) {

@@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net

*net, struct socket *sock)

quoted

 				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
 					% UIO_MAXIOV;
 			}
-			vhost_discard_vq_desc(vq, 1);
-			vhost_net_enable_vq(net, vq);
-			break;
+			if (err == -EAGAIN) {
+				vhost_discard_vq_desc(vq, 1);
+				vhost_net_enable_vq(net, vq);
+				break;
+			}
 		}
 		if (err != len)
-			pr_debug("Truncated TX packet: "
-				 " len %d != %zd\n", err, len);
+			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err,
+len);

I'd rather make the pr_debug -> vq_err a separate change, with proper commit
log describing motivation.

This log was originally triggered when packets were truncated. But after the
modification of this patch, other error scenarios will also trigger this log.
That's why I modified the content and level of this log together.
Now, should I just change the content of this patch?

Thanks

quoted

 		if (!zcopy_used)
 			vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		else
--
2.23.0

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help