Thread (13 messages) 13 messages, 6 authors, 2023-07-10

RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

From: Souradeep Chakrabarti <hidden>
Date: 2023-07-06 10:41:14
Also in: linux-hyperv, linux-rdma, lkml, stable

-----Original Message-----
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Sent: Wednesday, July 5, 2023 8:06 PM
To: Souradeep Chakrabarti <redacted>; souradeep
chakrabarti [off-list ref]
Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
[off-list ref]; wei.liu@kernel.org; Dexuan Cui
[off-list ref]; davem@davemloft.net; edumazet@google.com;
kuba@kernel.org; pabeni@redhat.com; Long Li [off-list ref]; Ajay
Sharma [off-list ref]; leon@kernel.org;
cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
stable@vger.kernel.org
Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
when host is unresponsive

[You don't often get email from aleksander.lobakin@intel.com. Learn why this is
important at https://aka.ms/LearnAboutSenderIdentification ]

From: Souradeep Chakrabarti <redacted>
Date: Mon, 3 Jul 2023 19:55:06 +0000
quoted
quoted
-----Original Message-----
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Sent: Monday, July 3, 2023 10:18 PM
[...]
quoted
quoted
quoted
    for (i = 0; i < apc->num_queues; i++) {
            txq = &apc->tx_qp[i].txq;
-
-           while (atomic_read(&txq->pending_sends) > 0)
+           while (atomic_read(&txq->pending_sends) > 0 &&
+                  time_before(jiffies, timeout)) {
                    usleep_range(1000, 2000);> +            }
    }
120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
iterations. I know usleep_range() often is much less precise, but still.
Do you really need that much time? Has this been measured during the
tests that it can take up to 120 seconds or is it just some random
value that "should be enough"?
If you really need 120 seconds, I'd suggest using a timer / delayed
work instead of wasting resources.
Here the intent is not waiting for 120 seconds, rather than avoid
continue checking the pending_sends  of each tx queues for an indefinite time,
before freeing sk_buffs.
quoted
The pending_sends can only get decreased for a tx queue,  if
mana_poll_tx_cq() gets called for a completion notification and increased by
xmit.
quoted
In this particular bug, apc->port_is_up is not set to false, causing
xmit to keep increasing the pending_sends for the queue and
mana_poll_tx_cq() not getting called for the queue.

If we see the comment in the function mana_dealloc_queues(), it mentions it :

2346     /* No packet can be transmitted now since apc->port_is_up is false.
2347      * There is still a tiny chance that mana_poll_tx_cq() can re-enable
2348      * a txq because it may not timely see apc->port_is_up being cleared
2349      * to false, but it doesn't matter since mana_start_xmit() drops any
2350      * new packets due to apc->port_is_up being false.

The value 120 seconds has been decided here based on maximum number of
queues
This is quite opposite to what you're saying above. How should I connect these
two:

Here the intent is not waiting for 120 seconds

+

The value 120 seconds has been decided here based on maximum number of
queues

?
Can cleaning the Tx queues really last for 120 seconds?
My understanding is that timeouts need to be sensible and not go to the outer
space. What is the medium value you got during the tests?
For each queue each takes few milli second, in a normal condition. So
based on maximum number of allowed queues for our h/w it won't
go beyond a sec. 
The 120s only happens rarely during some NIC HW issue -unexpected.
So this timeout will only trigger in a very rare scenario.
quoted
are allowed in this specific hardware, it is a safe assumption.
quoted
quoted
+   for (i = 0; i < apc->num_queues; i++) {
+           txq = &apc->tx_qp[i].txq;
+           cq = &apc->tx_qp[i].tx_cq;
cq can be just &txq->tx_cq.
mana_txq  structure does not have a pointer to mana_cq.
Sorry, misread, my bad.
quoted
quoted
quoted
+           while (atomic_read(&txq->pending_sends)) {
+                   skb = skb_dequeue(&txq->pending_skbs);
+                   mana_unmap_skb(skb, apc);
+                   napi_consume_skb(skb, cq->budget);
(you already have comment about this one)
quoted
+                   atomic_sub(1, &txq->pending_sends);
+           }
+   }
    /* We're 100% sure the queues can no longer be woken up, because
     * we're sure now mana_poll_tx_cq() can't be running.
     */
Thanks,
Olek
Thanks,
Olek
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help