Re: [PATCH net 1/3] net: bql: add netdev_tx_sent_queue_more() helper
From: Eric Dumazet <edumazet@google.com>
Date: 2018-10-31 22:52:20
On Wed, Oct 31, 2018 at 4:30 AM Tariq Toukan [off-list ref] wrote:
quoted hunk ↗ jump to hunk
On 30/10/2018 1:25 AM, Eric Dumazet wrote:quoted
When qdisc_run() tries to use BQL budget to bulk-dequeue a batch of packets, GSO can later transform this list in another list of skbs, and each skb is sent to device ndo_start_xmit(), one at a time, with skb->xmit_more being set to one but for last skb. Problem is that very often, BQL limit is hit in the middle of the packet train, forcing dev_hard_start_xmit() to stop the bulk send and requeue the end of the list. BQL role is to avoid head of line blocking, making sure a qdisc can deliver high priority packets before low priority ones. But there is no way requeued packets can be bypassed by fresh packets in the qdisc. Aborting the bulk send increases TX softirqs, and hot cache lines (after skb_segment()) are wasted. Note that for TSO packets, we never split a packet in the middle because of BQL limit being hit. Drivers should be able to update BQL counters without flipping/caring about BQL status, if the current skb has xmit_more set. Upper layers are ultimately responsible to stop sending another packet train when BQL limit is hit. Code template in a driver might look like the following : if (skb->xmit_more) { netdev_tx_sent_queue_more(tx_queue, nr_bytes); send_doorbell = netif_tx_queue_stopped(tx_queue); } else { netdev_tx_sent_queue(tx_queue, nr_bytes); send_doorbell = true; }Hi Eric, Nice optimization. I thought of another way of implementing it, by just extending the existing netdev_tx_sent_queue function with a new xmit_more parameter, that the driver passes. Something like this:diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d837dad24b4c..feb9cbcb5759 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h@@ -3129,12 +3129,12 @@ static inline voidnetdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu } static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue, - unsigned int bytes) + unsigned int bytes, bool more) { #ifdef CONFIG_BQL dql_queued(&dev_queue->dql, bytes); - if (likely(dql_avail(&dev_queue->dql) >= 0)) + if (more || likely(dql_avail(&dev_queue->dql) >= 0)) return; set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state); This unifies and simplifies both the stack and driver code, as the new suggested function netdev_tx_sent_queue_more can become a private case of the existing one.
Yes I thought of this, but it is too risky/invasive and I prefer
adding an opt-in mechanism
so that patch authors can test their patch.
Then when/if all drivers are updated, we can remove the legacy stuff
or rename everything.
mlx4 was the only NIC I could reasonably test myself.
Another suggestion from Willem is to add the following helper, returning
a boolean (doorbell needed)
+static inline bool __netdev_tx_sent_queue(struct netdev_queue *dev_queue,
+ unsigned int bytes, bool xmit_more)
+{
+ if (xmit_more) {
+ netdev_tx_sent_queue_more(ring->tx_queue, tx_info->nr_bytes);
+ return netif_tx_queue_stopped(dev_queue);
+ } else {
+ netdev_tx_sent_queue(dev_queue, bytes);
+ return true;
+ }
+}
+