Thread (15 messages) 15 messages, 4 authors, 2023-09-19

Re: [PATCH net] net: stmmac: Use hrtimer for TX coalescing

From: Vincent Whitchurch <hidden>
Date: 2023-08-30 14:55:45
Subsystem: networking drivers, stmmac ethernet driver, the rest · Maintainers: Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds

On Fri, 2023-08-25 at 19:38 +0200, Felix Fietkau wrote:
On 25.08.23 15:42, Vincent Whitchurch wrote:
quoted
Since the timer expiry function schedules napi, and the napi poll
function stmmac_tx_clean() re-arms the timer if it sees that there are
pending tx packets, shouldn't an implementation similar to hip04_eth.c
(which doesn't save/check the last tx timestamp) be sufficient?
To be honest, I didn't look very closely at what the timer does and how 
coalescing works. I don't know if delaying the timer processing with 
every tx is the right choice, or if it should be armed only once. 
However, as you pointed out, the commit that dropped the re-arming was 
reverted because of regressions.

My suggestions are intended to preserve the existing behavior as much as 
possible (in order to avoid regressions), while also achieving the 
benefit of significantly reducing CPU cycles wasted by re-arming the timer.
I looked at it some more and the continuous postponing behaviour strikes
me as quite odd.  For example, if you set tx-frames coalescing to 0 then
cleanups could happen much later than the specified tx-usecs period, in
the absence of RX traffic.  Also, if we'd have to have a shared
timestamp between the callers of stmmac_tx_timer_arm() and the hrtimer
to preserve this continuous postponing behaviour, then we'd need to
introduce some locking between the timer expiry and those functions, to
avoid race conditions.

So currently I am experimenting with just the following patch.  The
second hunk is similar to hip04_eth.c (the comment is mine).  AFAICS
hip04_eth.c doesn't have code equivalent to the first hunk of the patch
and instead unconditionally restarts the timer from its napi poll if it
sees that it's needed.

I can't reproduce the mod_timer vs hrtimer performance problems on my
hardware, but the patch below reduces the number of (re-)starts of the
stmmac_tx_timer by around 85% in an iperf3 test over a gigabit link
(just the second hunk reduces it by about 30%).

Any test results with this patch on the hardware with the performance
problems would be appreciated.
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 4727f7be4f86..4b6e5061b5a6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2703,9 +2703,7 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
 
 	/* We still have pending packets, let's call for a new scheduling */
 	if (tx_q->dirty_tx != tx_q->cur_tx)
-		hrtimer_start(&tx_q->txtimer,
-			      STMMAC_COAL_TIMER(priv->tx_coal_timer[queue]),
-			      HRTIMER_MODE_REL);
+		stmmac_tx_timer_arm(priv, queue);
 
 	__netif_tx_unlock_bh(netdev_get_tx_queue(priv->dev, queue));
 
@@ -2987,6 +2985,20 @@ static void stmmac_tx_timer_arm(struct stmmac_priv *priv, u32 queue)
 {
 	struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
 
+	/*
+	 * Note that the hrtimer could expire immediately after we check this,
+	 * and the hrtimer and the callers of this function do not share a
+	 * lock.
+	 *
+	 * This should however be safe since the only thing the hrtimer does is
+	 * schedule napi (or ask for it run again if it's already running), and
+	 * stmmac_tx_clean(), called from the napi poll function, also calls
+	 * stmmac_tx_timer_arm() at the end if it sees that there are any TX
+	 * packets which have not yet been cleaned.
+	 */
+	if (hrtimer_is_queued(&tx_q->txtimer))
+		return;
+
 	hrtimer_start(&tx_q->txtimer,
 		      STMMAC_COAL_TIMER(priv->tx_coal_timer[queue]),
 		      HRTIMER_MODE_REL);
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help