Thread (74 messages) 74 messages, 6 authors, 2020-01-17

Re: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers

From: Ananyev, Konstantin <hidden>
Date: 2020-01-08 15:13:18

Hi Chenxu,
 
Thanks for your read.

for our research, we don't think it is a good plan that reuse ixgbe_xmit_cleanup() or ixgbe_tx_free_bufs.
following two opinion will show the reason.
I think there is a main misunderstandings between us:

TXD.DD bit setting.
You expect that for every completed packet packet HW will set DD bit.
For ixgbe it is not the case.
For ixgbe (and AFAIK for i40e) we don't set RS bit for every packet.
Instead we set it for a bunch of packets: tx_rs_thresh value.
That's one of the optimizations ixgbe PMD does.
So valid assumption is to check DD bit only for TXDs where RS bit is set.
That's how both ixgbe_xmit_cleanup() or ixgbe_tx_free_bufs() works.
All TXDs between last_desc_cleaned and desc_to_clean_to
(or between tx_next_dd and tx_next_dd + tx_rs_thresh) are considered 
by PMD as 'still busy', even in reality some of them might already be not. 

Yes, it means that in some cases some of them mbufs will be considered
by PMD as still in use, while they are really not.
But I don't think it is a critical thing.
Having TXD and related mbufs management logic within PMD
consistent is much more important.
 
In other words, I still think reusing exiting functions
(ixgbe_xmit_cleanup() and ixgbe_tx_free_bufs) for tx_cleanup_done()
is a better option then trying to create new ones. 

BTW, as a side question, do you guys have any vehicle to test these functions?
AFAIK, right now this function is not called from any app:
find  app examples/ -type f -name '*.[h,c]' | xargs grep -l rte_eth_tx_done_cleanup
<empty>
first, it doesn't solve the isituations when cnt % rs_thresh != 0 even by using your plan.

For example, the parameters of API rte_eth_tx_done_cleanup free_cnt=0, that means we want to free all possible mbufs(one mbuf per
packet). we find it after checking that
No.18 mbuf status is DD, and the status of mbuf after No.19 is in use.

If the plan is that call ixgbe_xmit_cleanup() cnt /rs_thresh + 1 times( 1 in this example).
After ixgbe_xmit_cleanup()  the value of last_desc_cleaned is not same as the actual.
the function ixgbe_xmit_pkts will call ixgbe_xmit_cleanup automatically when nb_tx_free<tx_free_thresh, mbufs before last_desc_cleaned
(No.19~No.31 for this example)will not be freed

however, all of above is for ixgbe_xmit_pkts() and ixgbe_xmit_cleanup(). if that for ixgbe_xmit_pkts_simple() and ixgbe_tx_free_bufs(),it
may be worse.
because the free buff action is in the function ixgbe_tx_free_bufs, we won't  do free in the outside code. And  no mbufs will be freed.

If you do free in outside code for the MOD mbufs and update the value of tx_next_dd and nb_tx_free. there is no need to call
ixgbe_tx_free_bufs.

here is two case about tx_done_cleanup and ixgbe_tx_free_bufs. it may be more detail

Case 1
Status:
	Suppose before calling tx_done_cleanup(..., uint32_t cnt), txq status are as followings.

	one mbuf per packet
	txq->tx_rs_thresh = 32;
	clean mbuf from index id
	txr[id+9].wb.status & IXGBE_TXD_STAT_DD != 0.   // means we cloud free 10 packets
	txr[id+31].wb.status & IXGBE_TXD_STAT_DD == 0.  // means we cloud not clean 32 packets

Process Flow:
	tx_done_cleanup(..., 10) be invoked to free 10 packets.
	Firstly ,tx_done_cleanup will invoke ixgbe_xmit_cleanup to count how many mbufs could free.
	But ixgbe_xmit_cleanup will return -1 (means no mbufs to free), please ref code bellow

	status = txr[desc_to_clean_to].wb.status;
	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
		PMD_TX_FREE_LOG(DEBUG,
				"TX descriptor %4u is not done"
				"(port=%d queue=%d)",
				desc_to_clean_to,
				txq->port_id, txq->queue_id);
		/* Failed to clean any descriptors, better luck next time */
		return -(1);
	}

Result:
	We do nothing

Expect:
	Free 10 packets.

Thoughts:
	If we try to check status from txr[id].wb.status to txr[id+31].wb.status one by one, we could find txr[id].wb.status,
txr[id+1].wb.status, txr[id+2].wb.status ……txr[id+9].wb.status, all of them status are IXGBE_TXD_STAT_DD, so actually we have the ability
to free 10 packets.



Case 2:
Status:
	Suppose before calling tx_done_cleanup(..., uint32_t cnt), txq status are as followings.

	one mbuf per packet
	txq->tx_rs_thresh = 32;
	clean mbuf from index id
	txr[id+31].wb.status & IXGBE_TXD_STAT_DD != 0.   // means we cloud free 32 packets

Process Flow:
	tx_done_cleanup(..., 10) be invoked to free 10 packets.
	When tx_done_cleanup invoke ixgbe_tx_free_bufs free bufs, it will free 32 packets..

Result:
	Free 32 packets.

Expect:
	Free 10 packets.

Thoughts:
	If we try to check status from txr[id].wb.status to txr[id+31].wb.status one by one, we could find txr[id].wb.status,
txr[id+1].wb.status, txr[id+2].wb.status ……txr[id+10].wb.status, all of them status are IXGBE_TXD_STAT_DD, so we have the ability to free
10 packets only.




 second, we have a lot of codes what is same as it in function ixgbe_xmit_cleanup.


 we do analysis for function ixgbe_tx_free_bufs and ixgbe_xmit_cleanup and segment their actions.
 the actual action of codes are following points:
1. Determine the start position of the free action.
2. Check whether the status of the end position is DD
3. Free ( set status=0 in xmit_cleanup() while call rte_mempool_put_bulk() in tx_free_bufs())
4. Update location variables( last_desc_cleaned in xmit_cleanup() and  tx_next_dd  in tx_free_bufs() and nb_tx_free in both).

If reuse ixgbe_xmit_cleanup, we need get the number of mbufs before calling ixgbe_xmit_cleanup()
We also need to implement the following actions:
1. Determine the starting position
2. Find the last mbuf of each PKT and confirm whether its status is DD
3. Free (don't do for tx_free_bufs())
4. call xmit_cleanup function in loop.

By comparison, it is possible to see that 3/4 functions of ixgbe_xmit_cleanup are already being called before,
it means that ixgbe_xmit_cleanup is not highly reusable, the repeat codes is so many.


following code is the main code in ixgbe_xmit_cleanup() and the action code.

 1.
	last_desc_cleaned = txq->last_desc_cleaned;

 2.
	/* Determine the last descriptor needing to be cleaned */
	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
	if (desc_to_clean_to >= nb_tx_desc)
		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);

	/* Check to make sure the last descriptor to clean is done */
	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
	status = txr[desc_to_clean_to].wb.status;
	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
		PMD_TX_FREE_LOG(DEBUG,
		return -(1);
	}
3.
	/* Figure out how many descriptors will be cleaned */
	if (last_desc_cleaned > desc_to_clean_to)
		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
							desc_to_clean_to);
	else
		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
						last_desc_cleaned);
	txr[desc_to_clean_to].wb.status = 0;
4.
	/* Update the txq to reflect the last descriptor that was cleaned */
	txq->last_desc_cleaned = desc_to_clean_to;
	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);








quoted
-----Original Message-----
From: Ananyev, Konstantin
Sent: Tuesday, January 7, 2020 10:10 PM
To: Di, ChenxuX <redacted>; dev@dpdk.org
Cc: Yang, Qiming <redacted>
Subject: RE: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers


Hi Chenxu,
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ * tx_tail is the last sent packet on the sw_ring. Goto
+ the end
+ * of that packet (the last segment in the packet chain)
+ and
+ * then the next segment will be the start of the oldest
+ segment
+ * in the sw_ring.
Not sure I understand the sentence above.
tx_tail is the value of TDT HW register (most recently armed by SW
TD).
quoted
quoted
quoted
quoted
quoted
quoted
last_id  is the index of last descriptor for multi-seg packet.
next_id is just the index of next descriptor in HW TD ring.
How do you conclude that it will be the ' oldest segment in the
sw_ring'?
quoted
quoted
quoted
quoted
quoted
quoted
The tx_tail is the last sent packet on the sw_ring. While the
xmit_cleanup or Tx_free_bufs will be call when the nb_tx_free
<
tx_free_thresh .
quoted
So the sw_ring[tx_tail].next_id must be the begin of mbufs
which are not used or  Already freed . then begin the loop
until the mbuf is used and
begin to free them.
quoted

quoted
Another question why do you need to write your own functions?
Why can't you reuse existing ixgbe_xmit_cleanup() for
full(offload) path and
ixgbe_tx_free_bufs() for simple path?
Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least
it could be used to determine finished TX descriptors.
Based on that you can you can free appropriate sw_ring[] entries.
The reason why I don't reuse existing function is that they
all free several mbufs While the free_cnt of the API
rte_eth_tx_done_cleanup() is the
number of packets.
quoted
It also need to be done that check which mbuffs are from the same
packet.
quoted
quoted
quoted
quoted
At first, I don't see anything bad if tx_done_cleanup() will
free only some segments from the packet. As long as it is safe -
there is no
problem with that.
quoted
quoted
I think rte_eth_tx_done_cleanup() operates on mbuf, not packet
quantities.
quoted
quoted
quoted
quoted
But in our case I think it doesn't matter, as
ixgbe_xmit_cleanup() mark TXDs as free only when HW is done with all
TXDs for that packet.
quoted
quoted
quoted
quoted
As long as there is a way to reuse existing code and avoid
duplication (without introducing any degradation) - we should use it.
And I think there is a very good opportunity here to reuse
existing
ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
Moreover because your code doesn’t follow
ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
logic and infrastructure, it introduces unnecessary scans over
TXD ring, and in some cases doesn't work as expected:

+while (1) {
+tx_last = sw_ring[tx_id].last_id;
+
+if (sw_ring[tx_last].mbuf) {
+if (txr[tx_last].wb.status &
+IXGBE_TXD_STAT_DD) {
...
+} else {
+/*
+ * mbuf still in use, nothing left to
+ * free.
+ */
+break;

It is not correct to expect that IXGBE_TXD_STAT_DD will be set
on last TXD for
*every* packet.
We set IXGBE_TXD_CMD_RS bit only on threshold packet last descriptor.
Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.

So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
It would be much less error prone and will help to avoid code duplication.

Konstantin
At first.
The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) will
cleanup
TXD wb.status.
quoted
the number of status cleanuped is always txq->tx_rs_thresh.
Yes, and what's wrong with it?
quoted
The  API rte_eth_tx_done_cleanup() in rte_eth_dev.h show that
@param free_cnt
 *   Maximum number of packets to free. Use 0 to indicate all possible
packets
quoted
quoted
quoted
 *   should be freed. Note that a packet may be using multiple mbufs.
I don't think it is a good approach, better would be to report
number of freed mbufs, but ok, as it is a public API, we probably need to
keep it as it is.
quoted
quoted
quoted
a number must be set while ixgbe_xmit_cleanup and
ixgbe_tx_free_bufs only
have one parameter txq.

Yes, ixgbe_xmit_cleanup() cleans up at least txq->tx_rs_thresh TXDs.
So if user requested more packets to be freed we can call
ixgbe_xmit_cleanup() in a loop.
That is a great idea and I discuss with my workmate about it today.
there is also some Question that we don’t confirm.
Actually it can call ixgbe_xmit_cleanup() in a loop if user requested
more packets, How to deal with the MOD. For example:
The default tx_rs_thresh is 32.if the count of mbufs we need free is 50.
We can call ixgbe_xmit_cleanup() one time to free 32 mbufs.
Then how about other 18 mbufs.
1.If we do nothing, it looks not good.
2.if we call ixgbe_xmit_cleanup() successfully, we free 14 mbufs more.
3.if we call ixgbe_xmit_cleanup() fail, the status of No.32 mbufs is not DD
  While No .18 is DD. So we do not free 18 mbufs what we can and should free.

We have try some plans about it, likes add parameter for
ixgbe_xmit_cleanup(), change  Tx_rs_thresh or copy the code or
ixgbe_xmit_cleanup() as a new function with more Parameter. But all of them
seem not perfect.
quoted
So  can you give some comment about it? It seems not easy as we think by
reuse function.

My thought about it:
for situations when cnt % rs_thresh != 0 we'll still call ixgbe_xmit_cleanup() cnt /
rs_thresh + 1 times.
But then, we can free just cnt % rs_thresh mbufs, while keeping rest of them
intact.

Let say at some moment we have txq->nb_tx_free==0, and user called
tx_done_cleanup(txq, ..., cnt==50) So we call ixgbe_xmit_cleanup(txq) first time.
Suppose it frees 32 TXDs, then we walk through corresponding sw_ring[] entries
and let say free 32 packets (one mbuf per packet).
Then we call ixgbe_xmit_cleanup(txq)  second time.
Suppose it will free another 32 TXDs, we again walk thour sw_ring[], but free
only first 18 mbufs and return.
Suppose we call tx_done_cleanup(txq, cnt=50) immediately again.
Now txq->nb_tx_free==64, so we can start to scan sw_entries[] from tx_tail
straightway. We'll skip first 50 entries as they are already empty, then free
remaining 14 mbufs, then will call ixgbe_xmit_cleanup(txq) again, and if it would
be successful, will scan and free another 32 sw_ring[] entries.
Then again will call ixgbe_xmit_cleanup(txq), but will free
only first 8 available sw_ring[].mbuf.
Probably a bit easier with the code:

tx_done_cleanup(..., uint32_t cnt)
{
       swr = txq->sw_ring;
       txr     = txq->tx_ring;
       id   = txq->tx_tail;

      if (txq->nb_tx_free == 0)
         ixgbe_xmit_cleanup(txq);

      free =  txq->nb_tx_free;
      for (n = 0; n < cnt && free != 0; ) {

          for (j = 0; j != free && n < cnt; j++) {
             swe = &swr[id + j];
             if (swe->mbuf != NULL) {
                   rte_pktmbuf_free_seg(swe->mbuf);
                   swe->mbuf = NULL;

                  /* last segment in the packet, increment packet count */
                  n += (swe->last_id == id + j);
             }
          }

          if (n < cnt) {
               ixgbe_xmit_cleanup(txq);
               free =   txq->nb_tx_free - free;
          }
     }
     return n;
}

For the situation when there are less then rx_thresh free TXDs ((txq-
quoted
tx_ring[desc_to_clean_to].wb.status & IXGBE_TXD_STAT_DD) == 0) we do
nothing - in that case we consider there are no more mbufs to free.
quoted
quoted
quoted
And what should do is not only free buffers and status but also
check which bufs are from  One packet and count the packet freed.
ixgbe_xmit_cleanup() itself doesn't free mbufs itself.
It only cleans up TXDs.
So in tx_done_cleanup() after calling ixgbe_xmit_cleanup() you'll
still need to go through sw_ring[] entries that correspond to free TXDs and
call mbuf_seg_free().
quoted
quoted
You can count number of full packets here.
quoted
So I think it can't be implemented that reuse function
xmit_cleanup without
change it.
quoted
And create a new function with the code of xmit_cleanup will cause
many
duplication.

I don't think it would.
I think all we need here is something like that (note it is
schematic one, doesn't take into accounf that TXD ring is circular):

tx_done_cleanup(..., uint32_t cnt)
{
      /* we have txq->nb_tx_free TXDs starting from txq->tx_tail.
           Scan them first and free as many mbufs as we can.
           If we need more mbufs to free call  ixgbe_xmit_cleanup()
           to free more TXDs. */

       swr = txq->sw_ring;
       txr     = txq->tx_ring;
       id   = txq->tx_tail;
       free =  txq->nb_tx_free;

       for (n = 0; n < cnt && free != 0; ) {

          for (j = 0; j != free && n < cnt; j++) {
             swe = &swr[id + j];
             if (swe->mbuf != NULL) {
                   rte_pktmbuf_free_seg(swe->mbuf);
                   swe->mbuf = NULL;
             }
             n += (swe->last_id == id + j)
          }

          if (n < cnt) {
               ixgbe_xmit_cleanup(txq);
               free =   txq->nb_tx_free - free;
          }
     }
     return n;
}
quoted
Above all , it seem not a perfect idea to reuse ixgbe_xmit_cleanup().
Totally disagree, see above.
quoted
Second.
The function in patch is copy from code in igb_rxtx.c. it already
updated in 2017, The commit id is
8d907d2b79f7a54c809f1c44970ff455fa2865e1.

I realized that.
But I think it as a problem, not a positive thing.
While they do have some similarities, igb abd ixgbe are PMDs for
different devices, and their TX code differs quite a lot. Let say
igb doesn't use tx_rs_threshold, but instead set RS bit for each last TXD.
So, just blindly copying tx_done_cleanup() from igb to ixgbe doesn't
look like a good idea to me.
quoted
I trust the logic of code is right.
Actually it don't complete for ixgbe, i40e and ice, while it don't
change the value of last_desc_cleaned and tx_next_dd. And it's
beginning prefer last_desc_cleaned or  tx_next_dd(for offload or
simple) to
tx_tail.
quoted
So, I suggest to use the old function and fix the issue.
quoted
quoted
quoted
quoted
This is the first packet that will be
+ * attempted to be freed.
+ */
+
+/* Get last segment in most recently added packet. */
+tx_last = sw_ring[txq->tx_tail].last_id;
+
+/* Get the next segment, which is the oldest segment in ring.
+*/ tx_first = sw_ring[tx_last].next_id;
+
+/* Set the current index to the first. */ tx_id =
+tx_first;
+
+/*
+ * Loop through each packet. For each packet, verify that
+an
+ * mbuf exists and that the last segment is free. If so,
+free
+ * it and move on.
+ */
+while (1) {
+tx_last = sw_ring[tx_id].last_id;
+
+if (sw_ring[tx_last].mbuf) { if (!(txr[tx_last].wb.status
+&
+IXGBE_TXD_STAT_DD))
+break;
+
+/* Get the start of the next packet. */ tx_next =
+sw_ring[tx_last].next_id;
+
+/*
+ * Loop through all segments in a
+ * packet.
+ */
+do {
+rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+sw_ring[tx_id].mbuf = NULL; sw_ring[tx_id].last_id =
+tx_id;
+
+/* Move to next segment. */ tx_id =
+sw_ring[tx_id].next_id;
+
+} while (tx_id != tx_next);
+
+/*
+ * Increment the number of packets
+ * freed.
+ */
+count++;
+
+if (unlikely(count == (int)free_cnt)) break; } else {
+/*
+ * There are multiple reasons to be here:
+ * 1) All the packets on the ring have been
+ *    freed - tx_id is equal to tx_first
+ *    and some packets have been freed.
+ *    - Done, exit
+ * 2) Interfaces has not sent a rings worth of
+ *    packets yet, so the segment after tail is
+ *    still empty. Or a previous call to this
+ *    function freed some of the segments but
+ *    not all so there is a hole in the list.
+ *    Hopefully this is a rare case.
+ *    - Walk the list and find the next mbuf. If
+ *      there isn't one, then done.
+ */
+if (likely(tx_id == tx_first && count != 0)) break;
+
+/*
+ * Walk the list and find the next mbuf, if any.
+ */
+do {
+/* Move to next segment. */ tx_id =
+sw_ring[tx_id].next_id;
+
+if (sw_ring[tx_id].mbuf)
+break;
+
+} while (tx_id != tx_first);
+
+/*
+ * Determine why previous loop bailed. If there
+ * is not an mbuf, done.
+ */
+if (sw_ring[tx_id].mbuf == NULL) break; } }
+
+return count;
+}
+
 static void __attribute__((cold))
ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)  { diff
--git a/drivers/net/ixgbe/ixgbe_rxtx.h
b/drivers/net/ixgbe/ixgbe_rxtx.h index
505d344b9..2c3770af6
100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -285,6 +285,8 @@ int
ixgbe_rx_vec_dev_conf_condition_check(struct
rte_eth_dev *dev);  int ixgbe_rxq_vec_setup(struct
ixgbe_rx_queue *rxq);  void
ixgbe_rx_queue_release_mbufs_vec(struct
ixgbe_rx_queue *rxq);

+int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
 extern const uint32_t
ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];

--
2.17.1
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help