Thread (54 messages) 54 messages, 10 authors, 2016-02-29

Re: [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

From: Panu Matilainen <hidden>
Date: 2016-01-27 13:56:54

On 01/26/2016 07:03 PM, Huawei Xie wrote:
quoted hunk ↗ jump to hunk
v6 changes:
  reflect the changes in release notes and library version map file
  revise our duff's code style a bit to make it more readable

v5 changes:
  add comment about duff's device and our variant implementation

v3 changes:
  move while after case 0
  add context about duff's device and why we use while loop in the commit
message

v2 changes:
  unroll the loop a bit to help the performance

rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.

There is related thread about this bulk API.
http://dpdk.org/dev/patchwork/patch/4718/
Thanks to Konstantin's loop unrolling.

Attached the wiki page about duff's device. It explains the performance
optimization through loop unwinding, and also the most dramatic use of
case label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device

In our implementation, we use while() loop rather than do{} while() loop
because we could not assume count is strictly positive. Using while()
loop saves one line of check if count is zero.

Signed-off-by: Gerald Rogers <redacted>
Signed-off-by: Huawei Xie <redacted>
Acked-by: Konstantin Ananyev <redacted>
---
  doc/guides/rel_notes/release_2_3.rst |  3 ++
  lib/librte_mbuf/rte_mbuf.h           | 55 ++++++++++++++++++++++++++++++++++++
  lib/librte_mbuf/rte_mbuf_version.map |  7 +++++
  3 files changed, 65 insertions(+)
diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..a52cba3 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,9 @@ DPDK Release 2.3
  New Features
  ------------

+* **Enable bulk allocation of mbufs. **
+  A new function ``rte_pktmbuf_alloc_bulk()`` has been added to allow the user
+  to allocate a bulk of mbufs.

  Resolved Issues
  ---------------
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..b2ed479 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1336,6 +1336,61 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
  }

  /**
+ * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default
+ * values.
+ *
+ *  @param pool
+ *    The mempool from which mbufs are allocated.
+ *  @param mbufs
+ *    Array of pointers to mbufs
+ *  @param count
+ *    Array size
+ *  @return
+ *   - 0: Success
+ */
+static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
+	 struct rte_mbuf **mbufs, unsigned count)
+{
+	unsigned idx = 0;
+	int rc;
+
+	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
+	if (unlikely(rc))
+		return rc;
+
+	/* To understand duff's device on loop unwinding optimization, see
+	 * https://en.wikipedia.org/wiki/Duff's_device.
+	 * Here while() loop is used rather than do() while{} to avoid extra
+	 * check if count is zero.
+	 */
+	switch (count % 4) {
+	case 0:
+		while (idx != count) {
+			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+			rte_mbuf_refcnt_set(mbufs[idx], 1);
+			rte_pktmbuf_reset(mbufs[idx]);
+			idx++;
+	case 3:
+			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+			rte_mbuf_refcnt_set(mbufs[idx], 1);
+			rte_pktmbuf_reset(mbufs[idx]);
+			idx++;
+	case 2:
+			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+			rte_mbuf_refcnt_set(mbufs[idx], 1);
+			rte_pktmbuf_reset(mbufs[idx]);
+			idx++;
+	case 1:
+			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+			rte_mbuf_refcnt_set(mbufs[idx], 1);
+			rte_pktmbuf_reset(mbufs[idx]);
+			idx++;
+		}
+	}
+	return 0;
+}
+
+/**
   * Attach packet mbuf to another packet mbuf.
   *
   * After attachment we refer the mbuf we attached as 'indirect',
diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
index e10f6bd..257c65a 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -18,3 +18,10 @@ DPDK_2.1 {
  	rte_pktmbuf_pool_create;

  } DPDK_2.0;
+
+DPDK_2.3 {
+	global:
+
+	rte_pktmbuf_alloc_bulk;
+
+} DPDK_2.1;
Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of 
the library ABI and should not be listed in the version map.

I assume its inline for performance reasons, but then you lose the 
benefits of dynamic linking such as ability to fix bugs and/or improve 
itby just updating the library. Since the point of having a bulk API is 
to improve performance by reducing the number of calls required, does it 
really have to be inline? As in, have you actually measured the 
difference between inline and non-inline and decided its worth all the 
downsides?

	- Panu -
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help