Thread: 32 messages, 6 authors, 2016-07-07

Re: [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy

From: Hunt, David <hidden>
Date: 2016-06-24 15:56:42

Hi Jerin,

I just ran a couple of tests on this patch on the latest master head on
a couple of machines: an older quad-socket E5-4650 and a quad-socket
E5-2699 v3.

E5-4650:
I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the 
cached tests.

E5-2699 v3:
I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the 
cached tests.

This is purely the autotest comparison; I don't have traffic generator
results. But based on the above, I don't think there are any performance
issues with the patch.

Regards,
Dave.




On 24/5/2016 4:17 PM, Jerin Jacob wrote:
On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote:
Hi Jerin,


On 05/24/2016 04:50 PM, Jerin Jacob wrote:
Signed-off-by: Jerin Jacob <redacted>
---
  lib/librte_mempool/rte_mempool.h | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index ed2c110..ebe399a 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -74,6 +74,7 @@
  #include <rte_memory.h>
  #include <rte_branch_prediction.h>
  #include <rte_ring.h>
+#include <rte_memcpy.h>
  
  #ifdef __cplusplus
  extern "C" {
@@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
  		    unsigned n, __rte_unused int is_mp)
  {
  	struct rte_mempool_cache *cache;
-	uint32_t index;
  	void **cache_objs;
  	unsigned lcore_id = rte_lcore_id();
  	uint32_t cache_size = mp->cache_size;
@@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
  	 */
  
  	/* Add elements back into the cache */
-	for (index = 0; index < n; ++index, obj_table++)
-		cache_objs[index] = *obj_table;
+	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
  
  	cache->len += n;
  
The commit title should be "mempool" instead of "mbuf".
I will fix it.
Are you seeing some performance improvement by using rte_memcpy()?
Yes, in some cases. In the default case, the loop was replaced with memcpy by
the compiler itself (gcc 5.3). But when I tried the external mempool manager
patch, performance dropped by almost 800Kpps. Debugging further, it turns out
that an unrelated change in the external mempool manager patch was knocking
out the memcpy. An explicit rte_memcpy brought back 500Kpps. The remaining
300Kpps drop is still unexplained. (In my test setup, packets are in the local
cache, so it must be something to do with a __mempool_put_bulk text alignment
change or similar.)
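
For reference, the two copy variants under discussion can be compared in isolation. The following is only a minimal sketch: the function names put_objs_loop, put_objs_bulk and copies_match are hypothetical, and the standard memcpy stands in for rte_memcpy so the snippet builds without DPDK. It shows that both forms fill the cache slots identically; it says nothing about their relative speed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element-by-element copy, mirroring the loop that the patch removes. */
static void put_objs_loop(void **cache_objs, void * const *obj_table,
			  unsigned n)
{
	unsigned index;

	for (index = 0; index < n; ++index, obj_table++)
		cache_objs[index] = *obj_table;
}

/*
 * Bulk copy of the same n pointers. Assumption: memcpy is used here as a
 * stand-in for rte_memcpy, which the patch calls with the same arguments.
 */
static void put_objs_bulk(void **cache_objs, void * const *obj_table,
			  unsigned n)
{
	assert(cache_objs != NULL && obj_table != NULL);
	memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
}

/* Returns 1 if both variants produce the same destination contents. */
static int copies_match(void)
{
	int objs[4] = { 0 };
	void *table[4] = { &objs[0], &objs[1], &objs[2], &objs[3] };
	void *a[4] = { NULL };
	void *b[4] = { NULL };

	put_objs_loop(a, table, 4);
	put_objs_bulk(b, table, 4);

	return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling copies_match() from a small test program should return 1. Whether the compiler turns the loop form into a memcpy call on its own depends on the compiler version, flags and surrounding code, which is the effect described above.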

Has anyone else observed a performance drop with the external mempool manager?

Jerin
Regards
Olivier