Thread (19 messages) 19 messages, 5 authors, 2021-04-14

Re: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary barriers for ring stress test

From: Honnappa Nagarahalli <hidden>
Date: 2021-01-29 03:18:04

<snip>
quoted
quoted
Hi Feifei,
quoted
The variable "wrk_cmd" is a signal to control threads from running
and stopping. When worker lcores load "wrk_cmd ==
WRK_CMD_RUN",
quoted
quoted
quoted
they
start
quoted
running and when worker lcores load "wrk_cmd == WRK_CMD_STOP",
they
quoted
stop.

For the wmb in test_mt1, no storing operations must keep the order
after storing "wrk_cmd". Thus the wmb is unnecessary.
I think there is a bug in my original code, we should do smp_wmb()
*before* setting wrk_cmd, not after:

        /* launch on all workers */
        RTE_LCORE_FOREACH_WORKER(lc) {
                arg[lc].rng = r;
                arg[lc].stats = init_stat;
                rte_eal_remote_launch(test, &arg[lc], lc);
        }

        /* signal worker to start test */
+      rte_smp_wmb();
        wrk_cmd = WRK_CMD_RUN;
-       rte_smp_wmb();

        usleep(run_time * US_PER_S);


I still think we'd better have some synchronisation here.
Otherwise what would prevent compiler and/or cpu to update wrk_cmd
out of order (before _init_ phase is completed)?
We probably can safely assume no reordering from the compiler here,
as we have function calls straight before and after 'wrk_cmd =
WRK_CMD_RUN;'
quoted
quoted
But for consistency and easier maintenance, I still think it is
better to have something here, after all it is not performance critical pass.
Agree that this is not performance critical.

This is more about correctness (as usually people refer to code to
understand the concepts). You can refer to video [1]. Essentially, the
pthread_create has 'happens-before' behavior. i.e. all the memory
operations before the pthread_create are visible to the new thread.
The
rte_smp_rmb() barrier in the thread function is not required as it reads the
data that was set before the thread was launched.

rte_eal_remote_launch() doesn't call pthread_create().
All it does -  updates global variable (lcore_config) and writes/reads to/from
the pipe.
Thanks for the reminder ☹
I think rte_eal_remote_launch and rte_eal_wait_lcore need to provide behavior similar to pthread_launch and pthread_join respectively.

There is use of rte_smp_*mb in those functions as well. Those need to be fixed first and then look at these.
quoted
I do not know why rte_smp_wmb is required. The update to 'wrk_cmd' is
seen by the thread eventually. rte_smp_wmb does not result in update
being seen sooner/immediately.

We don't need it sooner.
We need to make sure it wouldn't be seen by any worker thread before all
workers are launched.
To make sure all workers start the test at approximately same moment.
That's why I think wmb() should be before 'wrk_cmd = WRK_CMD_RUN;' in
my original code.
quoted
[1]
https://www.youtube.com/watch?t=4170&v=KeLBd2EJLOU&feature=youtu.
be
quoted
quoted
quoted
For the rmb in test_worker, the parameters have been prepared when
worker lcores call "test_worker". It is unnessary to wait wrk_cmd
to be loaded, then the parameters can be loaded, So the rmb can be
removed.

It is not only about parameters loading,  it is to prevent worker
core to start too early.
Because 'pthread_launch' provides the 'happens-before' behavior, the
worker core will see the updates that happened before the worker was
launched.
quoted
I suggest changing the commit log to provide the reasoning around
pthread_create.
quoted
quoted
As I understand, your goal is to get rid of rte_smp_*() calls.
Might be better to replace such places here with _atomic_ semantics.
Then, as I can see, we also can get rid of 'volatile' fo wrk_cmd.
quoted
In the meanwhile, fix a typo. The note above storing "stop" into
"wrk_cmd" should be "stop test" rather than "start test".

Signed-off-by: Feifei Wang <redacted>
Reviewed-by: Honnappa Nagarahalli
[off-list ref]
quoted
quoted
quoted
Reviewed-by: Ruifeng Wang <redacted>
---
 app/test/test_ring_stress_impl.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/app/test/test_ring_stress_impl.h
b/app/test/test_ring_stress_impl.h
index f9ca63b90..384555ef9 100644
--- a/app/test/test_ring_stress_impl.h
+++ b/app/test/test_ring_stress_impl.h
@@ -198,7 +198,6 @@ test_worker(void *arg, const char *fname,
int32_t
prcs)
quoted
 	fill_ring_elm(&loc_elm, lc);

 	while (wrk_cmd != WRK_CMD_RUN) {
-		rte_smp_rmb();
 		rte_pause();
 	}
@@ -357,13 +356,11 @@ test_mt1(int (*test)(void *))

 	/* signal worker to start test */
 	wrk_cmd = WRK_CMD_RUN;
-	rte_smp_wmb();

 	usleep(run_time * US_PER_S);

-	/* signal worker to start test */
+	/* signal worker to stop test */
 	wrk_cmd = WRK_CMD_STOP;
-	rte_smp_wmb();

 	/* wait for workers and collect stats. */
 	mc = rte_lcore_id();
--
2.17.1
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help