Thread (19 messages) 19 messages, 5 authors, 2021-04-14

[dpdk-dev] 回复: [PATCH v1 1/2] app/test: remove unnecessary barriers for ring stress test

From: Feifei Wang <hidden>
Date: 2021-02-01 09:08:14

Hi, everyone

Sorry for that there may be a problem in the e-mail format. 
Please see the picture according to the following link:
https://ibb.co/SQ7yGfW

Best Regards
Feifei
-----邮件原件-----
发件人: Feifei Wang
发送时间: 2021年2月1日 16:49
收件人: Honnappa Nagarahalli [off-list ref]; Stephen
Hemminger [off-list ref]
抄送: Ananyev, Konstantin [off-list ref];
dev@dpdk.org; nd [off-list ref]; Ruifeng Wang
[off-list ref]; nd [off-list ref]
主题: 回复: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary
barriers for ring stress test

Sorry, a mistake happens in the picture, after Wrk_cmd == WRK_CMD_RUN,
it should be a rmb rather than wmb.
quoted
-----邮件原件-----
发件人: Honnappa Nagarahalli [off-list ref]
发送时间: 2021年1月30日 9:24
收件人: Stephen Hemminger [off-list ref]
抄送: Ananyev, Konstantin [off-list ref]; Feifei Wang
[off-list ref]; dev@dpdk.org; nd [off-list ref]; Ruifeng
Wang
quoted
[off-list ref]; Honnappa Nagarahalli
[off-list ref]; nd [off-list ref]
主题: RE: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary
barriers for ring stress test

<snip>
quoted
quoted
quoted
quoted
quoted
Hi Feifei,
quoted
The variable "wrk_cmd" is a signal to control threads from
running and stopping. When worker lcores load "wrk_cmd ==
WRK_CMD_RUN",
quoted
quoted
quoted
they
start
quoted
running and when worker lcores load "wrk_cmd ==
WRK_CMD_STOP",
they
quoted
stop.

For the wmb in test_mt1, no storing operations must keep
the order after storing "wrk_cmd". Thus the wmb is
unnecessary.
quoted
quoted
quoted
quoted
quoted
quoted
I think there is a bug in my original code, we should do
smp_wmb()
*before* setting wrk_cmd, not after:

        /* launch on all workers */
        RTE_LCORE_FOREACH_WORKER(lc) {
                arg[lc].rng = r;
                arg[lc].stats = init_stat;
                rte_eal_remote_launch(test, &arg[lc], lc);
        }

        /* signal worker to start test */
+      rte_smp_wmb();
        wrk_cmd = WRK_CMD_RUN;
-       rte_smp_wmb();

        usleep(run_time * US_PER_S);


I still think we'd better have some synchronisation here.
Otherwise what would prevent compiler and/or cpu to update
wrk_cmd out of order (before _init_ phase is completed)?
We probably can safely assume no reordering from the
compiler here, as we have function calls straight before and
after 'wrk_cmd =
WRK_CMD_RUN;'
quoted
quoted
But for consistency and easier maintenance, I still think it
is better to have something here, after all it is not
performance critical
pass.
quoted
quoted
quoted
Agree that this is not performance critical.

This is more about correctness (as usually people refer to
code to understand the concepts). You can refer to video [1].
Essentially, the pthread_create has 'happens-before' behavior.
i.e. all the memory operations before the pthread_create are
visible to the new
thread.
quoted
quoted
quoted
The
rte_smp_rmb() barrier in the thread function is not required
as it reads the
data that was set before the thread was launched.

rte_eal_remote_launch() doesn't call pthread_create().
All it does -  updates global variable (lcore_config) and
writes/reads to/from the pipe.
Thanks for the reminder ☹
I think rte_eal_remote_launch and rte_eal_wait_lcore need to
provide
behavior similar to pthread_launch and pthread_join respectively.
quoted
There is use of rte_smp_*mb in those functions as well. Those need
to be fixed
first and then look at these.

Looks like you want __atomic_thread_fence() here.
In the rte_eal_remote_launch case, all the memory operations before
the API call need to be visible to the worker. If this is the only
requirement, we can use the function pointer as the guard variable and
use store-release. In the eal_thread_loop function we could do
load-acquire on the function pointer.

I do not think that there is a requirement to ensure that the memory
operations after the API call do not happen before the worker thread
starts running the function (As there is no guarantee on when the
worker thread will run. If the main thread needs to know if the worker
thread is running explicit hand-shaking needs to happen).

The rte_eal_wait_lcore API needs to ensure that the memory operations
in the worker are visible to the main. rte_eal_wait_lcore and
eal_thread_loop are synchronizing using lcore_config[worker_id].state.
I need to understand what else 'state' is used for. If there are no
issues, we can do a store-release on 'state' in eal_thread_loop and a load-
acquire in rte_eal_wait_lcore.
quoted
So, we do not have to use the __atomic_thread_fence.
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help