Re: [dpdk-dev] [RFC 3/5] eal: lcore state FINISHED is not required

From: Honnappa Nagarahalli <hidden>
Date: 2021-03-02 03:13:41

<snip>

quoted

Subject: [RFC 3/5] eal: lcore state FINISHED is not required

FINISHED state seems to be used to indicate that the worker's
update of the 'state' is not visible to other threads. There seems
to be no requirement to have such a state.

I am not sure it is necessary to remove "FINISHED", and I would like to
offer some thoughts for discussion.
There are three states for an lcore now:
"WAIT": indicates the lcore can start working
"RUNNING": indicates the lcore is working
"FINISHED": indicates the lcore has finished its work and is waiting to be
reset

If you look at the definitions of "WAIT" and "FINISHED" states, they look
similar, except for "wait to be reset" in "FINISHED" state. The code really does
not do anything to reset the lcore. It just changes the state to "WAIT".
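To make the point above concrete, here is a minimal model of the wait path. It is only a sketch: the `model_` names are hypothetical, C11 atomics stand in for whatever barriers EAL uses, and it is not the EAL source.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the three lcore states discussed above. */
enum model_state { MODEL_WAIT, MODEL_RUNNING, MODEL_FINISHED };

struct model_lcore {
    _Atomic int state; /* one of enum model_state */
    int ret;           /* return value of the last function that ran */
};

/*
 * Model of the wait path: return at once if the lcore is idle, otherwise
 * spin while it is RUNNING and then overwrite the state with WAIT.
 * Nothing else about the lcore is "reset".
 */
static int model_wait_lcore(struct model_lcore *lc)
{
    assert(lc);

    if (atomic_load_explicit(&lc->state, memory_order_acquire) == MODEL_WAIT)
        return 0;

    while (atomic_load_explicit(&lc->state, memory_order_acquire) == MODEL_RUNNING)
        ; /* busy-wait; real code would add a pause hint here */

    /* FINISHED -> WAIT is the only state change made by the waiter. */
    atomic_store_explicit(&lc->state, MODEL_WAIT, memory_order_relaxed);
    return lc->ret;
}
```

In this model, calling `model_wait_lcore()` on an lcore that is already in WAIT returns 0 immediately, which is also why a second wait after the state has been checked adds nothing.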

quoted

From the description above, we can see that "FINISHED" is different from
"WAIT": it shows that the lcore has done its work and finished it.
Thus, if we remove "FINISHED", we may not know whether the
lcore has finished its work or simply has not started, because these two states
would have the same tag "WAIT".

quoted

Looking at "eal_thread_loop", the worker thread sets the state to "RUNNING"
before sending the ack back to main core. After that it is guaranteed that the
worker will run the assigned function. Only case where it will not run the
assigned function is when the 'write' syscall fails, in which case it results in a
panic.
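The ordering described above can be sketched as one iteration of a worker loop. This is a simplified, single-threaded model on POSIX pipes, not the EAL code: the `loop_` names and the struct layout are assumptions, the state is a plain field rather than a shared variable with barriers, and the function returns -1 at the point where the paragraph above says EAL panics.

```c
#include <assert.h>
#include <unistd.h>

enum loop_state { LOOP_WAIT, LOOP_RUNNING, LOOP_FINISHED };

/* Hypothetical per-lcore bookkeeping for this sketch. */
struct loop_lcore {
    int cmd_rd;             /* worker reads launch commands from here */
    int ack_wr;             /* worker writes acks to here */
    enum loop_state state;
    int (*fn)(void *);      /* assigned function */
    void *arg;
    int ret;
};

/* Trivial assigned function used for illustration only. */
static int loop_demo_fn(void *arg)
{
    return *(int *)arg + 1;
}

/* One loop iteration: returns 0 on success, -1 if a syscall fails. */
static int loop_worker_step(struct loop_lcore *lc)
{
    char c;

    assert(lc && lc->fn);
    if (read(lc->cmd_rd, &c, 1) != 1)
        return -1;

    lc->state = LOOP_RUNNING;       /* set before the ack is sent */

    if (write(lc->ack_wr, &c, 1) != 1)
        return -1;                  /* the only way the function is not run */

    lc->ret = lc->fn(lc->arg);      /* from here the function always runs */
    lc->state = LOOP_FINISHED;
    return 0;
}
```

Once the ack has been written, nothing in this model can stop the assigned function from running, which is the guarantee the paragraph relies on.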

Quick note: it should not panic.
We must find a way to return an error
without crashing the whole application.

The syscalls are being used to communicate the status back to the main thread. If they fail, it is not possible to communicate the status. Maybe it is better to panic.
We could change the implementation to use shared variables, but it would require polling the memory. Maybe the syscalls are being used to avoid polling. However, this polling would happen during init time (or similar) for a short duration.
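A rough sketch of the shared-variable alternative, assuming C11 atomics and a bounded poll. All `poll_` names and the `max_spins` budget are hypothetical and only illustrate the idea; this is not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>

enum poll_state { POLL_WAIT, POLL_RUNNING };

struct poll_lcore {
    _Atomic int state; /* shared between main and worker */
};

/* Worker side: publish RUNNING instead of writing an ack to a pipe. */
static void poll_worker_ack(struct poll_lcore *lc)
{
    assert(lc);
    atomic_store_explicit(&lc->state, POLL_RUNNING, memory_order_release);
}

/*
 * Main side: poll the shared state a bounded number of times.
 * Returns 0 once the worker is seen RUNNING, -1 if the budget runs out,
 * so the caller can report an error instead of panicking.
 */
static int poll_main_wait_ack(struct poll_lcore *lc, unsigned long max_spins)
{
    assert(lc);
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&lc->state, memory_order_acquire) == POLL_RUNNING)
            return 0;
    }
    return -1;
}
```

The cost is the polling loop itself, which, as noted, would only run for a short time around launch.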

quoted

Furthermore, consider such a scenario:
Core 1 needs to monitor the state of Core 2; once Core 2 finishes a task,
Core 1 can start its work.
However, if there is only one tag "WAIT", Core 1 may start its
work at the wrong time, when Core 2 has not yet started its task and is still in
state "WAIT".

quoted

This is just my guess, and at present, there is no similar
application scenario in dpdk.

To be able to do this effectively, core 1 needs to observe the state change
from WAIT->RUNNING->FINISHED. This requires that core 1 should be calling
rte_eal_remote_launch and rte_eal_wait_lcore functions. It is not possible to
observe this state transition from a 3rd core (for ex: a worker might go from
RUNNING->FINISHED->WAIT->RUNNING which a 3rd core might not be able to
observe).
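The missed-transition problem can be illustrated with a toy model in which a third core samples the worker's state only every few steps. The `obs_` names and the idea of a recorded state history are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum obs_state { OBS_WAIT, OBS_RUNNING, OBS_FINISHED };

/*
 * Count how many samples show FINISHED when the observer looks only at
 * every `stride`-th entry of the worker's state history.
 */
static size_t obs_count_finished(const enum obs_state *trace, size_t n,
                                 size_t stride)
{
    size_t seen = 0;

    assert(trace && stride > 0);
    for (size_t i = 0; i < n; i += stride)
        if (trace[i] == OBS_FINISHED)
            seen++;
    return seen;
}
```

For the history RUNNING, FINISHED, WAIT, RUNNING, FINISHED, WAIT, an observer that sees every entry counts two FINISHED states, while one that samples every third entry sees RUNNING twice and cannot tell that two tasks completed in between.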

quoted

On the other hand, if we decide to remove "FINISHED", please
consider the following files:
1. lib/librte_eal/linux/eal_thread.c: line 31
    lib/librte_eal/windows/eal_thread.c: line 22
    lib/librte_eal/freebsd/eal_thread.c: line 31

I have looked at these lines, they do not capture "why" FINISHED state is
required.

quoted

2. lib/librte_eal/include/rte_launch.h: line 24, 44, 121, 123, 131
3. examples/l2fwd-keepalive/main.c: line 510
    rte_eal_wait_lcore(id_core) can be removed. Because the core state
    has already been checked to be "WAIT", this is a redundant operation.
