Thread (7 messages) 7 messages, 4 authors, 2021-03-30

Re: [dpdk-dev] [RFC 3/5] eal: lcore state FINISHED is not required

From: Honnappa Nagarahalli <hidden>
Date: 2021-03-30 02:54:35

Possibly related (same subject, not in this thread)

<snip>
Hi everyone,
Thanks Konstantin for the review.
quoted
<snip>
quoted
quoted
quoted
quoted
Subject: [RFC 3/5] eal: lcore state FINISHED is not required

FINISHED state seems to be used to indicate that the worker's
update of the 'state' is not visible to other threads. There
seems to be no requirement to have such a state.
I am not sure "FINISHED" is necessary to be removed, and I
propose some of my profiles for discussion.
 There are three states for lcore now:
"WAIT": indicate lcore can start working
"RUNNING": indicate lcore is working
"FINISHED": indicate lcore has finished its working and wait to
be reset
If you look at the definitions of "WAIT" and "FINISHED" states,
they look
similar, except for "wait to be reset" in "FINISHED" state . The
code really does not do anything to reset the lcore. It just changes the
state to "WAIT".


I agree that 3 states here seems excessive.
Just 2 (RUNNING/IDLE) seems enough.
Though we can't just remove FINISHED here - it will be an Abi breakage.
Might be deprecate FINISHED now and remove in 21.11.
Agree, will add a new patch to deprecate the FINISHED state. Also, does the deprecation notice need to go into 20.08 release notes?
Also need to decide what rte_eal_wait_lcore() should return in that case?
Always zero, or always status of last function called?
I am not sure why ' rte_eal_wait_lcore' has the following code:

if (lcore_config[worker_id].state == WAIT)
                return 0;

This indicates that the caller has called 'rte_eal_wait_lcore' function earlier. May be there is a use case where there are multiple threads waiting for the lcores to complete?
Anyway, IMO, returning the status of the last function always is better for this API.
quoted
quoted
quoted
quoted
From the description above, we can find "FINISHED" is different
from "WAIT", it can shows that lcore has done the work and finished
it.
quoted
quoted
quoted
quoted
Thus, if we remove "FINISHED", maybe we will not know whether
the lcore finishes its work or just doesn't start, because this
two state has the
same tag "WAIT".
quoted
Looking at "eal_thread_loop", the worker thread sets the state to
"RUNNING"
quoted
quoted
before sending the ack back to main core. After that it is
guaranteed that the worker will run the assigned function. Only case
where it will not run the assigned function is when the 'write'
syscall fails, in which case it results in a panic.

Quick note: it should not panic.
We must find a way to return an error without crashing the whole
application.
The syscalls are being used to communicate the status back to the main
thread. If they fail, it is not possible to communicate the status.
quoted
May be it is better to panic.
We could change the implementation using shared variables, but it
would require polling the memory. May be the syscalls are being used to
avoid polling. However, this polling would happen during init time (or similar)
for a short duration.

AFAIK we use read and write not for status communication, but sort of
sleep/ack point.
Though I agree if we can't do read/write from the system pipe then
something is totally wrong, and probably there is no much point to continue.
quoted
quoted
quoted
quoted
Furthermore, consider such a scenario:
Core 1 need to monitor Core 2 state, if Core 2 finishes one
task, Core 1 can start its working.
However, if there is only  one tag "WAIT", Core 1 maybe  start
its work at the wrong time, when Core 2 still does not start its
task at state
"WAIT".
quoted
quoted
This is just my guess, and at present, there is no similar
application scenario in dpdk.
To be able to do this effectively, core 1 needs to observe the
state change
from WAIT->RUNNING->FINISHED. This requires that core 1 should be
calling rte_eal_remote_launch and rte_eal_wait_lcore functions. It
is not possible to observe this state transition from a 3rd core
(for ex: a worker might go from
RUNNING->FINISHED->WAIT->RUNNING which a 3rd core might not be
able
quoted
quoted
RUNNING->FINISHED->WAIT->to
observe).
quoted
quoted
On the other hand, if we decide to remove "FINISHED", please
consider the following files:
1. lib/librte_eal/linux/eal_thread.c: line 31
    lib/librte_eal/windows/eal_thread.c: line 22
    lib/librte_eal/freebsd/eal_thread.c: line 31
I have looked at these lines, they do not capture "why" FINISHED
state is
required.
quoted
 2.
quoted
lib/librte_eal/include/rte_launch.h: line 24, 44, 121, 123, 131 3.
examples/l2fwd-
keepalive/main.c: line 510
rte_eal_wait_lcore(id_core) can be removed. Because the core
state has been checked as "WAIT", this is a redundant operation
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help