Thread (45 messages) 45 messages, 5 authors, 2018-10-10

Re: [PATCH v2 02/10] kni: separate releasing netdev from freeing KNI interface

From: Ferruh Yigit <hidden>
Date: 2018-08-29 10:59:17

On 6/29/2018 2:55 AM, Dan Gora wrote:
Currently the rte_kni kernel driver suffers from a problem where
when the interface is released, it generates a callback to the DPDK
application to change the interface state to Down.  However, after the
DPDK application handles the callback and generates a response back to
the kernel, the rte_kni driver cannot wake the thread which is asleep
waiting for the response, because it is holding the kni_link_lock
semaphore and it has already removed the 'struct kni_dev' from the
list of interfaces to poll for responses.

This means that if the KNI interface is in the Up state when
rte_kni_release() is called, it will always sleep for three seconds
until kni_net_release gives up waiting for a response from the DPDK
application.

To fix this, we must separate the step to release the kernel network
interface from the steps to remove the KNI interface from the list
of interfaces to poll.

When the kernel network interface is removed with unregister_netdev(),
if the interface is up, it will generate a callback to mark the
interface down, which calls kni_net_release().  kni_net_release() will
block waiting for the DPDK application to call rte_kni_handle_request()
to handle the callback, but it also needs the thread in the KNI driver
(either the per-dev thread for multi-thread or the per-driver thread)
to call kni_net_poll_resp() in order to wake the thread sleeping in
kni_net_release (actually kni_net_process_request()).

So now, KNI interfaces should be removed as such:

1) The user calls rte_kni_release().  This only unregisters the
netdev in the kernel, but touches nothing else.  This allows all the
threads to run which are necessary to handle the callback into the
DPDK application to mark the interface down.

2) The user stops the thread running rte_kni_handle_request().
After rte_kni_release() has been called, there will be no more
callbacks for that interface so it is not necessary.  It cannot be
running at the same time that rte_kni_free() frees all of the FIFOs
and DPDK memory for that KNI interface.

3) The user calls rte_kni_free().  This performs the RTE_KNI_IOCTL_FREE
ioctl which calls kni_ioctl_free().  This function removes the struct
kni_dev from the list of interfaces to poll (and kills the per-dev
kthread, if configured for multi-thread), then frees the memory in
the FIFOs.

Signed-off-by: Dan Gora <redacted>

You are right, that problem exits.
Although I don't see problem related to holding the kni_list_lock, polling
thread terminated before unregister interface cause the problem.

And it has a reason to terminate polling thread first, because it uses device
resources.

Separating unregister and free steps looks good, but I am not sure if this
should be reflected to the user, with a new ioctl and API.
When user done with interface it calls rte_kni_release() to release them, does
user really need a rte_kni_free() API or need to know the difference of two, is
there any action to take in userspace between these two APIs? I think no.

What about keeping single rte_kni_release() API and solve the issue internally
in KNI?

Previously it was doing:
- Stop threads (also there is another single/multi thread error [1])
- kni_dev_remove()
	- unregister and free netdev() [2]
	- kni_net_release_fifo_phy() [3]

Instead internally can we do:
a- Unregister kernel interfaces, rte_kni_unregister()?
b- stop threads
c- kni_net_release_fifo_phy
d- free netdev

The challenge I can see is some time required between a) and b) to let userspace
app to response, we need a way to know response received before stopping the thread.

Another thing is there are two release path, kni_release() and
kni_ioctl_release() both should be fixed.



[1]
If multi thread enabled they have been stopped, but if single thread used it has
not been stopped that is why you don't see the 3 seconds delay for default
single thread case, but not stopping the polling thread but removing the
interface is wrong.

[2]
unregistering netdev will trigger a userspace request but response won't be read
because polling thread also polls the response queue, and that thread is already
stopped at this stage.

[3]
This is also wrong as you have pointed in later patch in your series,
kni_net_release_fifo_phy() moves packets from rxq/alloq queue to free queue,
queues are still allocated but the references kept in kernel may be invalid at
this stage because of free netdev()
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help