Re: [PATCH net-next v6 3/3] net: add sysfs attribute to control napi threaded mode
From: Wei Wang <hidden>
Date: 2021-01-18 20:00:00
On Fri, Jan 15, 2021 at 6:18 PM Alexander Duyck [off-list ref] wrote:
On Fri, Jan 15, 2021 at 4:44 PM Wei Wang [off-list ref] wrote:quoted
On Fri, Jan 15, 2021 at 3:08 PM Alexander Duyck [off-list ref] wrote:quoted
On Fri, Jan 15, 2021 at 1:54 PM Wei Wang [off-list ref] wrote:quoted
On Thu, Jan 14, 2021 at 7:34 PM Alexander Duyck [off-list ref] wrote:quoted
On Thu, Jan 14, 2021 at 4:34 PM Wei Wang [off-list ref] wrote:quoted
This patch adds a new sysfs attribute to the network device class. Said attribute provides a per-device control to enable/disable the threaded mode for all the napi instances of the given network device. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Hannes Frederic Sowa <redacted> Signed-off-by: Hannes Frederic Sowa <redacted> Co-developed-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Wei Wang <redacted> --- include/linux/netdevice.h | 2 ++ net/core/dev.c | 28 +++++++++++++++++ net/core/net-sysfs.c | 63 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 93 insertions(+)diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c24ed232c746..11ae0c9b9350 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h@@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n) return napi_complete_done(n, 0); } +int dev_set_threaded(struct net_device *dev, bool threaded); + /** * napi_disable - prevent NAPI from scheduling * @n: NAPI contextdiff --git a/net/core/dev.c b/net/core/dev.c index edcfec1361e9..d5fb95316ea8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c@@ -6754,6 +6754,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded) return err; } +static void dev_disable_threaded_all(struct net_device *dev) +{ + struct napi_struct *napi; + + list_for_each_entry(napi, &dev->napi_list, dev_list) + napi_set_threaded(napi, false); +} + +int dev_set_threaded(struct net_device *dev, bool threaded) +{ + struct napi_struct *napi; + int ret; + + dev->threaded = threaded; + list_for_each_entry(napi, &dev->napi_list, dev_list) { + ret = napi_set_threaded(napi, threaded); + if (ret) { + /* Error occurred on one of the napi, + * reset threaded mode on all napi. + */ + dev_disable_threaded_all(dev); + break; + } + } + + return ret;This doesn't seem right. The NAPI instances can be active while this is occuring can they not? I would think at a minimum you need to go through a napi_disable/napi_enable in order to toggle this value for each NAPI instance. Otherwise aren't you at risk for racing and having a napi_schedule attempt to wake up the thread before it has been allocated?Yes. The napi instance could be active when this occurs. And I think it is OK. It is cause napi_set_threaded() only sets NAPI_STATE_THREADED bit after successfully created the kthread. And ___napi_schedule() only tries to wake up the kthread after testing the THREADED bit.But what do you have guaranteeing that the kthread has been written to memory? That is what I was getting at. Just because you have written the value doesn't mean it is in memory yet so you would probably need an smb_mb__before_atomic() barrier call before you set the bit.Noted. Will look into this.quoted
Also I am not sure it is entirely safe to have the instance polling while you are doing this. That is why I am thinking if the instance is enabled then a napi_disable/napi_enable would be preferable.When the napi is actively being polled in threaded mode, we will keep rescheduling the kthread and calling __napi_poll() until NAPI_SCHED_STATE is cleared by napi_complete_done(). And during the next time napi_schedule() is called, we re-evaluate NAPI_STATE_THREADED bit to see if we should wake up kthread, or generate softirq. And for the other way around, if napi is being handled during net_rx_action(), toggling the bit won't cause immediate wake-up of the kthread, but will wait for NAPI_SCHED_STATE to be cleared, and the next time napi_schedule() is called. I think it is OK. WDYT?It is hard to say. The one spot that gives me a bit of concern is the NAPIF_STATE_MISSED case in napi_complete_done. It is essentially would become a switchover point between the two while we are actively polling inside the driver. You end up with NAPI_SCHED_STATE not being toggled but jumping from one to the other.
Hmm.. Right. That is the one case where NAPI_SCHED_STATE will not be toggled, but could potentially change the processing mode. But still, I don't see any race in this case. The napi instance will still either be processed in softirq mode by net_rx_action(), or in the kthread mode, after napi_complete_done() calls __napi_schedule().