Thread (12 messages) 12 messages, 2 authors, 2021-01-18

Re: [PATCH net-next v6 3/3] net: add sysfs attribute to control napi threaded mode

From: Wei Wang <hidden>
Date: 2021-01-18 20:00:00

On Fri, Jan 15, 2021 at 6:18 PM Alexander Duyck
[off-list ref] wrote:
On Fri, Jan 15, 2021 at 4:44 PM Wei Wang [off-list ref] wrote:
quoted
On Fri, Jan 15, 2021 at 3:08 PM Alexander Duyck
[off-list ref] wrote:
quoted
On Fri, Jan 15, 2021 at 1:54 PM Wei Wang [off-list ref] wrote:
quoted
On Thu, Jan 14, 2021 at 7:34 PM Alexander Duyck
[off-list ref] wrote:
quoted
On Thu, Jan 14, 2021 at 4:34 PM Wei Wang [off-list ref] wrote:
quoted
This patch adds a new sysfs attribute to the network device class.
Said attribute provides a per-device control to enable/disable the
threaded mode for all the napi instances of the given network device.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <redacted>
Signed-off-by: Hannes Frederic Sowa <redacted>
Co-developed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <redacted>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 28 +++++++++++++++++
 net/core/net-sysfs.c      | 63 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 93 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c24ed232c746..11ae0c9b9350 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
        return napi_complete_done(n, 0);
 }

+int dev_set_threaded(struct net_device *dev, bool threaded);
+
 /**
  *     napi_disable - prevent NAPI from scheduling
  *     @n: NAPI context
diff --git a/net/core/dev.c b/net/core/dev.c
index edcfec1361e9..d5fb95316ea8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6754,6 +6754,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded)
        return err;
 }

+static void dev_disable_threaded_all(struct net_device *dev)
+{
+       struct napi_struct *napi;
+
+       list_for_each_entry(napi, &dev->napi_list, dev_list)
+               napi_set_threaded(napi, false);
+}
+
+int dev_set_threaded(struct net_device *dev, bool threaded)
+{
+       struct napi_struct *napi;
+       int ret;
+
+       dev->threaded = threaded;
+       list_for_each_entry(napi, &dev->napi_list, dev_list) {
+               ret = napi_set_threaded(napi, threaded);
+               if (ret) {
+                       /* Error occurred on one of the napi,
+                        * reset threaded mode on all napi.
+                        */
+                       dev_disable_threaded_all(dev);
+                       break;
+               }
+       }
+
+       return ret;
This doesn't seem right. The NAPI instances can be active while this
is occuring can they not? I would think at a minimum you need to go
through a napi_disable/napi_enable in order to toggle this value for
each NAPI instance. Otherwise aren't you at risk for racing and having
a napi_schedule attempt to wake up the thread before it has been
allocated?

Yes. The napi instance could be active when this occurs. And I think
it is OK. It is cause napi_set_threaded() only sets
NAPI_STATE_THREADED bit after successfully created the kthread. And
___napi_schedule() only tries to wake up the kthread after testing the
THREADED bit.
But what do you have guaranteeing that the kthread has been written to
memory? That is what I was getting at. Just because you have written
the value doesn't mean it is in memory yet so you would probably need
an smb_mb__before_atomic() barrier call before you set the bit.
Noted. Will look into this.
quoted
Also I am not sure it is entirely safe to have the instance polling
while you are doing this. That is why I am thinking if the instance is
enabled then a napi_disable/napi_enable would be preferable.
When the napi is actively being polled in threaded mode, we will keep
rescheduling the kthread and calling __napi_poll() until
NAPI_SCHED_STATE is cleared by napi_complete_done(). And during the
next time napi_schedule() is called, we re-evaluate
NAPI_STATE_THREADED bit to see if we should wake up kthread, or
generate softirq.
And for the other way around, if napi is being handled during
net_rx_action(), toggling the bit won't cause immediate wake-up of the
kthread, but will wait for NAPI_SCHED_STATE to be cleared, and the
next time napi_schedule() is called.
I think it is OK. WDYT?
It is hard to say. The one spot that gives me a bit of concern is the
NAPIF_STATE_MISSED case in napi_complete_done. It is essentially would
become a switchover point between the two while we are actively
polling inside the driver. You end up with NAPI_SCHED_STATE not being
toggled but jumping from one to the other.
Hmm.. Right. That is the one case where NAPI_SCHED_STATE will not be
toggled, but could potentially change the processing mode.
But still, I don't see any race in this case. The napi instance will
still either be processed in softirq mode by net_rx_action(), or in
the kthread mode, after napi_complete_done() calls __napi_schedule().
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help