RE: [EXTERNAL] Re: [PATCH net,v2] hv_netvsc: Switch VF namespace in netvsc_open instead
From: Haiyang Zhang <haiyangz@microsoft.com>
Date: 2025-07-15 15:30:49
Also in:
linux-hyperv, lkml, stable
> -----Original Message-----
> From: Simon Horman <horms@kernel.org>
> Sent: Tuesday, July 15, 2025 9:06 AM
> To: Haiyang Zhang <redacted>
> Cc: linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; Haiyang Zhang
> [off-list ref]; KY Srinivasan [off-list ref]; wei.liu@kernel.org; Dexuan Cui
> [off-list ref]; andrew+netdev@lunn.ch; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; davem@davemloft.net; linux-kernel@vger.kernel.org;
> stable@vger.kernel.org; cavery@redhat.com
> Subject: [EXTERNAL] Re: [PATCH net,v2] hv_netvsc: Switch VF namespace in
> netvsc_open instead
>
> On Mon, Jul 14, 2025 at 09:41:37AM -0700, Haiyang Zhang wrote:
> > From: Haiyang Zhang <haiyangz@microsoft.com>
> >
> > The existing code move the VF NIC to new namespace when NETDEV_REGISTER is
> > received on netvsc NIC. During deletion of the namespace,
> > default_device_exit_batch() >> default_device_exit_net() is called. When
> > netvsc NIC is moved back and registered to the default namespace, it
> > automatically brings VF NIC back to the default namespace. This will cause
> > the default_device_exit_net() >> for_each_netdev_safe loop unable to detect
> > the list end, and hit NULL ptr:
> >
> > [ 231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
> > [ 231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
> > [ 231.450246] #PF: supervisor read access in kernel mode
> > [ 231.450579] #PF: error_code(0x0000) - not-present page
> > [ 231.450916] PGD 17b8a8067 P4D 0
> > [ 231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
> > [ 231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
> > [ 231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
> > [ 231.452692] Workqueue: netns cleanup_net
> > [ 231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
> > [ 231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
> > [ 231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
> > [ 231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
> > [ 231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
> > [ 231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
> > [ 231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
> > [ 231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
> > [ 231.457161] FS: 0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
> > [ 231.457707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
> > [ 231.458434] Call Trace:
> > [ 231.458600] <TASK>
> > [ 231.458777] ops_undo_list+0x100/0x220
> > [ 231.459015] cleanup_net+0x1b8/0x300
> > [ 231.459285] process_one_work+0x184/0x340
> >
> > To fix it, move the VF namespace switching code from the NETDEV_REGISTER
> > event handler to netvsc_open().
> >
> > Cc: stable@vger.kernel.org
> > Cc: cavery@redhat.com
> > Fixes: 4c262801ea60 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
>
> With this change do we go back to the situation that existed prior
> to the cited patch?
>
> Quoting the cited commit:
>
>     The existing code moves VF to the same namespace as the synthetic NIC
>     during netvsc_register_vf(). But, if the synthetic device is moved to a
>     new namespace after the VF registration, the VF won't be moved together.
>
> Or perhaps not because if synthetic device is moved then, in practice, it
> will subsequently be reopened? (Because it is closed as part of the move
> to a different netns?)

There are two cases:

1) The synthetic device is moved to a new namespace before the VF device is
   offered from PCI:
   netvsc_register_vf() >> dev_change_net_namespace() puts the VF in the same
   namespace.

2) The synthetic device is moved to a new namespace after the VF device is
   offered from PCI:
   Commit 4c262801ea60 does the move in netvsc_event_set_vf_ns() >>
   dev_change_net_namespace(), but that causes a NULL pointer dereference
   during namespace deletion >> default_device_exit_net().

This patch keeps code path (1) unchanged and fixes code path (2).

And yes, __dev_change_net_namespace() >> netif_close(dev), so in the new
namespace the NIC always needs to be re-opened before use.

Thanks,
- Haiyang
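
For anyone following the thread, here is a rough userspace model of why the "safe" iteration in case (2) loses the list end. It is only a sketch, not the kernel code: struct node, list_init(), drain() and the step limit are hypothetical names and choices made for illustration, and the real loop has no such guard. The point it shows is that a safe iterator caches the successor, so if handling the current device also moves that successor onto another list, the walk continues on the other list and never meets the original head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct node {
	struct node *prev, *next;
	const char *name;	/* NULL for a list head */
};

void list_init(struct node *head)
{
	head->prev = head->next = head;
	head->name = NULL;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static void list_add_tail(struct node *n, struct node *head)
{
	assert(n != head);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_move_tail(struct node *n, struct node *head)
{
	list_del(n);
	list_add_tail(n, head);
}

/*
 * Move every device from 'dying' to 'init', caching the successor before
 * touching the current entry (the "safe" iteration pattern).
 * If 'drag' is non-NULL, it is moved together with 'trigger'; this models
 * the VF being pulled along when the synthetic NIC is re-registered.
 * Returns true if the walk ended on the head of 'dying', false if it
 * escaped onto the 'init' list (or ran past the illustrative step limit).
 */
bool drain(struct node *dying, struct node *init,
	   const struct node *trigger, struct node *drag)
{
	struct node *cur = dying->next;
	struct node *nxt = cur->next;
	int steps = 0;

	while (cur != dying) {
		if (cur == init || ++steps > 16)
			return false;	/* list end was never detected */
		list_move_tail(cur, init);
		if (cur == trigger && drag)
			list_move_tail(drag, init);
		cur = nxt;
		nxt = cur->next;
	}
	return true;
}
```

Calling drain() with drag set to NULL terminates normally, while passing the VF node as drag makes the cached successor land on the other list, which is the shape of the failure described above.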