Thread (158 messages) 158 messages, 5 authors, 2024-12-02

Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

From: Sabrina Dubroca <sd@queasysnail.net>
Date: 2024-11-29 13:20:25
Also in: linux-kselftest, lkml

2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote:
On 26/11/2024 09:49, Antonio Quartulli wrote:
[...]
quoted
quoted
The potential issue is tricky since we create it patch-by-patch.

Up to this patch the socket releasing procedure looks solid and
reliable. E.g. the P2P netdev destroying:

   ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
     ovpn_peer_release_p2p
       ovpn_peer_del_p2p
         ovpn_peer_put
           ovpn_peer_release_kref
             ovpn_peer_release
               ovpn_socket_put
                 ovpn_socket_release_kref
                   ovpn_socket_detach
                     ovpn_udp_socket_detach
                       setup_udp_tunnel_sock
   netdev_run_todo
     rcu_barrier  <- no running ovpn_udp_encap_recv after this point
     free_netdev

After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv()
will be spawned. And after the rcu_barrier() all running
ovpn_udp_encap_recv() will be done. All good.
ok
quoted
Then, the following patch 'ovpn: implement TCP transport' disjoin
ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling
the socket detach function call:

   ovpn_socket_release_kref
     ovpn_socket_schedule_release
       schedule_work(&sock->work)

And long time after the socket will be actually detached:

   ovpn_socket_release_work
     ovpn_socket_detach
       ovpn_udp_socket_detach
         setup_udp_tunnel_sock

And until this detaching will take a place, UDP handler can call
ovpn_udp_encap_recv() whatever number of times.

So, we can end up with this scenario:

   ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
     ovpn_peer_release_p2p
       ovpn_peer_del_p2p
         ovpn_peer_put
           ovpn_peer_release_kref
             ovpn_peer_release
               ovpn_socket_put
                 ovpn_socket_release_kref
                   ovpn_socket_schedule_release
                     schedule_work(&sock->work)
   netdev_run_todo
     rcu_barrier
     free_netdev

   ovpn_udp_encap_recv  <- called for an incoming UDP packet
     ovpn_from_udp_sock <- returns pointer to freed memory
     // Any access to ovpn pointer is the use-after-free

   ovpn_socket_release_work  <- kernel finally ivoke the work
     ovpn_socket_detach
       ovpn_udp_socket_detach
         setup_udp_tunnel_sock

To address the issue, I see two possible solutions:
1. flush the workqueue somewhere before the netdev release
yes! This is what I was missing. This will also solve the "how can the
module wait for all workers to be done before unloading?"
Actually there might be even a simpler solution: each ovpn_socket will hold
a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP).
I can simply increase the refcounter those objects while they are referenced
by the socket and decrease it when the socket is fully released (in the
detach() function called by the worker).

This way the netdev cannot be released until all socket (and all peers) are
gone.

This approach doesn't require any local workqueue or any other special
coordination as we'll just force the whole cleanup to happen in a specific
order.

Does it make sense?
This dependency between refcounts worries me. I'm already having a
hard time remembering how all objects interact together.

And since ovpn_peer_release already calls ovpn_socket_put, you'd get a
refcount loop if ovpn_socket now also has a ref on the peer, no?

-- 
Sabrina
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help