Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF

From: Xin Long <lucien.xin@gmail.com>
Date: 2021-12-21 19:52:39
Also in: linux-sctp, lkml

On Mon, Dec 20, 2021 at 3:56 AM Lee Jones [off-list ref] wrote:

On Sun, 19 Dec 2021, Xin Long wrote:

quoted

On Thu, Dec 16, 2021 at 2:03 PM Lee Jones [off-list ref] wrote:

quoted

On Thu, 16 Dec 2021, Xin Long wrote:

quoted

(

On Thu, Dec 16, 2021 at 1:12 PM Xin Long [off-list ref] wrote:

quoted

On Thu, Dec 16, 2021 at 12:14 PM Lee Jones [off-list ref] wrote:

quoted

On Thu, 16 Dec 2021, Lee Jones wrote:

quoted

On Thu, 16 Dec 2021, Xin Long wrote:

quoted

On Thu, Dec 16, 2021 at 11:39 AM Lee Jones [off-list ref] wrote:

quoted

On Thu, 16 Dec 2021, Xin Long wrote:

quoted

On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski [off-list ref] wrote:

quoted

On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:

quoted

The cause of the resultant dump_stack() reported below is a
dereference of a freed pointer to 'struct sctp_endpoint' in
sctp_sock_dump().

This race condition occurs when a transport is cached into its
associated hash table followed by an endpoint/sock migration to a new
association in sctp_assoc_migrate() prior to their subsequent use in
sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
table calling into sctp_sock_dump() where the dereference occurs.

quoted

in sctp_sock_dump():
        struct sock *sk = ep->base.sk;
        ... <--[1]
        lock_sock(sk);

Do you mean in [1], the sk is peeled off and gets freed elsewhere?

'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().

quoted

if that's true, it's still late to do sock_hold(sk) in your this patch.

No, that's not right.

The schedule happens *inside* the lock_sock() call.

Sorry, I don't follow this.
We can't expect when the schedule happens, why do you think this
can never be scheduled before the lock_sock() call?

True, but I've had this running for hours and it hasn't reproduced.

I understand, but it's a crash, we shouldn't take any risk that it
will never happen.
you may try to add a usleep() before the lock_sock call to reproduce it.

quoted

Without this patch, I can reproduce this in around 2 seconds.

The C-repro for this is pretty intense!

If you want to be *sure* that a schedule will never happen, we can
take a reference directly with:

     ep = sctp_endpoint_hold(tsp->asoc->ep);
     sk = sock_hold(ep->base.sk);

Which was my original plan before I soak tested this submitted patch
for hours without any sign of reproducing the issue.

we tried to not export sctp_obj_hold/put(), that's why we had
sctp_for_each_transport().

ep itself holds a reference of sk when it's alive, so it's weird to do
these 2 together.

quoted

If the sock is peeled off or is being freed, we shouldn't dump this sock,
and it's better to skip it.

I guess we can do that too.

Are you suggesting sctp_sock_migrate() as the call site?

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 85ac2e901ffc..56ea7a0e2add 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c

@@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,

struct sock *newsk,
                inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
        }

+       sock_set_flag(oldsk, SOCK_RCU_FREE);
        release_sock(newsk);

        return 0;

SOCK_RCU_FREE is set to the previous sk, so that this sk will not
be freed between rcu_read_lock() and rcu_read_unlock().

quoted

Also, when are you planning on testing the flag?

SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
and if it's set, it will be freed in the next grace period of RCU.

quoted

Won't that suffer with the same issue(s)?

diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index 7970d786c4a2..b4c4acd9e67e 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c

@@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct

sctp_transport *tsp, void *p)

 static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
 {
-       struct sctp_endpoint *ep = tsp->asoc->ep;
        struct sctp_comm_param *commp = p;
-       struct sock *sk = ep->base.sk;
        struct sk_buff *skb = commp->skb;
        struct netlink_callback *cb = commp->cb;
        const struct inet_diag_req_v2 *r = commp->r;
        struct sctp_association *assoc;
+       struct sctp_endpoint *ep;
+       struct sock *sk;
        int err = 0;

+       rcu_read_lock();
+       ep = tsp->asoc->ep;
+       sk = ep->base.sk;
        lock_sock(sk);

Unfortunately, this isn't going to work, as lock_sock() may sleep,
and is not allowed to be called understand rcu_read_lock() :(

Ah!

How about my original solution of taking:

  tsp->asoc->ep

... directly?

If it already holds the sk, we should be golden?

Both ep and sk could be destroyed at this moment.
you can't try to hold an object that has already been destroyed.
It holds the sk only when ep is still alive.

I don't see a way to get this fix with the current transport hashtable.
I will change to use port hashtable to dump sock/asocs for this.

Right.  Cache invalidation is hard!

Sure, if there is a better way, please go ahead.

Hi, Jones,

Port hashtable doesn't work either as lock_sock can not be called
under spin_lock().

I posted another patch where this issue can be fixed by moving ep free
to call_rcu().
It will be great if you are able to test it.

Thanks.

Thanks.

--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help