Re: [PATCH] rds: fix use-after-free read in rds_find_bound
From: santosh.shilimkar@oracle.com <hidden>
Date: 2017-12-31 22:30:07
On 12/31/17 4:33 AM, Sowmini Varadhan wrote:
On (12/30/17 21:09), santosh.shilimkar@oracle.com wrote:quoted
Right. This was loop transport in action so xmit will just flip the direction with receive. And rds_recv_incoming() can race with socket_release. rds_find_bound() is suppose to add ref count on socket for rds_recv_incoming() but by that time socket is DEAD & freed by socket release callback.correct, that makes sense.
Yea. In fact the earlier point of sk being null and rs not is
also possible because socket release explicitly marks its
NULL ("sock->sk = NULL"). But it just side effect of the actual
race.
quoted
And rds_release is suppose to be synced with rs_recv_lock. But release callback is marking the sk orphan before syncing up with receive path and updating the bind table. Probably it can pushed down further after the socket clean up buut need to think bit more.However, I'm not sure this seals the race.. according to the bug report rds_recv_incoming->rds_find_bound is being called in rds_send_worker context and the rds_find_bound code is 63 rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms); 64 if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) 65 rds_sock_addref(rs); 66 else 67 rs = NULL; 68 Since the entire logic of rds_release can interleave between line 63 and 64, (whereas we only addref at line 65), moving the sock_orphan will not help. I see that there was an explicic synchornization via the bucket->lock before 7b5654349e. I think you need something like that, or some type or rcu-based hash list.
The rhashtable already has internal bucket lock so those operation like add/remove are synced up. But yes reference addition can still race with receive since receive lock is taken after find bound.
Patch below may make race-window smaller, but race window is still there. If the receive lock is taken ahead then with sock_orphan
change socket release will get synchronized with receive. But preventing socket release to be getting triggered while in receive path by means ref count is better option. Moving socket_orphan down is anyway a good change but its not enough. Will think bit more about it. Thanks for the good discussion. Regards, Santosh