Thread (12 messages) 12 messages, 2 authors, 2019-06-29

Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync

From: John Fastabend <john.fastabend@gmail.com>
Date: 2019-06-29 00:20:33
Also in: bpf

Jakub Kicinski wrote:
On Fri, 28 Jun 2019 12:40:29 -0700, John Fastabend wrote:
quoted
The lock() is already held when entering unhash() side so need to
handle this case as well,

CPU 0 (free)          CPU 1 (wq)

lock(sk)              ctx = tls_get_ctx(sk) <- need to be check null ptr
sk_prot->unhash()
  set_bit()
  cancel_work()
  ...
  kfree(ctx)
unlock(sk)

but using cancel and doing an unlikely(!ctx) check should be
sufficient to handle wq. 
I'm not sure we can kfree ctx, the work struct itself is in it, no?
should be OK as long as we have no outstanding work. So cancel_work()
needs to be cancel_work_sync() but we can't do this in unhash so
see below...
quoted
What I'm not sure how to solve now is
in patch 2 of this series unhash is still calling strp_done
with the sock lock. Maybe we need to do a deferred release
like sockmap side?
Right, we can't do anything that sleeps in unhash, since we're holding
the spinlock there, not the "owner" lock.
yep.
quoted
Trying to drop the lock and then grabbing it again doesn't
seem right to me seems based on comment in tcp_abort we
could potentially "race with userspace socket closes such
as tcp_close". iirc I think one of the tls splats from syzbot
looked something like this may have happened.

For now I'm considering adding a strp_cancel() op. Seeing
we are closing() the socket and tearkng down we can probably
be OK with throwing out strp results.
But don't we have to flush the work queues before we free ctx?  We'd
need to alloc a workqueue and schedule a work to flush the other works
and then free?
Agree, just wrote this code now and testing it. So new flow is,

 lock(sk)
 sk_prot->unhash()
   set_bit(CLOSING)
   ...
   tls_ctx_wq_free(ctx) <- sets up a work queue to do the *sync and kfree
 unlock(sk)

FWIW sockmap has a similar problem on unhash() where it needs to
wait a RCU grace period and then do *sync operations on workqueues.
So it also schedules work to do the cleanup outside unhash() with
additional complication of waiting rcu grace period before scheduling.

The new patches are not too ugly IMO it only impacts this one
unhash() case.
Why can't tls sockets exist outside of established state?  If shutdown
doesn't call close, perhaps we can add a shutdown callback?  It doesn't
seem to be called from BH?
Because the ulp would be shared in this case,

	/* The TLS ulp is currently supported only for TCP sockets
	 * in ESTABLISHED state.
	 * Supporting sockets in LISTEN state will require us
	 * to modify the accept implementation to clone rather then
	 * share the ulp context.
	 */
	if (sk->sk_state != TCP_ESTABLISHED)
		return -ENOTSUPP;

In general I was trying to avoid modifying core TCP layer to fix
this corner case in tls.
 
Sorry for all the questions, I'm not really able to fully wrap my head
around this. I also feel like I'm missing the sockmap piece that may
be why you prefer unhash over disconnect.
Yep, if we try to support listening sockets we need a some more
core infrastructure to push around ulp and user_data portions of
sockets. Its not going to be nice for stable. Also at least in TLS
and sockmap case its not really needed for any use case I know
of.
FWIW Davide's ULP diag support patches will require us to most likely
free ctx with kfree_rcu(). diag only has a ref on struct sock, so if 
we want to access ctx we need RCU or to lock every socket. It's a
little bit of an abuse of RCU, because the data under our feet may
actually change, but the fields we dump will only get inited once
after ulp is installed.
Ah great. I'll push a v2 this afternoon so we get syzbot running
its reproducers over it and maybe it will fit into Davide's work
as well.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help