Re: [PATCH v5 0/2] tcp/dcpp: Un-pin tw_timer
From: Valentin Schneider <vschneid@redhat.com>
Date: 2024-05-21 09:03:44
Also in:
linux-rt-users, lkml
Hi, On 22/04/24 16:31, Valentin Schneider wrote:
Apologies for the delayed reply, I was away for most of last week; On 16/04/24 17:01, Eric Dumazet wrote:quoted
On Mon, Apr 15, 2024 at 4:33 PM Valentin Schneider [off-list ref] wrote:quoted
On 15/04/24 14:35, Eric Dumazet wrote:quoted
On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider [off-list ref] wrote:quoted
v4 -> v5 ++++++++ o Rebased against latest Linus' tree o Converted tw_timer into a delayed work following Jakub's bug report on v4 http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org (local)What was the issue again ? Please explain precisely why it was fundamentally tied to the use of timers (and this was not possible to fix the issue without adding work queues and more dependencies to TCP stack)In v4 I added the use of the ehash lock to serialize arming the timewait timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()). Unfortunately, holding a lock both in a timer callback and in the context in which it is destroyed is invalid. AIUI the issue is as follows: CPUx CPUy spin_lock(foo); <timer fires> call_timer_fn() spin_lock(foo) // blocks timer_shutdown_sync() __timer_delete_sync() __try_to_del_timer_sync() // looped as long as timer is running <deadlock> In our case, we had in v4: inet_twsk_deschedule_put() spin_lock(ehash_lock); tw_timer_handler() inet_twsk_kill() spin_lock(ehash_lock); __inet_twsk_kill(); timer_shutdown_sync(&tw->tw_timer); The fix here is to move the timer deletion to a non-timer context. Workqueues fit the bill, and as the tw_timer_handler() would just queue a work item, I converted it to a delayed_work.
Does this explanation make sense? This is the reasoning that drove me to involve workqueues. I'm open to suggestions on alternative approaches.