Re: netconsole deadlock with virtnet
From: Jakub Kicinski <kuba@kernel.org>
Date: 2020-11-23 19:21:37
On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
> On Mon, 23 Nov 2020 10:52:52 -0800 Jakub Kicinski wrote:
> > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:
> > > On Mon, 23 Nov 2020 13:08:55 +0200 Leon Romanovsky wrote:
> > > > [ 10.028024] Chain exists of:
> > > > [ 10.028025]   console_owner --> target_list_lock --> _xmit_ETHER#2
> > >
> > > Note, the problem is that we have a location that grabs the xmit_lock
> > > while holding target_list_lock (and possibly console_owner).
> >
> > Well, it try_locks the xmit_lock. Does lockdep understand try-locks?
> >
> > (not that I condone the shenanigans that are going on here)
>
> Does it?
>
>   virtnet_poll_tx() {
>     __netif_tx_lock() {
>       spin_lock(&txq->_xmit_lock);

Umpf. Right. I was looking at virtnet_poll_cleantx()

> That looks like we can have:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(xmit_lock)
>                                lock(console)
>                                lock(target_list_lock)
>                                __netif_tx_lock()
>                                  lock(xmit_lock);
>
>                                  [BLOCKED]
>
>   <interrupt>
>   lock(console)
>
>   [BLOCKED]
>
>   DEADLOCK.
>
> So where is the trylock here?
>
> Perhaps you need the trylock in virtnet_poll_tx()?

That could work. Best if we used normal lock if !!budget, and trylock
when budget is 0. But maybe that's too hairy.

I'm assuming all this trickiness comes from virtqueue_get_buf() needing
locking vs the TX path? It's pretty unusual for the completion path to
need locking vs xmit path.
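
To make the "normal lock if !!budget, trylock when budget is 0" idea concrete, here is a rough userspace model, not a patch. Everything in it is an assumption for illustration: struct txq_model, poll_tx_model() and the selftest are made-up names, a pthread mutex stands in for _xmit_lock, and "budget == 0" is taken to mean the poll was entered from the netpoll/netconsole path, where blocking on the lock is what produces the deadlock above. In the driver the two branches would presumably map onto __netif_tx_lock() and __netif_tx_trylock().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a TX queue; the mutex plays the role of _xmit_lock. */
struct txq_model {
	pthread_mutex_t xmit_lock;
	int pending;		/* TX completions waiting to be reaped */
};

/*
 * Model of a TX poll handler.
 *
 * budget != 0: regular poll, so block on the lock as usual.
 * budget == 0: assumed to be the netpoll path, so only try the lock and
 *              back off if someone else (e.g. the xmit path) holds it.
 *
 * Returns the number of completions reaped, or -1 if the lock was busy.
 */
static int poll_tx_model(struct txq_model *q, int budget)
{
	if (budget) {
		pthread_mutex_lock(&q->xmit_lock);
	} else if (pthread_mutex_trylock(&q->xmit_lock) != 0) {
		return -1;	/* contended: skip the cleanup, do not block */
	}

	int cleaned = q->pending;
	q->pending = 0;

	pthread_mutex_unlock(&q->xmit_lock);
	return cleaned;
}

/* Small single-threaded walk through the three cases. */
static int poll_tx_model_selftest(void)
{
	struct txq_model q = {
		.xmit_lock = PTHREAD_MUTEX_INITIALIZER,
		.pending = 3,
	};

	/* budget 0, lock free: trylock succeeds and the work gets done. */
	assert(poll_tx_model(&q, 0) == 3);

	/* budget 0 while the "xmit path" holds the lock: back off, touch nothing. */
	q.pending = 2;
	pthread_mutex_lock(&q.xmit_lock);
	assert(poll_tx_model(&q, 0) == -1);
	assert(q.pending == 2);
	pthread_mutex_unlock(&q.xmit_lock);

	/* Non-zero budget: take the lock unconditionally. */
	assert(poll_tx_model(&q, 64) == 2);
	return 0;
}
```

Calling poll_tx_model_selftest() from a main() and building with -pthread exercises all three branches. The point of the split is that only the budget-0 caller pays for the possibility of skipped cleanup; the regular poll keeps its current blocking behaviour.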