Thread (14 messages) 14 messages, 4 authors, 2020-11-25

Re: netconsole deadlock with virtnet

From: Jakub Kicinski <kuba@kernel.org>
Date: 2020-11-23 19:21:37

On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
On Mon, 23 Nov 2020 10:52:52 -0800
Jakub Kicinski [off-list ref] wrote:
quoted
On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:  
quoted
On Mon, 23 Nov 2020 13:08:55 +0200
Leon Romanovsky [off-list ref] wrote:

    
quoted
 [   10.028024] Chain exists of:
 [   10.028025]   console_owner --> target_list_lock --> _xmit_ETHER#2      
Note, the problem is that we have a location that grabs the xmit_lock while
holding target_list_lock (and possibly console_owner).    
Well, it try_locks the xmit_lock. Does lockdep understand try-locks?

(not that I condone the shenanigans that are going on here)  
Does it?

	virtnet_poll_tx() {
		__netif_tx_lock() {
			spin_lock(&txq->_xmit_lock);
Umpf. Right. I was looking at virtnet_poll_cleantx()
That looks like we can have:


	CPU0		CPU1
	----		----
   lock(xmit_lock)

		    lock(console)
		    lock(target_list_lock)
		    __netif_tx_lock()
		        lock(xmit_lock);

			[BLOCKED]

   <interrupt>
   lock(console)

   [BLOCKED]



 DEADLOCK.


So where is the trylock here?

Perhaps you need the trylock in virtnet_poll_tx()?
That could work. Best if we used normal lock if !!budget, and trylock
when budget is 0. But maybe that's too hairy.

I'm assuming all this trickiness comes from virtqueue_get_buf() needing
locking vs the TX path? It's pretty unusual for the completion path to
need locking vs xmit path.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help