Re: netconsole deadlock with virtnet


From: Jakub Kicinski <kuba@kernel.org>
Date: 2020-11-23 19:21:37

On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:

On Mon, 23 Nov 2020 10:52:52 -0800
Jakub Kicinski [off-list ref] wrote:

On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:

On Mon, 23 Nov 2020 13:08:55 +0200
Leon Romanovsky [off-list ref] wrote:

 [   10.028024] Chain exists of:
 [   10.028025]   console_owner --> target_list_lock --> _xmit_ETHER#2

Note, the problem is that we have a location that grabs the xmit_lock while
holding target_list_lock (and possibly console_owner).

Well, it try_locks the xmit_lock. Does lockdep understand try-locks?

(not that I condone the shenanigans that are going on here)

Does it?

	virtnet_poll_tx() {
		__netif_tx_lock() {
			spin_lock(&txq->_xmit_lock);

Umpf. Right. I was looking at virtnet_poll_cleantx()

That looks like we can have:


        CPU0                    CPU1
        ----                    ----
   lock(xmit_lock)

                                lock(console)
                                lock(target_list_lock)
                                __netif_tx_lock()
                                    lock(xmit_lock);

                                [BLOCKED]

   <interrupt>
   lock(console)

   [BLOCKED]

 DEADLOCK.


So where is the trylock here?

Perhaps you need the trylock in virtnet_poll_tx()?

That could work. Best if we used normal lock if !!budget, and trylock
when budget is 0. But maybe that's too hairy.
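
A rough userspace sketch of that idea, not a patch: it models xmit_lock
with a pthread mutex and picks between a blocking lock and a trylock
based on budget. The names txq_model, txq_model_lock() and
txq_model_poll_tx() are made up for illustration and are not virtio_net
or netdev API. It also assumes that budget == 0 is how the poll routine
recognises the netpoll/netconsole calling context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TX queue; not the kernel's netdev_queue. */
struct txq_model {
	pthread_mutex_t xmit_lock;
	unsigned int cleaned;	/* number of completed cleanup passes */
};

/*
 * Block on xmit_lock when budget is non-zero, only try it when budget
 * is 0 (assumed here to mean "called from netpoll").
 * Returns true if the lock was acquired.
 */
static bool txq_model_lock(struct txq_model *q, int budget)
{
	if (budget)
		return pthread_mutex_lock(&q->xmit_lock) == 0;
	return pthread_mutex_trylock(&q->xmit_lock) == 0;
}

/* Model of the TX poll: clean completions only if the lock was taken. */
static bool txq_model_poll_tx(struct txq_model *q, int budget)
{
	int rc;

	if (!txq_model_lock(q, budget))
		return false;	/* contended with budget == 0: skip, do not wait */

	q->cleaned++;		/* placeholder for the real completion cleanup */

	rc = pthread_mutex_unlock(&q->xmit_lock);
	assert(rc == 0);
	return true;
}
```

With the lock free, both the budget != 0 and the budget == 0 calls do
the cleanup. With the lock already held, the budget == 0 call returns
false instead of waiting, which is what would break the cycle in the
scenario above.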

I'm assuming all this trickiness comes from virtqueue_get_buf() needing
locking vs the TX path? It's pretty unusual for the completion path to
need locking vs xmit path.
