Thread: 64 messages, 7 authors, 2022-03-10

Re: Report 2 in ext4 and journal based on v5.17-rc1

From: Byungchul Park <hidden>
Date: 2022-03-03 01:37:10
Also in: dri-devel, linux-ext4, linux-fsdevel, linux-ide, linux-mm, lkml

On Mon, Feb 28, 2022 at 04:25:04PM -0500, Theodore Ts'o wrote:
On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
quoted
quoted
case 1. Code with an actual circular dependency, but not deadlock.

   A circular dependency can be broken by a rescue wakeup source e.g.
   timeout. It's not a deadlock, as long as it's okay that the contexts
   participating in the circular dependency, and others waiting for the
   events in the circle, are stuck until it gets broken. Otherwise, say,
   if that is not intended, then it's problematic anyway.

   1-1. What if we judge this code is problematic?
   1-2. What if we judge this code is good?
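
To make case 1 concrete, here is a minimal user-space sketch in Python (not
kernel code). Two threads each wait for the other's event before signalling
their own, so the waits form a circle, and a timeout acts as the rescue
wakeup source. The function name, the event names and the 50 ms timeout are
assumptions made for illustration only.

```python
import threading


def run_circular_wait(timeout_s=0.05):
    """Run two threads that wait on each other; return (finished, log)."""
    ev_a = threading.Event()
    ev_b = threading.Event()
    log = []
    log_lock = threading.Lock()

    def worker(name, wait_on, signal):
        # Circular dependency: each worker waits for the other's event
        # before it signals its own.
        woken_by_event = wait_on.wait(timeout=timeout_s)
        with log_lock:
            log.append((name, "event" if woken_by_event else "timeout"))
        # The timeout is the rescue wakeup: the worker carries on and
        # signals, which lets the other side proceed too.
        signal.set()

    t1 = threading.Thread(target=worker, args=("A", ev_b, ev_a))
    t2 = threading.Thread(target=worker, args=("B", ev_a, ev_b))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    finished = not t1.is_alive() and not t2.is_alive()
    return finished, sorted(log)


if __name__ == "__main__":
    print(run_circular_wait())
```

Both threads finish, but only after at least one of them has sat out the
full timeout. That stall is the cost being debated in 1-1 and 1-2.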

I've been wondering whether the kernel guys, esp. Linus, consider code
with any circular dependency problematic, even if it won't lead to a
deadlock, say, case 1. Even though I designed Dept based on what I
believe is right, of course, I'm willing to change the design according
to the majority opinion.

However, I would never allow case 1 if I were the owner of the kernel,
for better stability, even though the code works okay for now.
Note, I used the example of the timeout as the most obvious way of
explaining that a deadlock is not possible.  There is also the much
more complex explanation which Jan was trying to give, which is what
leads to the circular dependency.  It can happen that when trying to
start a handle, if either (a) there is not enough space in the journal
for new handles, or (b) the current transaction is so large that if we
don't close the transaction and start a new one, we will end up
running out of space in the future, and so in that case,
start_this_handle() will block starting any more handles, and then
wake up the commit thread.  The commit thread then waits for the
currently running threads to complete, before it allows new handles to
start, and then it will complete the commit.  In the case of (a) we
then need to do a journal checkpoint, which is more work to release
space in the journal, and only then, can we allow new handles to start.
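
The handle/commit interplay described above can be sketched as a toy model
in Python. This is not jbd2 code: the class, its fields and the
"max handles per transaction" limit are hypothetical stand-ins for "the
current transaction is too large", and the checkpoint step of case (a) is
left out entirely.

```python
import threading


class ToyJournal:
    """Toy model only; names are hypothetical and this is not jbd2 code."""

    def __init__(self, max_handles_per_txn):
        self.cond = threading.Condition()
        self.max_handles = max_handles_per_txn
        self.started = 0   # handles started in the running transaction
        self.active = 0    # handles started but not yet stopped
        self.commit_requested = False
        self.commits = 0
        self.shutdown = False

    def start_handle(self):
        with self.cond:
            # Block new handles while the transaction is full or a commit
            # is pending, and wake the commit thread once.
            while self.commit_requested or self.started >= self.max_handles:
                if not self.commit_requested:
                    self.commit_requested = True
                    self.cond.notify_all()
                self.cond.wait()
            self.started += 1
            self.active += 1

    def stop_handle(self):
        with self.cond:
            self.active -= 1
            self.cond.notify_all()

    def commit_loop(self):
        with self.cond:
            while True:
                while not self.commit_requested and not self.shutdown:
                    self.cond.wait()
                if self.shutdown and not self.commit_requested:
                    return
                # Wait for the currently running handles to complete.
                while self.active > 0:
                    self.cond.wait()
                # "Commit", then allow new handles to start again.
                self.started = 0
                self.commits += 1
                self.commit_requested = False
                self.cond.notify_all()

    def stop(self):
        with self.cond:
            self.shutdown = True
            self.cond.notify_all()


def run_toy_journal(workers=4, handles_each=5, max_handles=3):
    journal = ToyJournal(max_handles)
    committer = threading.Thread(target=journal.commit_loop)
    committer.start()

    def worker():
        for _ in range(handles_each):
            journal.start_handle()
            journal.stop_handle()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    journal.stop()
    committer.join(timeout=10)
    all_done = not committer.is_alive() and not any(t.is_alive() for t in threads)
    return journal.commits, all_done
```

In this model the starters wait on the commit thread and the commit thread
waits on running handles, yet nothing gets stuck, because stopping a handle
never blocks on the commit.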
Thank you for the full explanation of how journal things work.
The bottom line is (a) it works, (b) there aren't significant delays,
and for DEPT to complain that this is somehow wrong and we need to
completely rearchitect perfectly working code because it doesn't
conform to DEPT's idea of what is "correct" is not acceptable.
Thanks to you and Jan Kara, I realized it's not a real dependency in the
consumer and producer scenario, but again *ONLY IF* there is a rescue
wakeup source. Dept should track the rescue wakeup source instead in
that case.

I won't ask you to rearchitect the working code. The code looks sane.

Thanks a lot.

Thanks,
Byungchul
quoted
We have a queue of work to do Q protected by lock L. Consumer process has
code like:

while (1) {
	lock L
	prepare_to_wait(work_queued);
	if (no work) {
		unlock L
		sleep
	} else {
		unlock L
		do work
		wake_up(work_done)
	}
}

AFAIU Dept will create a dependency here that 'wakeup work_done' is after
'wait for work_queued'. Producer has code like:

while (1) {
	lock L
	prepare_to_wait(work_done)
	if (too much work queued) {
		unlock L
		sleep
	} else {
		queue work
		unlock L
		wake_up(work_queued)
	}
}

And Dept will create a dependency here that 'wakeup work_queued' is after
'wait for work_done'. And thus we have a trivial cycle in the dependencies
despite the code being perfectly valid and safe.
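
As a rough user-space illustration of the two loops above, here is a Python
sketch in which threading.Condition objects stand in for the kernel wait
queues. The graph encoding ("wait X" followed by "wake Y" becomes an edge
X -> Y) and the cycle check are assumptions about how such a tracker might
model this, not Dept's actual implementation.

```python
import threading
from collections import deque

# "wake Y happens after wait X" is recorded as an edge X -> Y.
WAIT_THEN_WAKE = {
    "work_queued": ["work_done"],   # consumer: wait work_queued, wake work_done
    "work_done": ["work_queued"],   # producer: wait work_done, wake work_queued
}


def has_cycle(graph):
    """Depth-first search for a cycle in a dict-of-lists graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


def run_bounded_queue(n_items=50, max_queued=4):
    lock = threading.Lock()                   # lock L
    work_queued = threading.Condition(lock)   # consumer sleeps here
    work_done = threading.Condition(lock)     # producer sleeps here
    queue = deque()
    consumed = []

    def producer():
        for item in range(n_items):
            with lock:
                while len(queue) >= max_queued:   # too much work queued
                    work_done.wait()
                queue.append(item)
                work_queued.notify()

    def consumer():
        for _ in range(n_items):
            with lock:
                while not queue:                  # no work
                    work_queued.wait()
                item = queue.popleft()
            consumed.append(item)                 # do work outside the lock
            with lock:
                work_done.notify()

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    assert not any(t.is_alive() for t in threads), "threads did not finish"
    return consumed
```

The edge graph contains a cycle, yet the run completes: the producer only
sleeps while the queue is full, and the consumer only sleeps while it is
empty, so the two waits can never hold at the same time.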
Cheers,

							- Ted