Re: [PATCHSET v5] Make background writeback great again for the first time

From: Jan Kara <jack@suse.cz>
Date: 2016-05-03 13:06:12
Also in: linux-fsdevel, lkml

On Tue 03-05-16 08:40:11, Chris Mason wrote:

On Tue, May 03, 2016 at 02:17:19PM +0200, Jan Kara wrote:

quoted

On Thu 28-04-16 12:46:41, Jens Axboe wrote:

quoted

-	rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
-	rwb->wb_normal = (rwb->wb_max + 1) / 2;
-	rwb->wb_background = (rwb->wb_max + 3) / 4;
+	if (rwb->queue_depth == 1) {
+		rwb->wb_max = rwb->wb_normal = 2;
+		rwb->wb_background = 1;

This breaks the detection of too big scale_step in scale_up() where we key
of wb_max == 1 value. However even with that fixed no luck :(:

Yeah, I need to look at that. For QD=1, I think the only sensible values for
max/normal/bg is 2/2/1 and 1/1/1 if we step down.

quoted

dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
Runtime: 105.126 107.125 105.641

So about the same as before. I'll try to debug this later today...

Thanks, I'm very interested in what you find!

OK, so the reason was relatively standard in the end. I was using ext3 (or
more exactly ext4 without delayed allocation) for the test. The throttling
of background writes gave more priority to writes from the journalling
thread which happen with WRITE_SYNC and thus are not throttled. Thus the
journalling thread ended up having to do more data writeback to be able to
commit a transaction (due to requirements of data=ordered mode) and it is
less efficient at that than the normal flusher thread.

So this is an example where throttling background writeback effectively
just pushes more work into another context which does it less efficiently
and indirectly makes everyone wait for it. ext3 has been always sensitive to
issues like this. ext4 is using delayed allocation and thus only data
writes into holes end up being part of a transaction -> simple dd test case
doesn't hit that path. And indeed when I repeat the same test with ext4,
the numbers with and without your patch are exactly the same.

The question remains how common a pattern where throttling of background
writeback delays also something else is. I'll schedule a couple of
benchmarks to measure impact of your patches for a wider range of workloads
(but sadly pretty limited set of hw). If ext3 is the only one seeing
issues, I would be willing to accept that ext3 takes the hit since it is
doing something rather stupid (but inherent in its journal design) and we
have a way to deal with this either by enabling delayed allocation or by
turning off the writeback throttling...

At least in the case of io that we know is going to be data=ordered, we
can bump the prio of those pages?

But how would flusher thread, which is submitting IO, know that? We would
have to somehow mark inodes that are part of the running transaction and
flusher thread could give more priority to such writeback - e.g. by using
WRITE_SYNC or at least plain writes. Hmm, if we use an inode flag for that,
it could be doable.

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help