Thread: 21 messages, 3 authors, 2011-08-21

Re: [PATCH] writeback: Per-block device bdi->dirty_writeback_interval and bdi->dirty_expire_interval.

From: Wu Fengguang <hidden>
Date: 2011-08-19 14:24:33
Also in: linux-fsdevel, lkml

Hi Kautuk,

On Fri, Aug 19, 2011 at 03:00:30PM +0800, Kautuk Consul wrote:
> Hi Wu,
>
> Yes. I think I do understand your approach.
>
> Your aim is to always retain the per-BDI timeout value.
>
> You want to check for thresholds by mathematically adjusting the
> background time into your over_bground_thresh() formula, so that your
> understanding always holds true and also affects the page dirtying
> scenario I mentioned.
> This definitely helps and refines this scenario in terms of flushing
> out the dirty pages.
> Thanks.
>
> Doubts:
> i)   Your entire implementation seems to depend on someone calling
>      balance_dirty_pages() directly or indirectly. This function will
>      call bdi_start_background_writeback(), which wakes up the
>      flusher thread.
>      What about those page dirtying code paths which might not call
>      balance_dirty_pages()?
>      Those paths then depend on the BDI thread periodically writing
>      the pages to disk, and then we are again dependent on the
>      writeback interval.
>      Can we assume that the kernel will reliably call
>      balance_dirty_pages() whenever pages are dirtied? If that were
>      true, then we would never need the bdi periodic writeback
>      threads.

Yes. The kernel needs a way to limit the total number of dirty pages at
any given time and to keep them under dirty_ratio/dirty_bytes.

balance_dirty_pages() is the central place to throttle the dirty
pages. Every code path that generates dirty pages is required to call
into balance_dirty_pages_ratelimited_nr(), which will in turn call
balance_dirty_pages().

So the values specified by dirty_ratio/dirty_bytes are effectively
enforced by balance_dirty_pages(). In contrast, the value specified by
dirty_expire_centisecs is merely a parameter used by wb_writeback() to
select the inodes eligible for writeout. The 30s dirty expire time is
never a guarantee that all inodes/pages dirtied more than 30s ago will
be written to disk in time. It's better interpreted the opposite way:
when the system is under the dirty_background_ratio threshold, and
hence background writeout does not kick in, dirty inodes younger than
30s won't be written to disk by the flusher.
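
To make that reading of the expire time concrete, here is a minimal
sketch in plain userspace C. It is not kernel code: struct dirty_inode,
inode_eligible() and its parameters are hypothetical names, and the
"over the background threshold" branch is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a dirty inode. */
struct dirty_inode {
	unsigned long dirtied_when;	/* time of first dirtying, in seconds */
};

/*
 * Simplified model of inode selection by the flusher.
 *
 * Under the background threshold, only inodes dirtied at least
 * expire_secs ago are picked, so younger inodes stay in memory.
 * Over the threshold, this model assumes (a simplification) that
 * background writeout may pick any dirty inode.
 */
static bool inode_eligible(const struct dirty_inode *inode,
			   unsigned long now,
			   unsigned long expire_secs,
			   bool over_background)
{
	assert(now >= inode->dirtied_when);

	if (over_background)
		return true;

	return now - inode->dirtied_when >= expire_secs;
}
```

With expire_secs = 30, an inode dirtied 10s ago is skipped while the
system is under the background threshold, and becomes eligible only
once it is 30s old.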

> ii)  Even after your rigorous checking, the bdi_writeback_thread()
>      will still do a schedule_timeout() with the global value. Will
>      your current solution then handle Artem's disk removal scenario?
>      Else, you start using your value in the schedule_timeout() call
>      in the bdi_writeback_thread() function, which brings us back to
>      the interval phenomenon I was talking about.

wb_writeback() will keep running as long as over_bground_thresh()
returns true.

The flusher will keep writing as long as there is more work queued,
since there is a

                if (!list_empty(&bdi->work_list))
                        continue;

before the schedule_timeout() call.

And the flusher thread will always be woken up in time by
balance_dirty_pages().

So schedule_timeout() won't get in the way at all.
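
The loop behaviour described above can be modelled in a few lines of
userspace C. This is only a sketch: struct flusher_model and
flusher_iterate() are hypothetical names, a counter stands in for
bdi->work_list, and another counter stands in for the
schedule_timeout() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the flusher thread state. */
struct flusher_model {
	int pending_work;	/* stands in for entries on bdi->work_list */
	int sleeps;		/* times the model reached schedule_timeout() */
	int work_done;		/* work items processed so far */
};

/*
 * One iteration of the modelled thread loop.
 * Returns true if the thread went to sleep, false if it looped again.
 */
static bool flusher_iterate(struct flusher_model *f)
{
	assert(f->pending_work >= 0);

	/* Process one queued work item, if any. */
	if (f->pending_work > 0) {
		f->pending_work--;
		f->work_done++;
	}

	/* Mirrors: if (!list_empty(&bdi->work_list)) continue; */
	if (f->pending_work > 0)
		return false;

	/* Only an empty work list lets the thread sleep. */
	f->sleeps++;
	return true;
}
```

Starting with three queued items, the model loops twice without
sleeping and reaches the sleep only on the third iteration, once the
list is empty.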

> Does this patch really help the user control the exact time when the
> write BIO is transferred from the MM to the block layer, assuming
> balance_dirty_pages() is not called?

It would be a serious bug if balance_dirty_pages() is somehow not
called. But note that balance_dirty_pages() is designed to be called
once every N dirtied pages to reduce overhead.
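
The "every N pages" behaviour can be sketched as a simple counter in
userspace C. All names here (struct dirtier,
model_dirty_pages_ratelimited(), the ratelimit argument) are
hypothetical, and the real ratelimit logic in the kernel is more
involved than this.

```c
#include <assert.h>

/* Hypothetical per-task bookkeeping for the model. */
struct dirtier {
	unsigned int nr_dirtied;	/* pages dirtied since the last check */
	unsigned int throttle_checks;	/* times the throttle point was reached */
};

/* Stand-in for the expensive throttling path. */
static void model_balance_dirty_pages(struct dirtier *d)
{
	d->throttle_checks++;
}

/*
 * Cheap per-dirtying hook: count the pages, and enter the expensive
 * path only once at least `ratelimit` pages have accumulated.
 */
static void model_dirty_pages_ratelimited(struct dirtier *d,
					   unsigned int nr_pages,
					   unsigned int ratelimit)
{
	assert(ratelimit > 0);

	d->nr_dirtied += nr_pages;
	if (d->nr_dirtied >= ratelimit) {
		d->nr_dirtied = 0;
		model_balance_dirty_pages(d);
	}
}
```

Dirtying 100 pages one at a time with a ratelimit of 32 reaches the
throttle point three times (at pages 32, 64 and 96), with 4 pages
still counted towards the next check.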

Thanks,
Fengguang
