Re: [PATCH] writeback: Per-block device bdi->dirty_writeback_interval and bdi->dirty_expire_interval.
From: Wu Fengguang <hidden>
Date: 2011-08-19 14:24:33
Also in:
linux-fsdevel, lkml
Hi Kautuk,

On Fri, Aug 19, 2011 at 03:00:30PM +0800, Kautuk Consul wrote:
Hi Wu,

Yes. I think I do understand your approach. Your aim is to always retain the per-BDI timeout value. You want to check the thresholds by mathematically factoring the background time into your over_bground_thresh() formula, so that your understanding always holds and also covers the page dirtying scenario I mentioned. This definitely helps and refines this scenario in terms of flushing out the dirty pages.
Thanks.
Doubts:
i) Your entire implementation seems to depend on someone calling balance_dirty_pages() directly or indirectly. This function will call bdi_start_background_writeback(), which wakes up the flusher thread.
What about those page dirtying code paths which might not call balance_dirty_pages()? Those paths then depend on the BDI thread periodically writing to disk, and then we are again dependent on the writeback interval.
Can we assume that the kernel will reliably call balance_dirty_pages() whenever pages are dirtied? If that were true, then we would never need the bdi periodic writeback threads.

Yes. The kernel needs a way to limit the total number of dirty pages at any given time and to keep them under dirty_ratio/dirty_bytes. balance_dirty_pages() is the central place to throttle dirty pages. Every code path that generates dirty pages is required to call into balance_dirty_pages_ratelimited_nr(), which will in turn call balance_dirty_pages(). So the values specified by dirty_ratio/dirty_bytes are effectively enforced by balance_dirty_pages().

In contrast, the value specified by dirty_expire_centisecs is merely a parameter used by wb_writeback() to select the inodes eligible for writeout. The 30s dirty expire time is never a guarantee that all inodes/pages dirtied more than 30s ago will be written to disk in time. It is better interpreted the opposite way: while under the dirty_background_ratio threshold, and hence background writeout does not kick in, dirty inodes younger than 30s won't be written to disk by the flusher.
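To make the dirty_expire_centisecs point concrete, here is a rough userspace sketch of the selection rule described above. It is not the kernel code: struct toy_inode, toy_count_eligible() and toy_demo() are made-up names, time is in plain seconds, and the rule is simplified to "everything is eligible when over the background threshold, otherwise only inodes older than the expire time".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode on a bdi's dirty list. */
struct toy_inode {
	unsigned long dirtied_when;	/* time the inode was dirtied, in seconds */
};

/*
 * Count the inodes a toy flusher would pick up at time 'now'.
 * Over the background threshold, every dirty inode is eligible.
 * Under it, only inodes at least 'expire' seconds old are.
 */
size_t toy_count_eligible(const struct toy_inode *inodes, size_t n,
			  unsigned long now, unsigned long expire,
			  bool over_bground_thresh)
{
	size_t i, eligible = 0;

	for (i = 0; i < n; i++) {
		unsigned long age = now - inodes[i].dirtied_when;

		if (over_bground_thresh || age >= expire)
			eligible++;
	}
	return eligible;
}

/* Three inodes dirtied at t=0, t=10 and t=25, examined at t=40 with a 30s expire. */
size_t toy_demo(bool over_bground_thresh)
{
	const struct toy_inode inodes[] = { { 0 }, { 10 }, { 25 } };

	return toy_count_eligible(inodes, 3, 40, 30, over_bground_thresh);
}
```

In this toy setup the inode dirtied at t=25 is only 15s old, so it is skipped while under the background threshold and only becomes eligible once the threshold is exceeded.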
ii) Even after your rigorous checking, bdi_writeback_thread() will still do a schedule_timeout() with the global value. Will your current solution then handle Artem's disk removal scenario?
Else, you start using your value in the schedule_timeout() call in the bdi_writeback_thread() function, which brings us back to the interval phenomenon I was talking about.

wb_writeback() will keep running as long as over_bground_thresh() is true.
The flusher will keep writing as long as there is more work queued, since there is a

	if (!list_empty(&bdi->work_list))
		continue;

before the schedule_timeout() call.
And the flusher thread will always be woken up in time from balance_dirty_pages().
So schedule_timeout() won't get in the way at all.
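The loop structure above can be sketched in userspace as follows. This is only a toy model under my own assumptions: struct toy_flusher, toy_do_writeback() and toy_flusher_run() are hypothetical names, and the work list is reduced to a simple counter.

```c
/* Hypothetical stand-in for the flusher state; work_list is just a counter. */
struct toy_flusher {
	unsigned int pending;	/* queued work items */
	unsigned int done;	/* work items processed so far */
};

/* Process one work item if any is queued, like one pass of writeback. */
static void toy_do_writeback(struct toy_flusher *f)
{
	if (f->pending) {
		f->pending--;
		f->done++;
	}
}

/*
 * Run the toy loop until it would go to sleep, and return how many
 * work items were processed before that point.
 */
unsigned int toy_flusher_run(unsigned int queued)
{
	struct toy_flusher f = { queued, 0 };

	for (;;) {
		toy_do_writeback(&f);

		/* Counterpart of: if (!list_empty(&bdi->work_list)) continue; */
		if (f.pending)
			continue;

		/* The real thread would call schedule_timeout() here. */
		break;
	}
	return f.done;
}
```

With five items queued, the toy loop processes all five before it reaches the point where it would sleep, which is the behaviour the continue statement is there to provide.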
Does this patch really help the user control the exact time at which the write BIO is transferred from the MM to the block layer, assuming balance_dirty_pages() is not called?
It would be a serious bug if balance_dirty_pages() is somehow not called. But note that balance_dirty_pages() is designed to be called on every N pages to reduce overheads.

Thanks,
Fengguang
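As an illustration of the "every N pages" remark, a minimal sketch of a per-task ratelimit could look like the code below. The toy_* names and the fixed ratelimit argument are assumptions made for the example; the real balance_dirty_pages_ratelimited_nr() computes its ratelimit differently.

```c
/* Hypothetical per-task counters. */
struct toy_task {
	unsigned long nr_dirtied;	/* pages dirtied since the last throttle check */
	unsigned long calls;		/* how often the expensive path ran */
};

/* Stand-in for the expensive throttling path. */
static void toy_balance_dirty_pages(struct toy_task *t)
{
	t->calls++;
}

/* Called for each dirtied page; enters the expensive path once per 'ratelimit' pages. */
static void toy_balance_dirty_pages_ratelimited(struct toy_task *t,
						unsigned long ratelimit)
{
	if (++t->nr_dirtied >= ratelimit) {
		t->nr_dirtied = 0;
		toy_balance_dirty_pages(t);
	}
}

/* Dirty 'pages' pages one by one and report how often throttling ran. */
unsigned long toy_throttle_calls(unsigned long pages, unsigned long ratelimit)
{
	struct toy_task t = { 0, 0 };
	unsigned long i;

	for (i = 0; i < pages; i++)
		toy_balance_dirty_pages_ratelimited(&t, ratelimit);
	return t.calls;
}
```

Dirtying 100 pages with a ratelimit of 32 enters the throttling path three times (after pages 32, 64 and 96), so the check is still reached regularly while the common case stays cheap.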