Thread (40 messages) 40 messages, 8 authors, 2011-08-31

Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback

From: Wu Fengguang <hidden>
Date: 2011-08-18 14:02:15
Also in: linux-mm, lkml

On Tue, Aug 16, 2011 at 11:02:08PM +0800, Mel Gorman wrote:
On Tue, Aug 16, 2011 at 10:06:52PM +0800, Wu Fengguang wrote:
quoted
Mel,

I tend to agree with the whole patchset except for this one.

The worry comes from the fact that there are always the very possible
unevenly distribution of dirty pages throughout the LRU lists.
It is pages under writeback that determines if throttling is considered
not dirty pages. The distinction is important. I agree with you that if
it was dirty pages that throttling would be considered too regularly.
Ah right, sorry for the rushed conclusion!

btw, I guess the vmscan will now progress faster due to the reduced
->pageout() and implicitly blocks in get_request_wait() on congested
IO queue.
quoted
This
patch works on local information and may unnecessarily throttle page
reclaim when running into small spans of dirty pages.
It's also calling wait_iff_congested() not congestion_wait(). This
takes BDI congestion and zone congestion into account with this check.

       /*
         * If there is no congestion, or heavy congestion is not being
         * encountered in the current zone, yield if necessary instead
         * of sleeping on the congestion queue
         */
        if (atomic_read(&nr_bdi_congested[sync]) == 0 ||
                        !zone_is_reclaim_congested(zone)) {

So global information is being taken into account.
That's right.
quoted
One possible scheme of global throttling is to first tag the skipped
page with PG_reclaim (as you already do). And to throttle page reclaim
only when running into pages with both PG_dirty and PG_reclaim set,
It's PG_writeback that is looked at, not PG_dirty.
quoted
which means we have cycled through the _whole_ LRU list (which is the
global and adaptive feedback we want) and run into that dirty page for
the second time.
This potentially results in more scanning from kswapd before it starts
throttling which could consume a lot of CPU. If pages under writeback
are reaching the end of the LRU, it's already the case that kswapd is
scanning faster than pages can be cleaned. Even then, it only really
throttles if the zone or a BDI is congested.
Yeah, the first round may already eat a lot of CPU power..
Taking that into consideration, do you still think there is a big
advantage to having writeback pages take another lap around the LRU
that is justifies the expected increase in CPU usage?
Given that there are typically much fewer PG_writeback than PG_dirty
(except for btrfs which probably should be fixed), the current
throttle condition should be strong enough to avoid false positives.

I even start to worry on the opposite side -- it could be less
throttled than necessary when some LRU is full of dirty pages and
somehow the flusher failed to focus on those pages (hence there are no
enough PG_writeback to wait upon at all).

In this case it may help to do "wait on PG_dirty&PG_reclaim and/or
PG_writeback&PG_reclaim". But the most essential task is always to let
the flusher focus more on the pages, rather than the question of
to-sleep-or-not-to-sleep, which will either block the direct reclaim
tasks for arbitrary long time, or act even worse by busy burning the CPU
during the time.

Thanks,
Fengguang

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help