Re: [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach)

From: Wu Fengguang <hidden>
Date: 2011-03-29 02:41:20
Also in: linux-fsdevel

On Tue, Mar 29, 2011 at 10:14:58AM +0800, Dave Chinner wrote:

On Mon, Mar 28, 2011 at 10:44:45AM +0800, Wu Fengguang wrote:

On Sat, Mar 26, 2011 at 07:05:44AM +0800, Jan Kara wrote:

And actually the NFS traces you pointed to originally seem to be a different
problem, in fact not directly related to what balance_dirty_pages() does...
And with a local filesystem the results seem to be reasonable (although there
are some longer sleeps in your JBOD measurements I don't understand yet).

Yeah the NFS case can be improved on the FS side (for now you may just
reuse my NFS patches and focus on other generic improvements).

The JBOD issue is also beyond my understanding.

Note that XFS will also see one big IO completion every 0.5-1 seconds
once we increase the write chunk size from the current 4MB to
near the bdi's write bandwidth, as illustrated by this graph:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G/xfs-1dd-1M-8p-3927M-20%25-2.6.38-rc6-dt6+-2011-02-27-22-58/global_dirtied_written-500.png

Which is _bad_.

Increasing the writeback chunk size simply causes dirty queue
starvation issues when there are lots of dirty files and lots more
memory than there is writeback bandwidth. Think of a machine with
1TB of RAM (that's a 200GB dirty limit) and 1GB/s of disk
throughput. That's 3 minutes worth of writeback and increasing the
chunk size to ~1s worth of throughput means that the 200th dirty
file won't get serviced for 3 minutes....
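
The arithmetic in this example can be checked with a short sketch. The
figures (1TB of RAM, a 200GB dirty limit, 1GB/s of throughput, 4MB chunks)
come from the thread; the 20% dirty ratio is only implied by those figures,
and the function names, the decimal units and the "one full chunk per file,
in order" service model are illustrative assumptions, not a description of
how the kernel's writeback code is actually structured.

```python
GB = 10**9  # decimal units, matching the round figures in the message
MB = 10**6


def drain_time_s(dirty_bytes, bandwidth_bytes_per_s):
    """Seconds needed to write back dirty_bytes at a steady bandwidth."""
    return dirty_bytes / bandwidth_bytes_per_s


def first_service_delay_s(file_position, chunk_bytes, bandwidth_bytes_per_s):
    """Seconds before the file at file_position (1-based) is first serviced,
    assuming every file ahead of it is written one full chunk first."""
    return (file_position - 1) * chunk_bytes / bandwidth_bytes_per_s


ram = 1000 * GB                 # 1TB of RAM
dirty_limit = ram * 20 // 100   # assumed 20% dirty ratio -> 200GB
bandwidth = 1 * GB              # 1GB/s of disk throughput

print(drain_time_s(dirty_limit, bandwidth) / 60)            # minutes to drain
print(first_service_delay_s(200, 1 * GB, bandwidth) / 60)   # ~1s chunks, minutes
print(first_service_delay_s(200, 4 * MB, bandwidth))        # 4MB chunks, seconds
```

Under these assumptions the 200th file waits roughly as long as the whole
dirty limit takes to drain when chunks are ~1s of throughput, but under a
second with 4MB chunks.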

We used to have behaviour similar to this (prior to 2.6.16, IIRC),
and it caused all sorts of problems where people were losing 10-15
minute old data when the system crashed because writeback didn't
process the dirty inode list fast enough in the presence of lots of
large files....

 
Yes it is a problem, and can best be solved by automatically lowering
the bdi dirty limit to (bdi->write_bandwidth * dirty_expire_interval/100).
Then we reliably control the lost data size to < 30s by default.
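
A minimal sketch of that cap, for illustration only. It assumes
dirty_expire_interval is expressed in centiseconds with the default of 3000
(i.e. 30s, as exposed by the vm.dirty_expire_centisecs sysctl) and that
bandwidth is measured in bytes per second; the helper name
capped_bdi_dirty_limit is hypothetical and does not exist in the kernel.

```python
def capped_bdi_dirty_limit(write_bandwidth_bytes_per_s,
                           dirty_expire_centisecs=3000):
    """Hypothetical helper: write_bandwidth * dirty_expire_interval / 100.

    Dividing by 100 converts centiseconds to seconds, so the result is the
    number of bytes the device can write within the expire interval.
    """
    return write_bandwidth_bytes_per_s * dirty_expire_centisecs // 100


bandwidth = 10**9                      # 1GB/s, the figure used in the thread
limit = capped_bdi_dirty_limit(bandwidth)

print(limit / 10**9)       # capped dirty limit in GB
print(limit / bandwidth)   # seconds of writeback that limit represents
```

With the 1GB/s device from the earlier example this gives a 30GB per-bdi
limit instead of 200GB, which is what bounds the backlog to about 30 seconds
of writeback.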

A small writeback chunk size has no adverse impact on XFS as long as
the elevator does its job of merging IOs (which in 99.9% of cases
it does) so I'm wondering what the reason for making this change
is.

It's explained in this changelog (is the XFS paragraph still valid?)

        https://patchwork.kernel.org/patch/605151/

The larger write chunk size generally helps ext4 and RAID setups.

Thanks,
Fengguang
