Re: [patch 15/15] mm: add strictlimit knob
From: Fengguang Wu <hidden>
Date: 2017-12-07 10:15:47
Also in:
linux-fsdevel
On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu [off-list ref] wrote:quoted
CC fuse maintainer, too. On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:quoted
On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara [off-list ref] wrote:quoted
On Thu 30-11-17 14:15:58, Andrew Morton wrote:quoted
From: Maxim Patlasov <redacted> Subject: mm: add strictlimit knob The "strictlimit" feature was introduced to enforce per-bdi dirty limits for FUSE which sets bdi max_ratio to 1% by default: http://article.gmane.org/gmane.linux.kernel.mm/105809 However the feature can be useful for other relatively slow or untrusted BDIs like USB flash drives and DVD+RW. The patch adds a knob to enable the feature: echo 1 > /sys/class/bdi/X:Y/strictlimit Being enabled, the feature enforces bdi max_ratio limit even if global (10%) dirty limit is not reached. Of course, the effect is not visible until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.In principle I have nothing against this and the usecase sounds reasonable (in fact I believe the lack of a feature like this is one of reasons why desktop automounters usually mount USB devices with 'sync' mount option). So feel free to add: Reviewed-by: Jan Kara <jack@suse.cz>Cc Jens, who may be vaguely interested in plans to finally merge this three-year-old patch? From: Maxim Patlasov <redacted> Subject: mm: add strictlimit knob The "strictlimit" feature was introduced to enforce per-bdi dirty limits for FUSE which sets bdi max_ratio to 1% by default: http://article.gmane.org/gmane.linux.kernel.mm/105809That link is invalid for now, possibly due to the gmane site rebuild. I find an email thread here which looks relevant: https://sourceforge.net/p/fuse/mailman/message/35254883/ Where Maxim has an interesting point: > Did any one try increasing the limit and did see any better/worsequoted
performance ?We've used 20% as default value in OpenVZ kernel for a long while (1% was not enough to saturate our distributed parallel storage). So the knob will also enable people to _disable_ the 1% fuse limit to increase performance. So people can use the exposed knob in 2 ways to fit their needs, which is in general a good thing. However the comment in wb_position_ratio() says Without strictlimit feature, fuse writeback may * consume arbitrary amount of RAM because it is accounted in * NR_WRITEBACK_TEMP which is not involved in calculating "nr_dirty". How dangerous would that be if some user disabled the 1% fuse limit through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far beyond the user's expectation (20% max dirty limit)? Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to WB_WRITEBACK, which should be throttled when bdi_write_congested(). The congested flag will be set on fuse_conn.num_background >= fuse_conn.congestion_threshold So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that it's not included in the 20% dirty limit.Only balance_dirty_pages_ratelimited() is going to limit the generation of dirty pages, I don't think congestion flags will do that.
Right. However my concern is something to limit the generation of
fuse's _writeback_ pages.
The normal writeback pages are limited in 2 ways:
- balance_dirty_pages_ratelimited()'s dirty throttling:
nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit
- block layer's nr_requests queue limit
However fuse's NR_WRITEBACK_TEMP looks special and has none of such
limits. The congested bit merely affect the vmscan pageout path.
pageout
may_write_to_inode
inode_write_congested
wb_congested
I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
Either explicitly or implicitly, there has to be some hard limit.
And (AFAICS) for fuse only BDI_CAP_STRICTLIMIT will allow accounting temp writeback pages when throttling dirty page generation. So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode. So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not permit disabling it for fuse).
So fuse relies on small nr_dirty. Does fuse impose any explicit or implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)? Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed. For example, is it possible for some process (eg. dd) to dirty pages as fast as possible while some other kernel logic to convert PG_dirty to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP) cannot stop all memory being eat up by ever growing NR_WRITEBACK_TEMP? Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>