Thread: 5 messages, 3 authors, 2019-03-26

Re: [RFC PATCH] mm: readahead: add readahead_shift into backing device

From: Martin Liu <hidden>
Date: 2019-03-26 08:12:46
Also in: linux-mm, lkml

On Tue, Mar 26, 2019 at 09:30:58AM +0800, Fengguang Wu wrote:
On Mon, Mar 25, 2019 at 09:59:31AM -0700, Mark Salyzyn wrote:
On 03/25/2019 05:16 AM, Fengguang Wu wrote:
Martin,

On Fri, Mar 22, 2019 at 11:46:11PM +0800, Martin Liu wrote:
As discussed in https://lore.kernel.org/patchwork/patch/334982/,
an open file's ra_pages can get out of sync with bdi.ra_pages
because of sequential, random, or error reads. With the current design,
users have to reopen the file or use the fadvise system call to bring
it back in sync. However, there are cases where we want to change
ra_pages system-wide to improve performance, such as improving
boot time by increasing ra_pages, or decreasing it to
Do you have examples of a distro making use of larger ra_pages
for boot time optimization?
Android (if you are willing to squint and look at android-common AOSP
kernels as a Distro).
OK. I wonder how exactly Android makes use of it. Since phones do not
use hard disks, they should benefit less from large ra_pages. Would
you kindly point me to the code?
Yes, one example is below.
https://source.android.com/devices/tech/perf/boot-times#optimizing-i-o-efficiency
Suppose N read streams with equal read speed. The thrash-free memory
requirement would be (N * 2 * ra_pages).

If N=1000 and ra_pages=1MB, it'd require 2GB memory. Which looks
affordable in mainstream servers.
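
The arithmetic behind that estimate can be checked with a short sketch. The function name is made up for illustration, and "ra_pages=1MB" is read here as a readahead window of 1 MiB per stream:

```python
def thrash_free_bytes(n_streams, ra_bytes):
    """Memory needed so N equal-speed read streams do not thrash.

    Implements the (N * 2 * ra_pages) estimate quoted above, with the
    readahead window given in bytes (hypothetical helper, not kernel code).
    """
    return n_streams * 2 * ra_bytes


MiB = 1 << 20
GiB = 1 << 30

# 1000 streams with a 1 MiB window: 2000 MiB, i.e. roughly 2GB.
print(thrash_free_bytes(1000, 1 * MiB) / GiB)  # 1.953125
```
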
That is 50% of the memory on a high end Android device ...
Yeah, but I'm obviously not talking about Android devices here. Will a
phone serve 1000 concurrent read streams?
For Android, some important, persistent services and native HALs might
hold an fd for a long time unless a restart is requested, which would
impact the overall user experience (more than 100 of them, I guess).
On low-end devices, which are a big portion of Android devices, memory
might be even smaller. Thus, when the device is under memory pressure,
this might add overhead and hurt performance. With the current
design, we don't have a way to shrink readahead immediately. This
interface gives an administrator the flexibility to decide how much
readahead should participate in the mitigation, based on the metrics
it has.
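
For reference, the existing per-file route mentioned earlier in the thread (reopen the file, or call fadvise) looks roughly like this from userspace. This is a minimal sketch: the helper name is hypothetical, and it relies on the thread's statement that fadvise brings a file's readahead state back in sync. It only works on descriptors the calling process itself holds, which is why it does not help with fds held by long-lived services.

```python
import os


def reset_readahead_hint(fd):
    """Ask the kernel to go back to normal readahead behaviour for `fd`.

    Hypothetical helper: passes POSIX_FADV_NORMAL for the whole file
    (offset 0, length 0 means "to end of file").
    """
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NORMAL)


if __name__ == "__main__":
    # Usage: apply the hint to a file this process has open.
    fd = os.open("/etc/hostname", os.O_RDONLY)
    try:
        reset_readahead_hint(fd)
    finally:
        os.close(fd)
```
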
Sorry, but it sounds like introducing an unnecessarily twisted new
interface. I'm afraid it fixes the pain for 0.001% of users while
bringing more confusion to the majority of others.
2B Android devices on the planet is 0.001%?
Nope. Sorry I didn't know about the Android usage.
Actually nobody mentioned it in the past discussions.
I am not defending the proposed interface though, if there is something
better that can be used, then looking into:
Then let fadvise() and shrink_readahead_size_eio() adjust that
per-file ra_pages_shift.
Sounds like this would require a lot from init to globally audit and
reduce the read-ahead for all open files?
It depends. In theory it should be possible to create a standalone
kernel module to dump the page cache and get the current snapshot of
all cached file pages. It'd be a one-shot action and doesn't require
continuous auditing.

[RFC] kernel facilities for cache prefetching
https://lwn.net/Articles/182128

This tool may also work. It's quick to get the list of opened files by
walking /proc/*/fd/, but it is not as easy to get the list of cached
file names.

https://github.com/tobert/pcstat
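
The "walking /proc/*/fd/" step could look something like the sketch below. The function name and the filtering choices (regular files only, skip anything unreadable) are assumptions made for illustration, not part of any existing tool:

```python
import glob
import os


def list_open_files(pids=None):
    """Return the set of regular-file paths open in the given processes.

    Hypothetical helper: reads the symlinks under /proc/<pid>/fd/.
    With pids=None it scans every process visible in /proc.
    """
    if pids is None:
        fd_dirs = glob.glob("/proc/[0-9]*/fd")
    else:
        fd_dirs = ["/proc/%d/fd" % pid for pid in pids]

    paths = set()
    for fd_dir in fd_dirs:
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for name in names:
            try:
                target = os.readlink(os.path.join(fd_dir, name))
            except OSError:
                continue  # fd was closed in the meantime
            # Skip sockets, pipes, devices and deleted files.
            if target.startswith("/") and os.path.isfile(target):
                paths.add(target)
    return paths
```

Without root, only the caller's own processes are readable, so a system-wide list needs privileges. As the message says, this covers open files only, not files that are cached but no longer open.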

Perhaps we can do a simplified /proc/filecache that only dumps the
list of cached file names. Then let mincore() based tools take care
of the rest.
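
To illustrate what a mincore() based tool does with one file name, here is a rough sketch that maps the file and counts resident pages. It assumes 64-bit Linux with glibc (the `off_t` argument is passed as a C long), and the function name is invented for this example:

```python
import ctypes
import mmap
import os


def resident_pages(path):
    """Return (resident, total) page-cache page counts for `path`.

    Hypothetical helper: calls libc mmap/mincore/munmap through ctypes.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte)]

    size = os.path.getsize(path)
    if size == 0:
        return (0, 0)
    total = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE

    fd = os.open(path, os.O_RDONLY)
    try:
        addr = libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
    finally:
        os.close(fd)  # the mapping stays valid after close
    if addr == ctypes.c_void_p(-1).value:  # MAP_FAILED
        raise OSError(ctypes.get_errno(), "mmap failed")

    try:
        vec = (ctypes.c_ubyte * total)()
        if libc.mincore(addr, size, vec) != 0:
            raise OSError(ctypes.get_errno(), "mincore failed")
        # The low bit of each byte says whether that page is resident.
        resident = sum(b & 1 for b in vec)
    finally:
        libc.munmap(addr, size)
    return (resident, total)
```
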
Thanks for the information; it is very useful. On Android, things
get updated pretty frequently, and the lists might need to be updated
whenever end users install apps, runtime optimization runs, or a new
OTA arrives. Therefore, maintaining this might take considerable effort.
Please kindly correct me if I have misunderstood anything. Thanks.

Regards,
Martin