Thread (21 messages) 21 messages, 3 authors, 2011-08-21

Re: [PATCH] writeback: Per-block device bdi->dirty_writeback_interval and bdi->dirty_expire_interval.

From: Wu Fengguang <hidden>
Date: 2011-08-19 05:28:39
Also in: linux-fsdevel, lkml

On Fri, Aug 19, 2011 at 12:38:36PM +0800, Kautuk Consul wrote:
HI Wu,

On Fri, Aug 19, 2011 at 8:04 AM, Wu Fengguang [off-list ref] wrote:
quoted
Hi Kautuk,

On Fri, Aug 19, 2011 at 12:25:58AM +0800, Kautuk Consul wrote:
quoted
Lines: 59

Hi Wu,

On Thu, Aug 18, 2011 at 6:43 PM, Wu Fengguang [off-list ref] wrote:
quoted
Hi Artem,
quoted
Here is a real use-case we had when developing the N900 phone. We had
internal flash and external microSD slot. Internal flash is soldered in
and cannot be removed by the user. MicroSD, in contrast, can be removed
by the user.

For the internal flash we wanted long intervals and relaxed limits to
gain better performance.

For MicroSD we wanted very short intervals and tough limits to make sure
that if the user suddenly removes his microSD (users do this all the
time) - we do not lose data.
Thinking twice about it, I find that the different requirements for
interval flash/external microSD can also be solved by this scheme.

Introduce a per-bdi dirty_background_time (and optionally dirty_time)
as the counterpart of (and works in parallel to) global dirty[_background]_ratio,
however with unit "milliseconds worth of data".

The per-bdi dirty_background_time will be set low for external microSD
and high for internal flash. Then you get timely writeouts for microSD
and reasonably delayed writes for internal flash (controllable by the
global dirty_expire_centisecs).

The dirty_background_time will actually work more reliable than
dirty_expire_centisecs because it will checked immediately after the
application dirties more pages. And the dirty_time could provide
strong data integrity guarantee -- much stronger than
dirty_expire_centisecs -- if used.
The dirty_writeback_centisecs is the value we are also actually
interested in, and not just
dirty_expire_interval. This value is what is actually used to reset
the per-BDI timeout in the code.
Yes. I assumed if one reduced dirty_expire_centisecs, he may well want
to reduce dirty_writeback_centisecs.
quoted
quoted
quoted
Does that sound reasonable?

Thanks,
Fengguang
My understanding of your email appears that you are agreeing in
principle that the temporal
aspect of this problem needs to be addressed along with your spatial
pattern analysis technique.
Yup.
quoted
I feel a more generic solution to the problem is required because the
problem faced by Artem can appear
in a different situation for a different application.

I can re-implement my original patch in either centiseconds or
milliseconds as suggested by you.
My concern on your patch is the possible conflicts and confusions
between the global and the per-bdi dirty_expire_centisecs. To maintain
compatibility you need to keep the global one. Then there is the hard
If you refer to my original email, I have addressed this as follows:
When the global value is set, then all the per-BDI dirty*_centisecs
are also reset
to the global value.
This is essential for retaining the functionality across Linux
distributions using
the global values.
This amounts to compatibility as the global values will take effect.
After that point, if the user/admin feels, he/she can adjust/tune the
per-BDI counters to
certain empirical value as per the specific application. This will not
alter the global values.
Such "resetting all" behavior could be disgusting. Some users without
the global view may be puzzled why their set value is lost.

A better scheme would be to use the bdi value if it's non-zero, and
fall back to the global value otherwise. This will reduce complexity
of the code as well as interface.
quoted
Given that we'll need to introduce the dirty_background_time interface
anyway, and it happen to can address the N900 internal/removable storage
problem (mostly), I'm more than glad to cancel the dirty_expire_centisecs
problem.
I have following doubts with respect to your dirty_background_time
interface suggestion:
i)   You say that you'll do this only for the N900 problem for solving
the unexpected disk removal
      problem.
      I believe you are ignoring the problem of rate of undirtying of
the block device pages for
      making reclamation of that block device's file-cache pages at a
sooner possible future time.
      I mentioned this in my earlier emails also.
I care the dirty page reclaim problem a lot, however this patch is
fundamentally not the right answer to that problem.
ii)   Will you be changing the dirty_background_time dynamically with
your algorithm ?
      According to your description, I think not.
dirty_background_time will be some static value.
iii)  I cannot see how your implementation of dirty_background_time is
different from mine, except
      maybe for the first time interval taking effect properly.
dirty_background_time will be the analog to dirty_background_ratio.

dirty_background_ratio/dirty_ratio and
dirty_writeback_centisecs/dirty_expire_centisecs is as different as
apple and orange.
      However, we can also think that the first time interval should
probably be honoured with the older
      value to make the transition from the old timer value to new
timer value smoother in terms of
      periodic writeback functionality.
There is no "interval" thing for dirty_background_time.
(I'll show you the implementation tomorrow.)
quoted
Or, do you have better way out of the dirty_expire_centisecs dilemma?
Maybe we can delete the global value entirely. However as you
correctly mentioned above, this
will impact other tools distributions altering these global values.
Right. Deleting existing interfaces are NOT an option.
You mentioned the close relationship between the dirty_background_time
and the global dirty
[_background]_ratio.
Do you mean to say that you intend to change the dirty_background_time
based on changes to
the dirty_background_ratio ?
Since the global dirty_background_ratio doesn't result in changes to
the dirty_writeback_centisecs
wouldn't this amount to a radical change in the existing relationship
of these configurable values ?
dirty_background_time will be complementing dirty_background_ratio.
One will be adaptive to device bandwidth, the other to memory size.

The users typically don't want to accumulate many dirty pages to eat
up the memory, or take too much time to writeout.

So it's very natural to introduce dirty_background_time to fill the gap.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help