Re: 2.5.59-mm5

From: Nick Piggin <hidden>
Date: 2003-01-24 15:55:40
Also in: linux-mm

Oliver Xymoron wrote:

On Fri, Jan 24, 2003 at 03:50:17AM -0800, Andrew Morton wrote:

Alex Tomas [off-list ref] wrote:

Andrew Morton (AM) writes:

AM> But writes are completely different.  There is no dependency
AM> between them and at any point in time we know where on-disk a lot
AM> of writes will be placed.  We don't know that for reads, which is
AM> why we need to twiddle thumbs until the application or filesystem
AM> makes up its mind.


It's significant that an application doesn't want to wait long for read
completion, and in most cases doesn't wait for write completion.

That's correct.  Reads are usually synchronous and writes are rarely
synchronous.

The most common place where the kernel forces a user process to wait on
completion of a write is actually in unlink (truncate, really).  That is
because truncate must wait for in-progress I/O to complete before allowing
the filesystem to free (and potentially reuse) the affected blocks.

If there's a lot of writeout happening then truncate can take _ages_.  Hence
this patch:

An alternate approach might be to change the way the scheduler splits
things. That is, rather than marking I/O read vs write and scheduling
based on that, add a flag bit to mark them all sync vs async since
that's the distinction we actually care about. The normal paths can
all do read+sync and write+async, but you can now do things like
marking your truncate writes sync and readahead async.

And dependent/nondependent or stalling/nonstalling might be a clearer
terminology.
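A minimal sketch of the sync/async classification suggested above, assuming a request carries one extra flag bit. The names `io_request`, `classify_sync`, `make_request` and the `ORIGIN_*` values are hypothetical and are not the 2.5 block-layer API. The normal paths map read to sync and write to async, while truncate writes are forced to sync and readahead is forced to async.

```c
#include <stdbool.h>

/* Hypothetical request direction and origin; illustrative names only. */
enum io_dir { IO_READ, IO_WRITE };

enum io_origin {
    ORIGIN_NORMAL,     /* ordinary read() or background writeback */
    ORIGIN_READAHEAD,  /* speculative read that nobody is stalled on yet */
    ORIGIN_TRUNCATE    /* write that truncate/unlink must wait for */
};

/* Hypothetical request: the sync bit replaces read-vs-write as the
 * property the scheduler uses to decide who is waiting. */
struct io_request {
    enum io_dir dir;
    bool sync;  /* true if a process is (or soon will be) stalled on it */
};

/* Default mapping is read => sync, write => async, with overrides. */
static bool classify_sync(enum io_dir dir, enum io_origin origin)
{
    switch (origin) {
    case ORIGIN_READAHEAD:
        return false;          /* nobody blocks on readahead */
    case ORIGIN_TRUNCATE:
        return true;           /* truncate blocks until this completes */
    default:
        return dir == IO_READ; /* the usual read+sync / write+async */
    }
}

static struct io_request make_request(enum io_dir dir, enum io_origin origin)
{
    struct io_request rq = { .dir = dir, .sync = classify_sync(dir, origin) };
    return rq;
}
```

The direction bit is kept alongside the sync bit, so a scheduler could still group by direction if that turns out to matter for the disk.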

That will be worth investigating, to see whether the benefit justifies the complexity.
I think from a disk point of view, we still want to split batches between
reads and writes. Could be wrong.
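A toy illustration of what "splitting batches between reads and writes" could look like: two FIFO queues, with the dispatcher serving up to a fixed number of requests in one direction before switching, and switching early when the current direction runs dry. This is a simplified sketch, not the actual I/O scheduler code. `BATCH`, `QCAP` and all function names are hypothetical, and the batch size of 4 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 64   /* hypothetical fixed queue capacity */
#define BATCH 4   /* arbitrary: requests per direction before switching */

enum io_dir { IO_READ = 0, IO_WRITE = 1 };

struct fifo {
    int ids[QCAP];
    size_t head, tail;  /* head == tail means empty; no wraparound */
};

struct dispatcher {
    struct fifo q[2];   /* q[IO_READ] and q[IO_WRITE] */
    enum io_dir cur;    /* direction of the current batch */
    unsigned left;      /* requests remaining in the current batch */
};

static void dispatcher_init(struct dispatcher *d)
{
    /* Start with a full read batch; all queues zeroed (empty). */
    *d = (struct dispatcher){ .cur = IO_READ, .left = BATCH };
}

static int fifo_empty(const struct fifo *f)
{
    return f->head == f->tail;
}

static void add_request(struct dispatcher *d, enum io_dir dir, int id)
{
    struct fifo *f = &d->q[dir];
    assert(f->tail < QCAP);  /* toy queue: caller must not overfill */
    f->ids[f->tail++] = id;
}

/* Return the next request id to send to the disk, or -1 if idle. */
static int dispatch(struct dispatcher *d)
{
    enum io_dir other = (d->cur == IO_READ) ? IO_WRITE : IO_READ;

    if (d->left == 0 || fifo_empty(&d->q[d->cur])) {
        /* Batch used up or direction drained: switch if the other
         * direction has work, then start a fresh batch either way. */
        if (!fifo_empty(&d->q[other]))
            d->cur = other;
        d->left = BATCH;
    }
    if (fifo_empty(&d->q[d->cur]))
        return -1;
    d->left--;
    return d->q[d->cur].ids[d->q[d->cur].head++];
}
```

With six reads (ids 1 to 6) and two writes (101, 102) queued, this dispatches 1, 2, 3, 4, 101, 102, 5, 6 and then reports idle. Whether the batching key should be direction, the sync bit, or both is exactly the open question in this thread.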

Nick
