Re: 2.5.59-mm5
From: Nick Piggin <hidden>
Date: 2003-01-24 15:55:40
Also in:
linux-mm
Oliver Xymoron wrote:
On Fri, Jan 24, 2003 at 03:50:17AM -0800, Andrew Morton wrote:
Alex Tomas [off-list ref] wrote:
Andrew Morton (AM) writes:

AM> But writes are completely different. There is no dependency
AM> between them and at any point in time we know where on-disk a lot
AM> of writes will be placed. We don't know that for reads, which is
AM> why we need to twiddle thumbs until the application or filesystem
AM> makes up its mind.

it's significant that application doesn't want to wait read completion long and doesn't wait for write completion in most cases.

That's correct. Reads are usually synchronous and writes are rarely synchronous. The most common place where the kernel forces a user process to wait on completion of a write is actually in unlink (truncate, really). Because truncate must wait for in-progress I/O to complete before allowing the filesystem to free (and potentially reuse) the affected blocks. If there's a lot of writeout happening then truncate can take _ages_. Hence this patch:

An alternate approach might be to change the way the scheduler splits things. That is, rather than marking I/O read vs write and scheduling based on that, add a flag bit to mark them all sync vs async since that's the distinction we actually care about. The normal paths can all do read+sync and write+async, but you can now do things like marking your truncate writes sync and readahead async. And dependent/nondependent or stalling/nonstalling might be a clearer terminology.
That will be worth investigating to see if the complexity is worth it. I think from a disk point of view, we still want to split batches between reads and writes. Could be wrong.

Nick
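
For illustration only, here is a rough sketch of how the sync/async flag idea quoted above might sit next to direction-based batching. None of these names come from the patch or from the kernel; the `io_flags` bits and every `io_*` helper are hypothetical, and the sketch only assumes what the thread states: normal paths default to read+sync and write+async, readahead can be marked async, and writes that truncate waits on can be marked sync.

```c
#include <stdbool.h>

/* Hypothetical flag bits; not the kernel's actual request flags. */
enum io_flags {
    IO_WRITE = 1 << 0,  /* data direction: clear = read, set = write */
    IO_SYNC  = 1 << 1   /* a process stalls until this I/O completes */
};

/* Default for the normal paths: read+sync, write+async. */
static unsigned int io_default_flags(bool is_write)
{
    return is_write ? IO_WRITE : IO_SYNC;
}

/* Readahead is a read that nobody is waiting on yet: read+async. */
static unsigned int io_readahead_flags(void)
{
    return 0;
}

/* A write that truncate will have to wait for: write+sync. */
static unsigned int io_truncate_write_flags(void)
{
    return IO_WRITE | IO_SYNC;
}

/* Latency treatment follows the sync bit, not the direction. */
static bool io_is_stalling(unsigned int flags)
{
    return (flags & IO_SYNC) != 0;
}

/* Batching at the disk can still be split on direction alone. */
static bool io_same_batch(unsigned int a, unsigned int b)
{
    return (a & IO_WRITE) == (b & IO_WRITE);
}
```

Keeping the two bits independent is what lets both views coexist in the sketch: a scheduler could pick which request is urgent from `io_is_stalling()` while still grouping dispatch by `io_same_batch()`. Whether that is worth the extra complexity is exactly the open question.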