Re: page fault scalability (ext3, ext4, xfs)
From: Andy Lutomirski <luto@amacapital.net>
Date: 2013-08-19 22:29:21
Also in:
linux-fsdevel, lkml
On Mon, Aug 19, 2013 at 3:17 PM, J. Bruce Fields [off-list ref] wrote:
> On Thu, Aug 15, 2013 at 04:01:49PM +1000, Dave Chinner wrote:
>> On Wed, Aug 14, 2013 at 09:32:13PM -0700, Andy Lutomirski wrote:
>>> On Wed, Aug 14, 2013 at 7:10 PM, Dave Chinner [off-list ref] wrote:
>>>> On Wed, Aug 14, 2013 at 09:11:01PM -0400, Theodore Ts'o wrote:
>>>>> On Wed, Aug 14, 2013 at 04:38:12PM -0700, Andy Lutomirski wrote:
>>>>>>> It would be better to write zeros to it, so we aren't
>>>>>>> measuring the cost of the unwritten->written conversion.
>>>>>>
>>>>>> At the risk of beating a dead horse, how hard would it be to
>>>>>> defer this part until writeback?
>>>>>
>>>>> Part of the work has to be done at write time because we need to
>>>>> update allocation statistics (i.e., so that we don't have ENOSPC
>>>>> problems). The unwritten->written conversion does happen at
>>>>> writeback (as does the actual block allocation if we are doing
>>>>> delayed allocation).
>>>>>
>>>>> The point is that if the goal is to measure page fault
>>>>> scalability, we shouldn't have this other stuff happening as the
>>>>> same time as the page fault workload.
>>>>
>>>> Sure, but the real problem is not the block mapping or allocation
>>>> path - even if the test is changed to take that out of the
>>>> picture, we still have timestamp updates being done on every
>>>> single page fault. ext4, XFS and btrfs all do transactional
>>>> timestamp updates and have nanosecond granularity, so every page
>>>> fault is resulting in a transaction to update the timestamp of the
>>>> file being modified.
>>>
>>> I have (unmergeable) patches to fix this:
>>>
>>> http://comments.gmane.org/gmane.linux.kernel.mm/92476
>>
>> The big problem with this approach is that not doing the timestamp
>> update on page faults is going to break the inode change version
>> counting because for ext4, btrfs and XFS it takes a transaction to
>> bump that counter. NFS needs to know the moment a file is changed in
>> memory, not when it is written to disk.
>
> I don't think the in-memory updates of the data and the version have
> to be completely atomic, if that's what you mean.
>
>> Also, NFS requires the change to the counter to be persistent over
>> server failures, so it needs to be changed as part of a
>> transaction....
>
> I'm not sure those two updates have to be a single atomic transaction
> on disk, either.

I hope not, because they aren't currently in the same transaction, and
putting them in the same transaction would require starting a
transaction on page fault and doing the equivalent of writepages when
the same transaction is committed.

With my changes [1], they still aren't, but putting them in the same
transaction would probably be only a couple lines of code, and it would
actually improve performance. (I won't write those couple lines of code
because I don't know anything at all about jbd2.)

[1] https://lkml.org/lkml/2013/8/16/510

--Andy

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
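
For anyone skimming the thread, the trade-off under discussion can be
sketched with a toy userspace model. This is only an illustration, not
kernel code and not the patch set referenced above: ToyInode,
write_fault(), writeback() and run() are hypothetical names, and a
"transaction" here is just a counter increment. The eager mode stands in
for the behaviour described in the thread (one transactional timestamp
update per write fault); the deferred mode stands in for the idea of
settling the update at writeback.

```python
import time


class ToyInode:
    """Toy stand-in for an inode; hypothetical, not a kernel structure."""

    def __init__(self, defer_timestamps):
        self.defer_timestamps = defer_timestamps
        self.mtime_ns = 0            # last recorded modification time
        self.change_version = 0      # toy analogue of the change counter
        self.transactions = 0        # number of simulated transactions
        self.pending_update = False  # True when a fault owes an update

    def _commit_timestamp(self):
        # Simulated transaction: timestamp and version change together.
        self.mtime_ns = time.time_ns()
        self.change_version += 1
        self.transactions += 1

    def write_fault(self):
        if self.defer_timestamps:
            # Cheap path: only remember that an update is owed.
            self.pending_update = True
        else:
            # Eager path: one simulated transaction per write fault.
            self._commit_timestamp()

    def writeback(self):
        # Deferred path: settle the owed update once per writeback pass.
        if self.pending_update:
            self._commit_timestamp()
            self.pending_update = False


def run(defer_timestamps, faults=1000):
    """Return (transactions, change_version) after `faults` write faults."""
    inode = ToyInode(defer_timestamps)
    for _ in range(faults):
        inode.write_fault()
    inode.writeback()
    return inode.transactions, inode.change_version
```

In this model the eager mode pays one transaction per fault, while the
deferred mode pays one per writeback pass. The deferred mode also leaves
change_version untouched until writeback() runs, which is the toy
version of the NFS concern raised in the quoted text: the counter does
not move at the moment the data changes in memory.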