Thread: 20 messages, 6 authors, 2012-10-31

Re: [RFC PATCH 1/2] bdi: Create a flag to indicate that a backing device needs stable page writes

From: NeilBrown <hidden>
Date: 2012-10-30 00:34:41
Also in: linux-fsdevel

On Tue, 30 Oct 2012 01:10:08 +0100 Jan Kara [off-list ref] wrote:

> On Tue 30-10-12 10:48:37, NeilBrown wrote:
> > On Mon, 29 Oct 2012 19:30:51 +0100 Jan Kara [off-list ref] wrote:
> > > On Mon 29-10-12 19:13:58, Jan Kara wrote:
> > > > On Fri 26-10-12 18:35:24, Darrick J. Wong wrote:
> > > > > This creates BDI_CAP_STABLE_WRITES, which indicates that a device requires
> > > > > stable page writes.  It also plumbs in a sysfs attribute so that admins can
> > > > > check the device status.
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <redacted>
> > > >   I guess Jens Axboe [off-list ref] would be the best target for this
> > > > patch (so that he can merge it). The patch looks OK to me. You can add:
> > > >   Reviewed-by: Jan Kara [off-list ref]
> > >   One more thing popped up in my mind: What about NFS, Ceph or md RAID5?
> > > These could (at least theoretically) care about stable writes as well. I'm
> > > not sure if they really started to use them but it would be good to at
> > > least let them know.
> > What exactly are the semantics of BDI_CAP_STABLE_WRITES ?
> >
> > If I set it for md/RAID5, do I get a cast-iron guarantee that no byte in any
> > page submitted for write will ever change until after I call bio_endio()?
>   Yes.
> > If so, is this true for all filesystems? - I would expect a bigger patch would
> > be needed for that.
>   Actually the code is in kernel for quite some time already. The problem
> is it is always enabled causing unnecessary performance issues for some
> workloads. So these patches try to be more selective in when the code gets
> enabled.
>
> Regarding "all filesystems" question: If we update filemap_page_mkwrite()
> to call wait_on_page_writeback() then it should be for all filesystems.
Cool.  I didn't realise it had progressed that far.

I guess it is time to look at the possibility of removing the
'copy-into-cache' step for full-page, well-aligned bi_iovecs.
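
To spell out why that copy matters today, here is a minimal userspace sketch, not md code. It assumes a toy 2+1 stripe with XOR parity. The names write_stripe(), xor_parity(), parity_ok() and BLK are all made up for illustration. It models a page being modified after parity has been computed but before the data reaches the disk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 8  /* tiny stand-in for a page, for illustration only */

/* XOR parity over two data blocks, as in a toy 2+1 stripe. */
static void xor_parity(const uint8_t *d0, const uint8_t *d1, uint8_t *p)
{
    for (size_t i = 0; i < BLK; i++)
        p[i] = d0[i] ^ d1[i];
}

/* True if the parity block still matches the two data blocks. */
static bool parity_ok(const uint8_t *d0, const uint8_t *d1, const uint8_t *p)
{
    for (size_t i = 0; i < BLK; i++)
        if ((uint8_t)(d0[i] ^ d1[i]) != p[i])
            return false;
    return true;
}

/*
 * Hypothetical model of one stripe write.
 *   page             - the submitter's buffer for data block 0
 *   d1               - data block 1, unchanged throughout
 *   copy_first       - copy the page into a private cache before use
 *   modify_in_flight - the page owner changes the page after parity
 *                      is computed but before the data is "written"
 * Returns true if data and parity on "disk" are consistent.
 */
static bool write_stripe(uint8_t *page, const uint8_t *d1,
                         bool copy_first, bool modify_in_flight)
{
    uint8_t cache[BLK], parity[BLK], disk0[BLK];
    const uint8_t *src = page;

    if (copy_first) {
        memcpy(cache, page, BLK);
        src = cache;
    }

    xor_parity(src, d1, parity);

    if (modify_in_flight)
        page[0] ^= 0xff;        /* the page is not stable */

    memcpy(disk0, src, BLK);    /* stands in for the DMA to disk */

    return parity_ok(disk0, d1, parity);
}
```

In this model, copy_first keeps the stripe consistent even when the page changes in flight. Writing straight from the page is only safe when modify_in_flight can never happen, which is the guarantee stable page writes would provide.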

I assume this applies to swap-out as well?  It has been a minor source of
frustration that when you swap out to RAID1, you can occasionally get
different data on the two devices because memory changed between the two DMA
events.
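
The RAID1 effect is easy to model in userspace. This is only a sketch, with memcpy() standing in for the two DMA transfers. mirror_write() and BLK are made-up names, not anything in md.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLK 8  /* tiny stand-in for a page, for illustration only */

/*
 * Hypothetical model of a RAID1 write: the same page is transferred to
 * two mirrors, one after the other. If the page is modified between
 * the two transfers, the mirrors end up holding different data.
 * Returns true if both mirrors are identical afterwards.
 */
static bool mirror_write(uint8_t *page, bool modify_between)
{
    uint8_t mirror0[BLK], mirror1[BLK];

    memcpy(mirror0, page, BLK);     /* first "DMA" */

    if (modify_between)
        page[0]++;                  /* memory changes while in flight */

    memcpy(mirror1, page, BLK);     /* second "DMA" */

    return memcmp(mirror0, mirror1, BLK) == 0;
}
```

With stable pages the modify_between case would be ruled out until the write completes, so the two copies could not diverge this way.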

NeilBrown
