Re: Reflink (cow) copy of busy files
From: Gionatan Danti <hidden>
Date: 2018-02-26 21:23:51
On 26-02-2018 18:26, Darrick J. Wong wrote:
> The way reflink is supposed to work wrt consistency is:
>
> 1. lock out all new io/fallocate activity on both inodes (iolock/mmaplock)
> 2. wait for all directio to complete
> 3. fsync both files (write all the dirty pagecache to disk)
> 4. lock both inodes (ilock)
> 5. clone each extent atomically
> 6. unlock ilock
> 7. unlock iolock/mmaplock
>
> So at least in theory the cloned file will match whatever the host saw on disk and page cache at the time the reflink call was initiated. I say 'in theory' because there could be bugs.
Great! CoW will be a great addition to XFS once it is considered stable.
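For reference, here is a minimal sketch of what a userspace reflink request looks like: a single FICLONE ioctl on the destination file, with the source descriptor as its argument. All of the locking and flushing quoted above happens inside that one call. The helper name `reflink_file` and the decision to return False on filesystems without reflink support are my own assumptions for illustration; the ioctl number is the Linux `FICLONE` constant from `linux/fs.h`.

```python
import errno
import fcntl
import os

# Linux FICLONE ioctl number, _IOW(0x94, 9, int) from <linux/fs.h>.
FICLONE = 0x40049409

# Errors taken here to mean "this filesystem or kernel cannot reflink"
# (an illustrative choice, not an exhaustive list).
_UNSUPPORTED = {errno.EOPNOTSUPP, errno.ENOTTY, errno.EXDEV,
                errno.EINVAL, errno.ENOSYS}


def reflink_file(src_path, dst_path):
    """Clone src_path to dst_path by sharing extents (hypothetical helper).

    Returns True if the clone succeeded, False if reflink is unsupported.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # One syscall: the kernel performs the locking, flushing and
            # extent cloning for both files.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError as exc:
            if exc.errno not in _UNSUPPORTED:
                raise
    # Do not leave an empty destination behind when nothing was cloned.
    os.unlink(dst_path)
    return False
```

On a filesystem without reflink support the function returns False and removes the empty destination, so the caller can decide whether a full copy is acceptable.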
> Whatever dirty state is in the guest VM stays in that VM, which means that if you only cp --reflink on the host, the clone you get will reflect the virtual disk state as if you'd kill -9'd the VM, cloned the VM disk, and restarted the VM. Upon restart the log recovers whatever metadata made it out of the VM.
Sure, that is what I mean by "crash-consistent".
> However, if you tell the guest to freeze the fs before cloning (as Dave suggested earlier) the guest will flush all its state to the upper level (the host) and the host will push all that out to disk before cloning. The snapshot you create should be cleaner because you're effectively prepaying the recovery costs by flushing everything before taking the snapshot.
True, and this is "application-level consistency" (which requires a guest agent and possibly even an application-specific agent).
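A sketch of that freeze, clone, thaw sequence driven from the host, assuming the guest is managed by libvirt and runs the QEMU guest agent (neither is stated in this thread). `virsh domfsfreeze` and `virsh domfsthaw` are the libvirt commands used; the function name, the domain name and the paths are placeholders. The `run` parameter exists only so the sequence can be exercised without a real VM.

```python
import subprocess


def snapshot_with_freeze(domain, src, dst, run=subprocess.check_call):
    """Freeze guest filesystems, reflink the disk image, then thaw.

    Hypothetical helper: `domain`, `src` and `dst` are placeholders.
    """
    # Ask the guest agent to flush and freeze all mounted filesystems.
    run(["virsh", "domfsfreeze", domain])
    try:
        # Clone while the guest is frozen; fail rather than fall back
        # to a full copy if the filesystem cannot reflink.
        run(["cp", "--reflink=always", src, dst])
    finally:
        # Always thaw, even if the clone failed, or the guest stays frozen.
        run(["virsh", "domfsthaw", domain])
```

The `try`/`finally` is the important part: a failed clone must not leave the guest with frozen filesystems.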
> Also note that if the host goes down before returning from the syscall, the log will continue on with whichever extent was being cloned at the time in order to preserve metadata integrity, but the destination file will reflect a partial copy.
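One way to keep such a partial clone from ever appearing under the final name is to clone to a temporary name and rename only after the clone call has returned. This is a minimal sketch, not something proposed in this thread: `clone_then_rename` and the `.partial` suffix are made up for illustration, and the default `clone` callable simply shells out to `cp --reflink=always`.

```python
import os
import subprocess


def _cp_reflink(src, dst):
    # Default clone step; fails rather than falling back to a full copy.
    subprocess.check_call(["cp", "--reflink=always", src, dst])


def clone_then_rename(src, dst, clone=_cp_reflink):
    """Clone src to a temporary name, then atomically rename it to dst."""
    tmp = dst + ".partial"  # hypothetical naming scheme
    try:
        clone(src, tmp)
        # Flush the clone to disk before publishing it under the final name.
        fd = os.open(tmp, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        os.replace(tmp, dst)  # rename is atomic within one filesystem
    except BaseException:
        # Failed clone: remove the leftover instead of publishing it.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the host goes down mid-clone, only the `.partial` file can be incomplete, and it can be deleted on the next start.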
Thanks for pointing that out, and for your extremely clear explanation!

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8