Thread: 13 messages, 6 authors, 2017-03-09

Re: [PATCH] ext4: don't BUG when truncating encrypted inodes on the orphan list

From: Andreas Dilger <hidden>
Date: 2017-03-09 19:21:30

On Mar 9, 2017, at 6:47 AM, Jan Kara [off-list ref] wrote:
On Sat 11-02-17 21:27:38, Ted Tso wrote:
On Sat, Feb 11, 2017 at 12:26:52AM -0700, Andreas Dilger wrote:
The reason truncated orphans are on the orphan list is because the
transaction that sets i_size may be restarted if the inode is larger
than can be truncated in a single transaction.  If the system crashes
before the truncate finishes then the truncate should be completed
so that old data is not returned if the file is truncated larger again.

Another way of fixing this is to do it at the time when the file is
truncated to a larger size.  Of course the other case we need to handle
is what happens if there is data after i_size and the file is mmapped.

One advantage of doing it when the file is truncated larger again is at
that point we will have the encryption key.  In the case of an
encrypted file, both the kernel and e2fsck *can't* zero fill past
i_size if the key is not available.  And during the orphan replay the
encryption key won't be available.

The other way to solve the problem would be to zero the portion of the
last remaining datablock *first* and journal the data block along with
the initial transaction which sets the i_size in the inode.  But that
gets tricky, since all data writes for that last block must not go to
the disk, and then once the journal has been committed we can't write
the block via the normal page_io routines (since otherwise it might
get overwritten), until we write it back and then revoke the block in
the journal, and the revoke is committed.  Messy....

Going through some old email... I don't think this would be reasonably
doable. What would fix up the missing zeroing on orphan cleanup, though,
is to zero the tail of the last page on readpage, extending truncate, and
write beyond EOF. That may be an acceptable cost for encrypted inodes.
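
As an illustration of the tail-zeroing idea above, here is a minimal userspace sketch in C. It is not ext4 code: `zero_tail_past_eof` and `PAGE_SIZE_SKETCH` are hypothetical names, and the sketch assumes the page has already been decrypted, which is only possible on paths where the key is present (read, extending truncate, write beyond EOF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096u  /* hypothetical page size for this model */

/*
 * Zero the bytes of an in-memory page that lie beyond i_size.
 * page_index is the page's position in the file; page points to
 * PAGE_SIZE_SKETCH bytes of already-decrypted data (an assumption).
 * Returns the number of bytes zeroed.
 */
size_t zero_tail_past_eof(uint8_t *page, uint64_t page_index, uint64_t i_size)
{
    uint64_t page_start = page_index * PAGE_SIZE_SKETCH;
    size_t valid;

    assert(page != NULL);

    if (i_size >= page_start + PAGE_SIZE_SKETCH)
        return 0;                          /* page lies entirely inside the file */
    if (i_size <= page_start)
        valid = 0;                         /* page lies entirely past EOF */
    else
        valid = (size_t)(i_size - page_start);

    /* Stale data past EOF must never become visible to the reader. */
    memset(page + valid, 0, PAGE_SIZE_SKETCH - valid);
    return PAGE_SIZE_SKETCH - valid;
}
```

The cost mentioned above shows up here as a check on every one of those paths, rather than a one-time fix during orphan replay.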

Another option would be to revive the unlink/truncate thread, and dump
the blocks to be truncated over to another (temporary) inode that is put
on the orphan list and will be unlinked.  That means the visible truncate
operation can always complete in a single transaction (including the
partial block write), and everything on the orphan list is essentially an
unlink rather than a truncate.
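
A toy model of that handoff, in C, might look like the sketch below. Everything in it is hypothetical (`struct toy_inode`, `truncate_by_handoff`, the fixed block array); it is not the code from the old patch and it leaves out journaling and the partial-block write. It only shows the shape of the idea: the visible inode gets its final size in one step, and the freed blocks become the problem of a temporary inode that sits on the orphan list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS        16     /* hypothetical cap, model only */
#define BLOCK_SIZE_SKETCH 4096u  /* hypothetical block size */

struct toy_inode {
    uint64_t i_size;
    uint32_t blocks[MAX_BLOCKS]; /* physical block numbers */
    size_t   nblocks;
    int      on_orphan_list;
};

/*
 * Shrink 'file' to new_size by moving every block that lies wholly past
 * new_size to the empty temporary inode 'victim'.  The victim goes on the
 * orphan list, so after a crash it only ever needs to be unlinked.
 * Returns the number of blocks handed over.
 */
size_t truncate_by_handoff(struct toy_inode *file, struct toy_inode *victim,
                           uint64_t new_size)
{
    size_t keep, moved = 0;

    assert(file != NULL && victim != NULL && victim->nblocks == 0);

    if (new_size >= file->i_size)
        return 0;                /* not a shrinking truncate */

    /* Blocks needed to hold new_size bytes, rounded up. */
    keep = (size_t)((new_size + BLOCK_SIZE_SKETCH - 1) / BLOCK_SIZE_SKETCH);
    if (keep > file->nblocks)
        keep = file->nblocks;

    for (size_t i = keep; i < file->nblocks; i++)
        victim->blocks[moved++] = file->blocks[i];

    victim->nblocks = moved;
    victim->i_size = 0;
    victim->on_orphan_list = 1;  /* cleanup is a plain unlink */

    file->nblocks = keep;
    file->i_size = new_size;     /* visible truncate is already complete */
    return moved;
}
```

In this model the file itself never appears on the orphan list, so orphan replay would not need the encryption key for it.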

The code wasn't too complex, but we dropped it when extents arrived since
it didn't give a huge performance advantage.  That said, there could be a
benefit in terms of code simplification, since there wouldn't be the need
to restart transactions in the middle if the truncate gets too large.

The most recent version I could find is for ext3 in 2.4.29 at:

https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=blob_plain;f=lustre/kernel_patches/patches/ext3-delete_thread-2.4.29.patch;hb=113303973ec9f8484eb2355a1a6ef3c4c7fd6a56

Cheers, Andreas



