Thread (28 messages) 28 messages, 5 authors, 2021-06-10

Re: [PATCH 03/14] mm: Protect operations adding pages to page cache with invalidate_lock

From: Jan Kara <jack@suse.cz>
Date: 2021-06-08 12:19:18
Also in: ceph-devel, linux-ext4, linux-f2fs-devel, linux-fsdevel, linux-mm, linux-xfs

On Mon 07-06-21 09:09:22, Darrick J. Wong wrote:
On Mon, Jun 07, 2021 at 04:52:13PM +0200, Jan Kara wrote:
quoted
Currently, serializing operations such as page fault, read, or readahead
against hole punching is rather difficult. The basic race scheme is
like:

fallocate(FALLOC_FL_PUNCH_HOLE)			read / fault / ..
  truncate_inode_pages_range()
						  <create pages in page
						   cache here>
  <update fs block mapping and free blocks>

Now the problem is in this way read / page fault / readahead can
instantiate pages in page cache with potentially stale data (if blocks
get quickly reused). Avoiding this race is not simple - page locks do
not work because we want to make sure there are *no* pages in given
range. inode->i_rwsem does not work because page fault happens under
mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes
the performance for mixed read-write workloads suffer.

So create a new rw_semaphore in the address_space - invalidate_lock -
that protects adding of pages to page cache for page faults / reads /
readahead.

Signed-off-by: Jan Kara <jack@suse.cz>
...
quoted
+->fallocate implementation must be really careful to maintain page cache
+consistency when punching holes or performing other operations that invalidate
+page cache contents. Usually the filesystem needs to call
+truncate_inode_pages_range() to invalidate relevant range of the page cache.
+However the filesystem usually also needs to update its internal (and on disk)
+view of file offset -> disk block mapping. Until this update is finished, the
+filesystem needs to block page faults and reads from reloading now-stale page
+cache contents from the disk. VFS provides mapping->invalidate_lock for this
+and acquires it in shared mode in paths loading pages from disk
+(filemap_fault(), filemap_read(), readahead paths). The filesystem is
+responsible for taking this lock in its fallocate implementation and generally
+whenever the page cache contents needs to be invalidated because a block is
+moving from under a page.
Having a page cache invalidation lock isn't optional anymore, so I think
these last two sentences could be condensed:

"...from reloading now-stale page cache contents from disk.  Since VFS
acquires mapping->invalidate_lock in shared mode when loading pages from
disk (filemap_fault(), filemap_read(), readahead), the fallocate
implementation must take the invalidate_lock to prevent reloading."
quoted
+
+->copy_file_range and ->remap_file_range implementations need to serialize
+against modifications of file data while the operation is running. For
+blocking changes through write(2) and similar operations inode->i_rwsem can be
+used. For blocking changes through memory mapping, the filesystem can use
+mapping->invalidate_lock provided it also acquires it in its ->page_mkwrite
+implementation.
Following the same line of reasoning, if taking the invalidate_lock is
no longer optional, then the conditional language in this last sentence
is incorrect.  How about:

"To block changes to file contents via a memory mapping during the
operation, the filesystem must take mapping->invalidate_lock to
coordinate with ->page_mkwrite."

The code changes look fine to me, though I'm no mm expert. ;)
OK, I've updated the documentation as you suggested. Thanks for review.

									Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help