Thread (18 messages), 3 authors, 2021-10-20

Re: [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)

From: Andrew Morton <akpm@linux-foundation.org>
Date: 2021-10-15 20:28:09
Also in: linux-fsdevel, lkml

On Thu, 14 Oct 2021 12:16:09 -0700 Yang Shi [off-list ref] wrote:
> When discussing the patch that splits page cache THP in order to offline the
> poisoned page, Naoya mentioned there is a bigger problem [1] that prevents this
> from working since the page cache page will be truncated if uncorrectable
> errors happen.  Looking into this more deeply, it turns out this approach
> (truncating the poisoned page) may incur silent data loss for all non-readonly
> filesystems if the page is dirty.  It may be worse for in-memory filesystems,
> e.g. shmem/tmpfs, since the data blocks are actually gone.
>
> To solve this problem we could keep the poisoned dirty page in the page cache
> and then notify the users on any later access, e.g. page fault, read/write,
> etc.  Clean pages can be truncated as is since they can be reread from disk
> later on.
>
> The consequence is that filesystems may find a poisoned page and manipulate it
> as a healthy page, since none of the filesystems checks whether a page is
> poisoned in any of the relevant paths except page fault.  In general, we need
> to make the filesystems aware of poisoned pages before we can keep the
> poisoned page in the page cache in order to solve the data loss problem.

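
For illustration only, the policy described in the quoted cover letter could be
sketched in userspace C roughly as follows.  This is not code from the series
and uses no kernel APIs: toy_page, toy_cache, toy_fill_clean, toy_write,
toy_read and toy_memory_failure are all hypothetical names invented for this
sketch, and the page size and page count are arbitrary.  It assumes that a
clean page is simply dropped on an uncorrectable error (it can be reread from
backing store), while a dirty page is kept and marked poisoned so that every
later access fails with -EIO instead of silently returning stale or empty data.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NR_PAGES  4

/* Hypothetical toy page; unrelated to the kernel's struct page. */
struct toy_page {
    bool present;   /* page is in the toy cache */
    bool dirty;     /* holds data not yet written back */
    bool poisoned;  /* hit by a (simulated) uncorrectable error */
    char data[TOY_PAGE_SIZE];
};

struct toy_cache {
    struct toy_page pages[TOY_NR_PAGES];
};

/* Simulate reading a page in from backing store: present and clean. */
void toy_fill_clean(struct toy_cache *c, size_t idx, const char *src)
{
    assert(idx < TOY_NR_PAGES);
    struct toy_page *p = &c->pages[idx];

    memset(p, 0, sizeof(*p));
    strncpy(p->data, src, TOY_PAGE_SIZE - 1);
    p->present = true;
}

/* Write to a page; a poisoned page refuses the write with -EIO. */
int toy_write(struct toy_cache *c, size_t idx, const char *src)
{
    assert(idx < TOY_NR_PAGES);
    struct toy_page *p = &c->pages[idx];

    if (p->poisoned)
        return -EIO;
    memset(p->data, 0, TOY_PAGE_SIZE);
    strncpy(p->data, src, TOY_PAGE_SIZE - 1);
    p->present = true;
    p->dirty = true;
    return 0;
}

/*
 * Read a page into buf (TOY_PAGE_SIZE bytes).  Returns 0 on success,
 * -EIO if the page is poisoned, -ENOENT if it is not cached (a real
 * caller would then reread it from backing store).
 */
int toy_read(const struct toy_cache *c, size_t idx, char *buf)
{
    assert(idx < TOY_NR_PAGES);
    const struct toy_page *p = &c->pages[idx];

    if (p->poisoned)
        return -EIO;
    if (!p->present)
        return -ENOENT;
    memcpy(buf, p->data, TOY_PAGE_SIZE);
    return 0;
}

/* Simulate an uncorrectable memory error hitting page idx. */
void toy_memory_failure(struct toy_cache *c, size_t idx)
{
    assert(idx < TOY_NR_PAGES);
    struct toy_page *p = &c->pages[idx];

    if (!p->present)
        return;
    if (p->dirty)
        p->poisoned = true;        /* keep it so later access reports the loss */
    else
        memset(p, 0, sizeof(*p));  /* clean: drop, can be reread later */
}
```

The point of the sketch is the asymmetry in toy_memory_failure(): dropping a
dirty page would make a later read look like a successful read of a hole,
which is the silent data loss the cover letter describes.
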
Is the "RFC" still accurate, or might it be an accidental leftover?

I grabbed this series as-is for some testing, but I do think it would
be better if it was delivered as two separate series - one series for
the -stable material and one series for the 5.16-rc1 material.
