Thread: 62 messages, 8 authors, 2021-11-06

Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag

From: Dan Williams <hidden>
Date: 2021-11-04 19:00:27
Also in: dm-devel, linux-fsdevel, lkml, nvdimm

On Thu, Nov 4, 2021 at 11:34 AM Jane Chu [off-list ref] wrote:
Thanks for the enlightening discussion here, it's so helpful!

Please allow me to recap what I've caught up on so far:

1. recovery write at a page boundary, because the poisoned page is
    marked NP (not present) to prevent undesirable prefetching
2. single interface to perform 3 tasks:
      { clear-poison, update error-list, write }
    such as an API in pmem driver.
    For CPUs that support MOVEDIR64B, the 'clear-poison' and 'write'
    tasks can be combined (this would need something different from the
    existing _copy_mcsafe though) and 'update error-list' follows
    closely behind;
    For CPUs that rely on a firmware call to clear poison, the existing
    pmem_clear_poison() can be used, followed by the 'write' task.
3. if the user doesn't supply the RWF_RECOVERY_DATA flag, then dax
    recovery would be automatic for a write if the range is page
    aligned; otherwise, the write fails with EIO as usual.
    Also, the user mustn't have punched out the poisoned page, in which
    case poison repair will be a lot more complicated.
4. desirable to fetch as much data as possible from a poisoned range.
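
To make points 2 and 3 of the recap concrete, here is a minimal userspace sketch of a "3-in-1" recovery write. It is not the pmem driver's code: struct sim_pmem, sim_range_has_poison() and sim_recovery_write() are hypothetical names invented for illustration, the per-page bad_page[] array merely stands in for the driver's error list, and zeroing the page stands in for whatever mechanism (MOVDIR64B or a firmware call) actually clears poison.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u
#define SIM_NR_PAGES  4u

/* Hypothetical stand-in for a pmem device: media plus a per-page error list. */
struct sim_pmem {
	unsigned char media[SIM_NR_PAGES * SIM_PAGE_SIZE];
	bool bad_page[SIM_NR_PAGES];	/* models the driver's error list */
};

/* True if any page touched by [off, off + len) is on the error list. */
static bool sim_range_has_poison(const struct sim_pmem *dev, size_t off,
				 size_t len)
{
	size_t first = off / SIM_PAGE_SIZE;
	size_t last = (off + len - 1) / SIM_PAGE_SIZE;

	for (size_t pg = first; pg <= last; pg++)
		if (dev->bad_page[pg])
			return true;
	return false;
}

/*
 * Write len bytes at offset off.
 * Returns the number of bytes written, -EINVAL for a bad range, or -EIO
 * when the range is poisoned but not page aligned.
 */
static long sim_recovery_write(struct sim_pmem *dev, size_t off,
			       const void *buf, size_t len)
{
	assert(dev != NULL && buf != NULL);

	if (len == 0 || off > sizeof(dev->media) ||
	    len > sizeof(dev->media) - off)
		return -EINVAL;

	if (sim_range_has_poison(dev, off, len)) {
		/* Recap point 3: only page-aligned ranges are repaired. */
		if (off % SIM_PAGE_SIZE || len % SIM_PAGE_SIZE)
			return -EIO;

		for (size_t pg = off / SIM_PAGE_SIZE;
		     pg < (off + len) / SIM_PAGE_SIZE; pg++) {
			/* Task 1: clear poison (modelled as zeroing). */
			memset(dev->media + pg * SIM_PAGE_SIZE, 0,
			       SIM_PAGE_SIZE);
			/* Task 2: update the error list. */
			dev->bad_page[pg] = false;
		}
	}

	/* Task 3: the write itself. */
	memcpy(dev->media + off, buf, len);
	return (long)len;
}
```

With page 1 marked bad, a 16-byte write into that page fails with -EIO, while a full-page write at the same page offset clears the error-list entry and succeeds.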

If this understanding is in the right direction, then I'd like to
propose below changes to
   dax_direct_access(), dax_copy_to/from_iter(), pmem_copy_to/from_iter()
   and the dm layer copy_to/from_iter, dax_iomap_iter().

1. dax_iomap_iter() relies on dax_direct_access() to decide whether a
    media error is likely: if the API without DAX_F_RECOVERY returns
    -EIO, then switch to the recovery-read/write code.  In recovery code,
    supply DAX_F_RECOVERY to dax_direct_access() in order to obtain
    'kaddr', and then call dax_copy_to/from_iter() with DAX_F_RECOVERY.
I like it. It allows for an atomic write+clear implementation on
capable platforms and coordinates with potentially unmapped pages. The
best of both worlds from the dax_clear_poison() proposal and my "take
a fault and do a slow-path copy".
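
The fallback described in proposal 1 can be sketched in userspace as follows. This is only an illustration under assumptions: SIM_DAX_F_RECOVERY, struct sim_dax_dev and the sim_*() functions are hypothetical stand-ins, not the kernel's dax_direct_access() or dax_copy_from_iter() signatures, and a single "poisoned" flag models the whole range.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_DAX_F_RECOVERY 0x1	/* hypothetical stand-in for DAX_F_RECOVERY */

struct sim_dax_dev {
	unsigned char media[64];
	bool poisoned;		/* models a known media error in the range */
};

/*
 * Models the direct-access step: a poisoned range is refused with -EIO
 * unless the caller explicitly asks for recovery.
 */
static long sim_direct_access(struct sim_dax_dev *dev, int flags,
			      void **kaddr)
{
	if (dev->poisoned && !(flags & SIM_DAX_F_RECOVERY))
		return -EIO;
	*kaddr = dev->media;
	return (long)sizeof(dev->media);
}

/* Models the copy step: with the recovery flag, poison is cleared first. */
static long sim_copy_from_buf(struct sim_dax_dev *dev, void *kaddr,
			      const void *src, size_t len, int flags)
{
	if (dev->poisoned) {
		if (!(flags & SIM_DAX_F_RECOVERY))
			return -EIO;
		dev->poisoned = false;
	}
	memcpy(kaddr, src, len);
	return (long)len;
}

/* Models the iomap iterator: try the fast path, fall back on -EIO. */
static long sim_iomap_write(struct sim_dax_dev *dev, const void *src,
			    size_t len, bool *used_recovery)
{
	void *kaddr = NULL;
	int flags = 0;
	long avail;

	*used_recovery = false;
	avail = sim_direct_access(dev, flags, &kaddr);
	if (avail == -EIO) {
		/* Likely media error: retry with the recovery flag set. */
		flags |= SIM_DAX_F_RECOVERY;
		*used_recovery = true;
		avail = sim_direct_access(dev, flags, &kaddr);
	}
	if (avail < 0)
		return avail;

	assert(kaddr != NULL);
	if (len > (size_t)avail)
		len = (size_t)avail;
	return sim_copy_from_buf(dev, kaddr, src, len, flags);
}
```

The point of the shape is that callers never choose the slow path up front: the first direct-access attempt without the flag doubles as the "is there poison here?" probe.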
2. the _copy_to/from_iter implementation would be largely the same
    as in my recent patch, but some changes in Christoph's
    'dax-devirtualize' may be kept, such as DAX_F_VIRTUAL. Obviously
    virtual devices don't have the ability to clear poison, so there is
    no need to complicate them.  This also means that not every endpoint
    dax device has to provide dax_op.copy_to/from_iter; they may use the
    default.
Did I miss this series or are you talking about this one?
https://lore.kernel.org/all/20211018044054.1779424-1-hch@lst.de/
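
For readers following proposal 2, the "optional op with a default" idea might look roughly like the sketch below. Everything here is hypothetical: struct sim_dax_ops, struct sim_dax_device and SIM_DAX_F_VIRTUAL are invented for illustration and do not reproduce the dax-devirtualize series or the real dax_operations layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_DAX_F_VIRTUAL 0x2	/* hypothetical stand-in for DAX_F_VIRTUAL */

/* Endpoint drivers may leave copy_from_iter NULL to get the default. */
struct sim_dax_ops {
	size_t (*copy_from_iter)(void *dst, const void *src, size_t len);
};

struct sim_dax_device {
	const struct sim_dax_ops *ops;
	unsigned int flags;
};

/* Counts calls into the driver-specific (poison-aware) copy. */
static unsigned int sim_pmem_copies;

/* Default path: a plain copy with no poison handling. */
static size_t sim_default_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return len;
}

/* Driver path: where a pmem-like device would handle poison. */
static size_t sim_pmem_copy(void *dst, const void *src, size_t len)
{
	sim_pmem_copies++;
	memcpy(dst, src, len);
	return len;
}

/* Dispatch: virtual devices and devices without the op use the default. */
static size_t sim_dax_copy_from(const struct sim_dax_device *dev, void *dst,
				const void *src, size_t len)
{
	assert(dev != NULL);
	if ((dev->flags & SIM_DAX_F_VIRTUAL) || !dev->ops ||
	    !dev->ops->copy_from_iter)
		return sim_default_copy(dst, src, len);
	return dev->ops->copy_from_iter(dst, src, len);
}
```

In this sketch only the device that registers sim_pmem_copy() pays for poison handling; a device flagged virtual takes the default copy even if it happens to have an ops table.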
I'm not sure about nova and others: if they use a different 'write'
path than the one via iomap, does that mean there will be a need for a
new set of dax_op for their read/write?
No, they're out-of-tree; they'll adjust to the same interface that xfs
and ext4 are using when/if they go upstream.
The 3-in-1 binding would always be
required, though. Maybe that'll be an ongoing discussion?
Yeah, let's cross that bridge when we come to it.
Comments? Suggestions?
It sounds great to me!