Thread (70 messages) 70 messages, 8 authors, 2021-05-28

Re: [PATCH v4 01/26] mm/mmu_notifiers: pass private data down to alloc_notifier()

From: Jean-Philippe Brucker <hidden>
Date: 2020-02-28 14:39:45
Also in: linux-devicetree, linux-iommu, linux-mm, linux-pci

On Tue, Feb 25, 2020 at 10:08:14AM -0400, Jason Gunthorpe wrote:
On Tue, Feb 25, 2020 at 10:24:39AM +0100, Jean-Philippe Brucker wrote:
quoted
On Mon, Feb 24, 2020 at 03:00:56PM -0400, Jason Gunthorpe wrote:
quoted
On Mon, Feb 24, 2020 at 07:23:36PM +0100, Jean-Philippe Brucker wrote:
quoted
The new allocation scheme introduced by 2c7933f53f6b ("mm/mmu_notifiers:
add a get/put scheme for the registration") provides a convenient way
for users to attach notifier data to an mm. However, it would be even
better to create this notifier data atomically.

Since the alloc_notifier() callback only takes an mm argument at the
moment, some users have to perform the allocation in two times.
alloc_notifier() initially creates an incomplete structure, which is
then finalized using more context once mmu_notifier_get() returns. This
second step requires carrying an initialization lock in the notifier
data and playing dirty tricks to order memory accesses against live
invalidation.
This was the intended pattern. Tthere shouldn't be an real issue as
there shouldn't be any data on which to invalidate, ie the later patch
does:

+       list_for_each_entry_rcu(bond, &io_mm->devices, mm_head)

And that list is empty post-allocation, so no 'dirty tricks' required.
Before introducing this patch I had the following code:

+	list_for_each_entry_rcu(bond, &io_mm->devices, mm_head) {
+		/*
+		 * To ensure that we observe the initialization of io_mm fields
+		 * by io_mm_finalize() before the registration of this bond to
+		 * the list by io_mm_attach(), introduce an address dependency
+		 * between bond and io_mm. It pairs with the smp_store_release()
+		 * from list_add_rcu().
+		 */
+		io_mm = rcu_dereference(bond->io_mm);
A rcu_dereference isn't need here, just a normal derference is fine.
bond->io_mm is annotated with __rcu (for iommu_sva_get_pasid_generic(),
which does bond->io_mm under rcu_read_lock())
quoted
+		io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
+				       start, end - start);
+	}

(1) io_mm_get() would obtain an empty io_mm from iommu_notifier_get().
(2) then io_mm_finalize() would initialize io_mm->ops, io_mm->ctx, etc.
(3) finally io_mm_attach() would add the bond to io_mm->devices.

Since the above code can run before (2) it needs to observe valid
io_mm->ctx, io_mm->ops initialized by (2) after obtaining the bond
initialized by (3). Which I believe requires the address dependency from
the rcu_dereference() above or some stronger barrier to pair with the
list_add_rcu().
The list_for_each_entry_rcu() is an acquire that already pairs with
the release in list_add_rcu(), all you need is a data dependency chain
starting on bond to be correct on ordering.

But this is super tricky :\
quoted
If io_mm->ctx and io_mm->ops are already valid before the
mmu notifier is published, then we don't need that stuff.
So, this trickyness with RCU is not a bad reason to introduce the priv
scheme, maybe explain it in the commit message?
Ok, I've added this to the commit message:

    The IOMMU SVA module, which attaches an mm to multiple devices,
    exemplifies this situation. In essence it does:

            mmu_notifier_get()
              alloc_notifier()
                 A = kzalloc()
              /* MMU notifier is published */
            A->ctx = ctx;                           // (1)
            device->A = A;
            list_add_rcu(device, A->devices);       // (2)

    The invalidate notifier, which may start running before A is fully
    initialized at (1), does the following:

            io_mm_invalidate(A)
              list_for_each_entry_rcu(device, A->devices)
                A = device->A;                      // (3)
                device->invalidate(A->ctx)

    To ensure that an invalidate() thread observes write (1) before (2), it
    needs the address dependency (3). The resulting code is subtle and
    difficult to understand. If instead we fully initialize object A before
    publishing the MMU notifier, we don't need the complexity added by (3).


I'll try to improve the wording before sending next version.

Thanks,
Jean

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help