Thread (40 messages), 7 authors, 2021-06-07

Re: [PATCH v9 07/10] mm: Device exclusive memory access

From: Alistair Popple <apopple@nvidia.com>
Date: 2021-05-28 01:48:52
Also in: dri-devel, linux-mm, lkml, nouveau

On Thursday, 27 May 2021 11:04:57 PM AEST Peter Xu wrote:
On Thu, May 27, 2021 at 01:35:39PM +1000, Alistair Popple wrote:
+ *
+ * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
+ * longer have exclusive access to the page. May ignore the invalidation that's
+ * part of make_device_exclusive_range() if the owner field
+ * matches the value passed to make_device_exclusive_range().
Perhaps s/matches/does not match/?
No, "matches" is correct. The MMU_NOTIFY_EXCLUSIVE notifier is to notify a
listener that a range is being invalidated for the purpose of making the
range available for some device to have exclusive access to. Which does
also mean a device getting the notification no longer has exclusive
access if it already did.

A unique type is needed because when creating the range a driver needs to
form a mmu critical section (with mmu_interval_read_begin()/
mmu_interval_read_retry()) to ensure the entry remains valid long enough to
program the device pte and hasn't been invalidated.

However, without a way of filtering, any invalidation will result in a
retry, but make_device_exclusive_range() needs to do an invalidation
during installation of the entry. To avoid this causing infinite retries,
the driver ignores specific invalidation events that it knows don't
apply, i.e. the invalidations that are a result of that driver asking for
device exclusive entries.
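
The filtering described above can be modelled in plain userspace C. This is
only a sketch to illustrate the retry logic: every model_* name is a
hypothetical stand-in, not the kernel API, and the sequence counter is a
simplified assumption about how a critical section detects invalidations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; all model_* names are hypothetical, not kernel API. */
enum model_event { MODEL_NOTIFY_CLEAR, MODEL_NOTIFY_EXCLUSIVE };

struct model_range {
	enum model_event event;
	void *owner;		/* cookie supplied by whoever triggered the invalidation */
};

struct model_driver {
	void *owner;		/* cookie this driver passes when requesting exclusive access */
	unsigned long seq;	/* bumped on every invalidation the driver must honour */
};

/* Returns true if the invalidation was ignored as self-inflicted. */
static bool model_invalidate(struct model_driver *drv,
			     const struct model_range *range)
{
	assert(drv != NULL && range != NULL);

	/* Our own make-exclusive request: must not force a retry. */
	if (range->event == MODEL_NOTIFY_EXCLUSIVE && range->owner == drv->owner)
		return true;

	/* Anything else invalidates in-flight critical sections. */
	drv->seq++;
	return false;
}

/* Critical section: begin samples the counter, retry reports a change. */
static unsigned long model_read_begin(const struct model_driver *drv)
{
	return drv->seq;
}

static bool model_read_retry(const struct model_driver *drv, unsigned long seq)
{
	return drv->seq != seq;
}
```

Without the owner comparison, the invalidation issued while installing the
entry would bump the counter and the critical section would retry forever.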
OK I think I get it now.. so the driver checks both EXCLUSIVE and owner, if
all match it skips the notify, otherwise it's treated like all the rest. 
Thanks.

However then it's still confusing (as I raised it too in previous comment)
that we use CLEAR when re-installing the valid pte.  It's merely against
what CLEAR means.
Oh, thanks. I understand where you are coming from now - the pte is already 
invalid so ordinarily wouldn't need clearing.
How about sending EXCLUSIVE for both mark/restore?  Just that when restore
we notify with owner==NULL telling that no one is owning it anymore so
driver needs to drop the ownership.  I assume your driver patch does not
need change too.  Would that be much cleaner than CLEAR?  I bet it also
makes commenting the new notify easier.

What do you think?
That seems like a good idea and avoids adding another type. And as you say the
driver patch shouldn't need changing either (will need to confirm though).
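
For illustration, the proposal (EXCLUSIVE for both mark and restore, with
owner==NULL on restore) might look roughly like this from the driver side.
This is a userspace sketch only: the sketch_* names are hypothetical, and it
assumes a driver never uses NULL as its own owner cookie.

```c
#include <assert.h>
#include <stddef.h>

/* All sketch_* names are hypothetical; this only models the proposal. */
enum sketch_action { SKETCH_IGNORE, SKETCH_DROP_EXCLUSIVE };

struct sketch_range {
	void *owner;	/* owner carried by an EXCLUSIVE notification */
};

/*
 * Decide what to do with an EXCLUSIVE notification:
 *  - owner matches ours: our own mark request, safe to ignore;
 *  - owner is NULL (restore) or someone else's: drop exclusive access.
 */
static enum sketch_action sketch_on_exclusive(void *my_owner,
					      const struct sketch_range *range)
{
	assert(range != NULL);

	if (range->owner != NULL && range->owner == my_owner)
		return SKETCH_IGNORE;
	return SKETCH_DROP_EXCLUSIVE;
}
```

Under that assumption the existing "event and owner both match" check stays
as it is, which is why the driver patch may not need changing.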
 
[...]
+                                   vma->vm_mm, address, min(vma->vm_end,
+                                   address + page_size(page)), args->owner);
+     mmu_notifier_invalidate_range_start(&range);
+
+     while (page_vma_mapped_walk(&pvmw)) {
+             /* Unexpected PMD-mapped THP? */
+             VM_BUG_ON_PAGE(!pvmw.pte, page);
+
+             if (!pte_present(*pvmw.pte)) {
+                     ret = false;
+                     page_vma_mapped_walk_done(&pvmw);
+                     break;
+             }
+
+             subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
I see that all pages passed in should be done after FOLL_SPLIT_PMD, so is
this needed?  Or say, should subpage==page always be true?
Not always, in the case of a thp there are small ptes which will get device
exclusive entries.
FOLL_SPLIT_PMD will first split the huge thp into smaller pages, then do
follow_page_pte() on them (in follow_pmd_mask):

        if (flags & FOLL_SPLIT_PMD) {
                int ret;
                page = pmd_page(*pmd);
                if (is_huge_zero_page(page)) {
                        spin_unlock(ptl);
                        ret = 0;
                        split_huge_pmd(vma, pmd, address);
                        if (pmd_trans_unstable(pmd))
                                ret = -EBUSY;
                } else {
                        spin_unlock(ptl);
                        split_huge_pmd(vma, pmd, address);
                        ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
                }

                return ret ? ERR_PTR(ret) :
                        follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
        }

So I thought all pages are small pages?
The page will remain as a transparent huge page though (at least as I 
understand things). FOLL_SPLIT_PMD turns it into a pte mapped thp by splitting 
the pmd and creating pte's mapping the subpages but doesn't split the page 
itself. For comparison FOLL_SPLIT (which has been removed in v5.13 due to lack 
of use) is what would be used to split the page in the above GUP code by 
calling split_huge_page() rather than split_huge_pmd().
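
To make the subpage arithmetic concrete, here is a toy userspace model of a
pte-mapped thp. It is a sketch under stated assumptions: the toy_* names and
the flat array standing in for the memmap are hypothetical, and 512 base
pages per huge page assumes 4 KiB pages with a 2 MiB thp.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_THP_NR_PAGES 512	/* assumption: 2 MiB thp, 4 KiB base pages */

struct toy_page { int unused; };

/* Flat array standing in for the memmap; index == pfn in this model. */
static struct toy_page toy_memmap[4 * TOY_THP_NR_PAGES];

static unsigned long toy_page_to_pfn(const struct toy_page *page)
{
	return (unsigned long)(page - toy_memmap);
}

/* Mirrors: subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte) */
static struct toy_page *toy_subpage(struct toy_page *page,
				    unsigned long pte_pfn)
{
	return page - toy_page_to_pfn(page) + pte_pfn;
}
```

With the thp starting at pfn 512, a pte mapping pfn 515 yields the page three
entries past the one passed in, so subpage == page only holds for the pte
that maps that page's own pfn.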

This was done to avoid adding code for handling device exclusive entries at 
the pmd level as well which would have made the changes more complicated and 
seems unnecessary at least for now.
--
Peter Xu

