Re: [PATCH v8 24/26] mm/gup: track FOLL_PIN pages
From: Jan Kara <jack@suse.cz>
Date: 2019-12-10 13:39:48
Also in:
bpf, dri-devel, kvm, linux-block, linux-fsdevel, linux-kselftest, linux-media, linux-mm, linux-rdma, linuxppc-dev, lkml, netdev
On Mon 09-12-19 14:53:42, John Hubbard wrote:
Add tracking of pages that were pinned via FOLL_PIN.
As mentioned in the FOLL_PIN documentation, callers who effectively set
FOLL_PIN are required to ultimately free such pages via unpin_user_page().
The effect is similar to FOLL_GET, and may be thought of as "FOLL_GET
for DIO and/or RDMA use".
Pages that have been pinned via FOLL_PIN are identifiable via a
new function call:
bool page_dma_pinned(struct page *page);
What to do in response to encountering such a page, is left to later
patchsets. There is discussion about this in [1], [2], and [3].
This also changes a BUG_ON(), to a WARN_ON(), in follow_page_mask().
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Jérôme Glisse <redacted>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>Looks nice, some comments below...
+/*
+ * try_grab_compound_head() - attempt to elevate a page's refcount, by a
+ * flags-dependent amount.
+ *
+ * This has a default assumption of "use FOLL_GET behavior, if FOLL_PIN is not
+ * set".
+ *
+ * "grab" names in this file mean, "look at flags to decide with to use FOLL_PIN
+ * or FOLL_GET behavior, when incrementing the page's refcount.
+ */
+static struct page *try_grab_compound_head(struct page *page, int refs,
+ unsigned int flags)
+{
+ if (flags & FOLL_PIN)
+ return try_pin_compound_head(page, refs);
+
+ return try_get_compound_head(page, refs);
+}
+
+/**
+ * grab_page() - elevate a page's refcount by a flag-dependent amount
+ *
+ * This might not do anything at all, depending on the flags argument.
+ *
+ * "grab" names in this file mean, "look at flags to decide with to use FOLL_PIN^^^ whether
+ * or FOLL_GET behavior, when incrementing the page's refcount.
+ *
+ * @page: pointer to page to be grabbed
+ * @flags: gup flags: these are the FOLL_* flag values.
+ *
+ * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
+ * time. (That's true throughout the get_user_pages*() and pin_user_pages*()
+ * APIs.) Cases:
+ *
+ * FOLL_GET: page's refcount will be incremented by 1.
+ * FOLL_PIN: page's refcount will be incremented by GUP_PIN_COUNTING_BIAS.
+ */
+void grab_page(struct page *page, unsigned int flags)
+{
+ if (flags & FOLL_GET)
+ get_page(page);
+ else if (flags & FOLL_PIN) {
+ get_page(page);
+ WARN_ON_ONCE(flags & FOLL_GET);
+ /*
+ * Use get_page(), above, to do the refcount error
+ * checking. Then just add in the remaining references:
+ */
+ page_ref_add(page, GUP_PIN_COUNTING_BIAS - 1);This is wrong for two reasons: 1) You miss compound_head() indirection from get_page() for this page_ref_add(). 2) page_ref_add() could overflow the counter without noticing. Especially with GUP_PIN_COUNTING_BIAS being non-trivial, it is realistic that an attacker might try to overflow the page refcount and we have to protect the kernel against that. So I think that all the places that would use grab_page() actually need to use try_grab_page() and then gracefully deal with the failure.
quoted hunk ↗ jump to hunk
@@ -278,11 +425,23 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, goto retry; } - if (flags & FOLL_GET) { + if (flags & (FOLL_PIN | FOLL_GET)) { + /* + * Allow try_get_page() to take care of error handling, for + * both cases: FOLL_GET or FOLL_PIN: + */ if (unlikely(!try_get_page(page))) { page = ERR_PTR(-ENOMEM); goto out; } + + if (flags & FOLL_PIN) { + WARN_ON_ONCE(flags & FOLL_GET); + + /* We got a +1 refcount from try_get_page(), above. */ + page_ref_add(page, GUP_PIN_COUNTING_BIAS - 1); + __update_proc_vmstat(page, NR_FOLL_PIN_REQUESTED, 1); + } }
The same problem here as above, plus this place should use the same try_grab..() helper, shouldn't it?
quoted hunk ↗ jump to hunk
@@ -544,8 +703,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, /* make this handle hugepd */ page = follow_huge_addr(mm, address, flags & FOLL_WRITE); if (!IS_ERR(page)) { - BUG_ON(flags & FOLL_GET); - return page; + WARN_ON_ONCE(flags & (FOLL_GET | FOLL_PIN)); + return NULL;
I agree with the change to WARN_ON_ONCE but why is correct the change of the return value? Note that this is actually a "success branch". Honza -- Jan Kara [off-list ref] SUSE Labs, CR