Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock

[PATCH v16 00/22] per memcg lru_lock · Alex Shi <hidden> · 2020-07-11
[PATCH v16 01/22] mm/vmscan: remove unnecessary lruvec adding · Alex Shi <hidden> · 2020-07-11
[PATCH v16 03/22] mm/compaction: correct the comments of compact_defer_shift · Alex Shi <hidden> · 2020-07-11
[PATCH v16 05/22] mm/thp: move lru_add_page_tail func to huge_memory.c · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 05/22] mm/thp: move lru_add_page_tail func to huge_memory.c · Alex Shi <hidden> · 2020-07-16
Re: [PATCH v16 05/22] mm/thp: move lru_add_page_tail func to huge_memory.c · Kirill A. Shutemov <hidden> · 2020-07-16
Re: [PATCH v16 05/22] mm/thp: move lru_add_page_tail func to huge_memory.c · Alex Shi <hidden> · 2020-07-17
Re: [PATCH v16 05/22] mm/thp: move lru_add_page_tail func to huge_memory.c · Kirill A. Shutemov <hidden> · 2020-07-20
[PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock · Alexander Duyck <hidden> · 2020-07-17
Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock · Alex Shi <hidden> · 2020-07-19
Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock · Alexander Duyck <hidden> · 2020-07-20
Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock · Alex Shi <hidden> · 2020-07-21
Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock · Alex Shi <hidden> · 2020-07-21
[PATCH v16 19/22] mm/lru: introduce the relock_page_lruvec function · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 19/22] mm/lru: introduce the relock_page_lruvec function · Alexander Duyck <hidden> · 2020-07-17
Re: [PATCH v16 19/22] mm/lru: introduce the relock_page_lruvec function · Alex Shi <hidden> · 2020-07-18
[PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock · Alexander Duyck <hidden> · 2020-07-17
Re: [PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock · Alex Shi <hidden> · 2020-07-18
Re: [PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock · Alex Shi <hidden> · 2020-07-19
Re: [PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock · Alexander Duyck <hidden> · 2020-07-19
Re: [PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock · Alex Shi <hidden> · 2020-07-20
[PATCH v16 15/22] mm/compaction: do page isolation first in compaction · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 15/22] mm/compaction: do page isolation first in compaction · Alexander Duyck <hidden> · 2020-07-16
Re: [PATCH v16 15/22] mm/compaction: do page isolation first in compaction · Alex Shi <hidden> · 2020-07-17
Re: [PATCH v16 15/22] mm/compaction: do page isolation first in compaction · Alexander Duyck <hidden> · 2020-07-17
Re: [PATCH v16 15/22] mm/compaction: do page isolation first in compaction · Alex Shi <hidden> · 2020-07-19
[PATCH v16 20/22] mm/vmscan: use relock for move_pages_to_lru · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 20/22] mm/vmscan: use relock for move_pages_to_lru · Alexander Duyck <hidden> · 2020-07-17
Re: [PATCH v16 20/22] mm/vmscan: use relock for move_pages_to_lru · Alex Shi <hidden> · 2020-07-18
[PATCH v16 10/22] mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn · Alex Shi <hidden> · 2020-07-11
[PATCH v16 22/22] mm/lru: revise the comments of lru_lock · Alex Shi <hidden> · 2020-07-11
[PATCH v16 13/22] mm/lru: introduce TestClearPageLRU · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 13/22] mm/lru: introduce TestClearPageLRU · Alex Shi <hidden> · 2020-07-16
Re: [PATCH v16 13/22] mm/lru: introduce TestClearPageLRU · Alexander Duyck <hidden> · 2020-07-16
Re: [PATCH v16 13/22] mm/lru: introduce TestClearPageLRU · Alex Shi <hidden> · 2020-07-17
Re: [PATCH v16 13/22] mm/lru: introduce TestClearPageLRU · Alexander Duyck <hidden> · 2020-07-17
Re: [PATCH v16 13/22] mm/lru: introduce TestClearPageLRU · Alex Shi <hidden> · 2020-07-19
Re: [PATCH v16 13/22] mm/lru: introduce TestClearPageLRU · Alex Shi <hidden> · 2020-07-19
[PATCH v16 07/22] mm/thp: remove code path which never got into · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 07/22] mm/thp: remove code path which never got into · Kirill A. Shutemov <hidden> · 2020-07-20
[PATCH v16 21/22] mm/pgdat: remove pgdat lru_lock · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 21/22] mm/pgdat: remove pgdat lru_lock · Alexander Duyck <hidden> · 2020-07-17
Re: [PATCH v16 21/22] mm/pgdat: remove pgdat lru_lock · Alex Shi <hidden> · 2020-07-18
[PATCH v16 17/22] mm/swap: serialize memcg changes during pagevec_lru_move_fn · Alex Shi <hidden> · 2020-07-11
[PATCH v16 14/22] mm/thp: add tail pages into lru anyway in split_huge_page() · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 14/22] mm/thp: add tail pages into lru anyway in split_huge_page() · Alex Shi <hidden> · 2020-07-17
Re: [PATCH v16 14/22] mm/thp: add tail pages into lru anyway in split_huge_page() · Kirill A. Shutemov <hidden> · 2020-07-20
Re: [PATCH v16 14/22] mm/thp: add tail pages into lru anyway in split_huge_page() · Alex Shi <hidden> · 2020-07-20
[PATCH v16 06/22] mm/thp: clean up lru_add_page_tail · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 06/22] mm/thp: clean up lru_add_page_tail · Kirill A. Shutemov <hidden> · 2020-07-20
[PATCH v16 12/22] mm/lru: move lock into lru_note_cost · Alex Shi <hidden> · 2020-07-11
[PATCH v16 11/22] mm/lru: move lru_lock holding in func lru_note_cost_page · Alex Shi <hidden> · 2020-07-11
[PATCH v16 09/22] mm/memcg: add debug checking in lock_page_memcg · Alex Shi <hidden> · 2020-07-11
[PATCH v16 04/22] mm/compaction: rename compact_deferred as compact_should_defer · Alex Shi <hidden> · 2020-07-11
[PATCH v16 08/22] mm/thp: narrow lru locking · Alex Shi <hidden> · 2020-07-11
[PATCH v16 02/22] mm/page_idle: no unlikely double check for idle page counting · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 00/22] per memcg lru_lock · Alex Shi <hidden> · 2020-07-11
Re: [PATCH v16 00/22] per memcg lru_lock · Alex Shi <hidden> · 2020-07-16
Re: [PATCH v16 00/22] per memcg lru_lock · Alexander Duyck <hidden> · 2020-07-16
Re: [PATCH v16 00/22] per memcg lru_lock · Alex Shi <hidden> · 2020-07-17
Re: [PATCH v16 00/22] per memcg lru_lock · Hugh Dickins <hughd@google.com> · 2020-07-19
Re: [PATCH v16 00/22] per memcg lru_lock · Alex Shi <hidden> · 2020-07-20
Re: [PATCH v16 00/22] per memcg lru_lock · Hugh Dickins <hughd@google.com> · 2020-07-20
Re: [PATCH v16 00/22] per memcg lru_lock · Alex Shi <hidden> · 2020-07-20

From: Alex Shi <hidden>
Date: 2020-07-21 13:51:36
Also in: linux-mm, lkml


On 2020/7/21 5:26 PM, Alex Shi wrote:


On 2020/7/21 2:51 AM, Alexander Duyck wrote:


Look at __split_huge_page_tail(): there is a tiny gap between the tail page
getting PG_mlocked and it being added to the LRU list.
TestClearPageLRU could block memcg changes of the page by stopping
isolate_lru_page().
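The role of TestClearPageLRU can be illustrated with a minimal userspace sketch. It assumes C11 atomics stand in for the kernel's atomic bitops, and all names (`struct page_model`, `PG_LRU_BIT`, `test_clear_flag()`, `isolate_model()`) are hypothetical. It shows two things. First, whoever clears the LRU bit first owns the page, so a second isolator (for example the memcg move path) fails. Second, a tail page that is still in the gap (PG_mlocked set, PG_lru not yet set) cannot be isolated at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for a userspace model of page->flags. */
enum { PG_LRU_BIT = 0, PG_MLOCKED_BIT = 1 };

struct page_model {
	atomic_ulong flags;
};

static void set_flag(struct page_model *p, int bit)
{
	atomic_fetch_or(&p->flags, 1UL << bit);
}

/* Atomically clear @bit; return true only if it was set (TestClearPage* semantics). */
static bool test_clear_flag(struct page_model *p, int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_and(&p->flags, ~mask) & mask) != 0;
}

/*
 * Model of isolation: success requires winning the PG_lru bit, so at most
 * one caller (munlock or a memcg move) can take the page off its LRU list.
 */
static bool isolate_model(struct page_model *p)
{
	return test_clear_flag(p, PG_LRU_BIT);
}

/* Self-check of the two cases discussed above. */
static void isolate_model_selfcheck(void)
{
	struct page_model tail;

	/* Tail page in the gap: PG_mlocked copied, not yet on an LRU list. */
	atomic_init(&tail.flags, 0);
	set_flag(&tail, PG_MLOCKED_BIT);
	assert(!isolate_model(&tail));

	/* Once PG_lru is set, only the first isolator wins. */
	set_flag(&tail, PG_LRU_BIT);
	assert(isolate_model(&tail));
	assert(!isolate_model(&tail));
}
```

The model is single-threaded. It shows the ownership rule that the atomic test-and-clear provides, but it does not reproduce the real race.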

I get that there is a gap between the two in __split_huge_page_tail.
My concern is more the fact that you are pulling the bit testing
outside of the locked region when I don't think it needs to be. The
lock is being taken unconditionally, so why pull the testing out when
you could just do it inside the lock anyway? My worry is that you
might be addressing __split_huge_page_tail but in the process you
might be introducing a new race with something like
__pagevec_lru_add_fn.

Yes, the page may be interfered with by clear_page_mlock() and added to the
wrong LRU list.


If I am not mistaken, the Mlocked flag can still be cleared regardless
of whether the LRU bit is set. So you can still clear the LRU bit
before you pull the page out of the list, but it can be done after
clearing the Mlocked flag instead of before you have even taken the
LRU lock. That way it would work more like how you
handled pagevec_lru_move_fn(), since all this function is really doing is
moving the pages out of the unevictable list into one of the other LRU
lists anyway, because the Mlocked flag was cleared.
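The ordering suggested above can be sketched as a single-threaded userspace model. The sequence is: take the lock first, then clear PG_mlocked, then clear PG_lru and unlink the page, all inside the locked region. The `locked` boolean is a hypothetical stand-in for the per-lruvec spinlock, and every name here (`struct lruvec_model`, `munlock_isolate_model()`, `putback_model()`) is illustrative rather than kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; real code uses the per-lruvec spinlock. */
enum lru_model { LRU_NONE, LRU_UNEVICTABLE, LRU_INACTIVE };

struct lruvec_model {
	bool locked;
};

struct page_model {
	bool lru;                   /* models PG_lru */
	bool mlocked;               /* models PG_mlocked */
	enum lru_model list;        /* which list the page sits on */
	struct lruvec_model *lruvec;
};

static void lock_lruvec(struct lruvec_model *l)
{
	assert(!l->locked);         /* no recursion in this model */
	l->locked = true;
}

static void unlock_lruvec(struct lruvec_model *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Test-and-clear a flag; in this model it is only called under the lock. */
static bool test_clear(bool *flag)
{
	bool was = *flag;

	*flag = false;
	return was;
}

/*
 * Take the lock, clear PG_mlocked, then clear PG_lru and unlink the page.
 * Returns true if the page was isolated. A page that was mlocked but not
 * yet on an LRU list is not isolated (the "isolation failed" case).
 */
static bool munlock_isolate_model(struct page_model *page)
{
	struct lruvec_model *l = page->lruvec;
	bool isolated = false;

	lock_lruvec(l);
	if (test_clear(&page->mlocked) && test_clear(&page->lru)) {
		page->list = LRU_NONE;  /* models del_page_from_lru_list() */
		isolated = true;
	}
	unlock_lruvec(l);
	return isolated;
}

/* Models putting the isolated page back on an evictable list. */
static void putback_model(struct page_model *page)
{
	assert(page->list == LRU_NONE && !page->lru);
	page->list = LRU_INACTIVE;
	page->lru = true;
}
```

Because both bit tests happen after the lock is taken, a second caller that finds PG_mlocked already clear leaves PG_lru and the list untouched.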

Without the LRU bit guard, the page may be moved between memcgs. Luckily,
lock_page() would stop mem_cgroup_move_account(), at the cost of it failing
with a BUSY state. The whole new change would look like the following; I will
test it and resend.
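Here is a minimal model of why holding the page lock stops a concurrent memcg move. It assumes, as the note above implies, that mem_cgroup_move_account() only trylocks the page and gives up with -EBUSY when the lock is held. The page lock is modelled as a C11 `atomic_bool`, and `trylock_page_model()`, `move_account_model()` and `memcg_id` are hypothetical names. The real lock_page() sleeps, so the trylock here only sets up the "munlock holds the page lock" state in a single thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: PG_locked as an atomic flag, memcg as a plain id. */
struct page_model {
	atomic_bool locked;   /* models PG_locked */
	int memcg_id;         /* models the page's memcg */
};

/* Like trylock_page(): succeed only if the page was unlocked. */
static bool trylock_page_model(struct page_model *p)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&p->locked, &expected, true);
}

static void unlock_page_model(struct page_model *p)
{
	assert(atomic_load(&p->locked));
	atomic_store(&p->locked, false);
}

/*
 * Assumed behaviour of the memcg move path: it never waits for the page
 * lock, so while munlock holds it the move fails with -EBUSY and the
 * page keeps its memcg.
 */
static int move_account_model(struct page_model *p, int to)
{
	if (!trylock_page_model(p))
		return -EBUSY;
	p->memcg_id = to;
	unlock_page_model(p);
	return 0;
}
```

The "BUSY state cost" is then simply that the move has to be retried later, after unlock_page().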

Hi Johannes,

It looks like lock_page_memcg() could be used to replace lock_page(), which
would turn the retry into a spinlock wait. Could you give some comments?

Thanks,
Alex


Thanks!
Alex

@@ -182,7 +179,7 @@ static void __munlock_isolation_failed(struct page *page)
 unsigned int munlock_vma_page(struct page *page)
 {
        int nr_pages;
-       pg_data_t *pgdat = page_pgdat(page);
+       struct lruvec *lruvec;

        /* For try_to_munlock() and to serialize with page migration */
        BUG_ON(!PageLocked(page));
@@ -190,11 +187,11 @@ unsigned int munlock_vma_page(struct page *page)
        VM_BUG_ON_PAGE(PageTail(page), page);

        /*
-        * Serialize with any parallel __split_huge_page_refcount() which
+        * Serialize split tail pages in __split_huge_page_tail() which
         * might otherwise copy PageMlocked to part of the tail pages before
         * we clear it in the head page. It also stabilizes hpage_nr_pages().
         */
-       spin_lock_irq(&pgdat->lru_lock);
+       lruvec = lock_page_lruvec_irq(page);

        if (!TestClearPageMlocked(page)) {
                /* Potentially, PTE-mapped THP: do not skip the rest PTEs */
@@ -205,15 +202,15 @@ unsigned int munlock_vma_page(struct page *page)
        nr_pages = hpage_nr_pages(page);
        __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);

-       if (__munlock_isolate_lru_page(page, true)) {
-               spin_unlock_irq(&pgdat->lru_lock);
+       if (__munlock_isolate_lru_page(page, lruvec, true)) {
+               unlock_page_lruvec_irq(lruvec);
                __munlock_isolated_page(page);
                goto out;
        }
        __munlock_isolation_failed(page);

 unlock_out:
-       spin_unlock_irq(&pgdat->lru_lock);
+       unlock_page_lruvec_irq(lruvec);

 out:
        return nr_pages - 1;
@@ -293,23 +290,27 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
        int nr = pagevec_count(pvec);
        int delta_munlocked = -nr;
        struct pagevec pvec_putback;
+       struct lruvec *lruvec = NULL;
        int pgrescued = 0;

        pagevec_init(&pvec_putback);

        /* Phase 1: page isolation */
-       spin_lock_irq(&zone->zone_pgdat->lru_lock);
        for (i = 0; i < nr; i++) {
                struct page *page = pvec->pages[i];

+               /* block memcg change in mem_cgroup_move_account */
+               lock_page(page);
+               lruvec = relock_page_lruvec_irq(page, lruvec);
                if (TestClearPageMlocked(page)) {
                        /*
                         * We already have pin from follow_page_mask()
                         * so we can spare the get_page() here.
                         */
-                       if (__munlock_isolate_lru_page(page, false))
+                       if (__munlock_isolate_lru_page(page, lruvec, false)) {
+                               unlock_page(page);
                                continue;
-                       else
+                       } else
                                __munlock_isolation_failed(page);
                } else {
                        delta_munlocked++;
@@ -321,11 +322,14 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
                 * pin. We cannot do it under lru_lock however. If it's
                 * the last pin, __page_cache_release() would deadlock.
                 */
+               unlock_page(page);
                pagevec_add(&pvec_putback, pvec->pages[i]);
                pvec->pages[i] = NULL;
        }
-       __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
-       spin_unlock_irq(&zone->zone_pgdat->lru_lock);
+       if (lruvec) {
+               __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
+               unlock_page_lruvec_irq(lruvec);
+       }

        /* Now we can release pins of pages that we are not munlocking */
        pagevec_release(&pvec_putback);
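The batching in Phase 1 of __munlock_pagevec() above relies on relock_page_lruvec_irq() keeping the currently held lruvec lock when the next page belongs to the same lruvec, and switching locks only when it does not. The sketch below models that pattern in userspace. It assumes a page's lruvec is stable while it is examined, which the real code has to guarantee separately. All names (`struct lruvec_model`, `relock_page_lruvec_model()`, `walk_batch_model()`, `lock_count`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-memcg lruvec and its lock. */
struct lruvec_model {
	bool locked;
	int lock_count;    /* how many times this lock was taken */
};

struct page_model {
	struct lruvec_model *lruvec;  /* lruvec the page belongs to */
};

static struct lruvec_model *lock_page_lruvec_model(struct page_model *page)
{
	struct lruvec_model *l = page->lruvec;

	assert(!l->locked);
	l->locked = true;
	l->lock_count++;
	return l;
}

static void unlock_lruvec_model(struct lruvec_model *l)
{
	assert(l->locked);
	l->locked = false;
}

/*
 * Keep the held lock if @page belongs to the same lruvec; otherwise drop
 * the held lock (if any) and take the page's lruvec lock instead.
 */
static struct lruvec_model *relock_page_lruvec_model(struct page_model *page,
						     struct lruvec_model *locked)
{
	if (locked == page->lruvec)
		return locked;
	if (locked)
		unlock_lruvec_model(locked);
	return lock_page_lruvec_model(page);
}

/* Walk a batch the way the Phase 1 loop does, then drop the last lock. */
static void walk_batch_model(struct page_model **pages, size_t nr)
{
	struct lruvec_model *l = NULL;

	for (size_t i = 0; i < nr; i++) {
		l = relock_page_lruvec_model(pages[i], l);
		/* ... per-page munlock work would run here under l ... */
	}
	if (l)
		unlock_lruvec_model(l);
}
```

For a batch whose pages belong to lruvecs A, A, B, B, A, the lock is taken three times (A, B, A) instead of five. An empty batch takes no lock at all, which mirrors the `if (lruvec)` guard in the hunk.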
