Thread (122 messages, 11 authors, 2009-09-13)

Re: [RFC] respect the referenced bit of KVM guest pages?

From: Minchan Kim <hidden>
Date: 2009-08-18 11:00:49
Also in: lkml
Subsystem: memory management, memory management - mglru (multi-gen lru), memory management - reclaim, memory management - rmap (reverse mapping), the rest · Maintainers: Andrew Morton, Johannes Weiner, David Hildenbrand, Lorenzo Stoakes, Linus Torvalds

On Tue, Aug 18, 2009 at 7:00 PM, Wu Fengguang[off-list ref] wrote:
On Tue, Aug 18, 2009 at 05:52:47PM +0800, Minchan Kim wrote:
quoted
On Tue, 18 Aug 2009 17:31:19 +0800
Wu Fengguang [off-list ref] wrote:
quoted
On Tue, Aug 18, 2009 at 12:17:34PM +0800, Minchan Kim wrote:
quoted
On Tue, 18 Aug 2009 10:34:38 +0800
Wu Fengguang [off-list ref] wrote:
quoted
Minchan,

On Mon, Aug 17, 2009 at 10:33:54PM +0800, Minchan Kim wrote:
quoted
On Sun, Aug 16, 2009 at 8:29 PM, Wu Fengguang[off-list ref] wrote:
quoted
On Sun, Aug 16, 2009 at 01:15:02PM +0800, Wu Fengguang wrote:
quoted
On Sun, Aug 16, 2009 at 11:53:00AM +0800, Rik van Riel wrote:
quoted
Wu Fengguang wrote:
quoted
On Fri, Aug 07, 2009 at 05:09:55AM +0800, Jeff Dike wrote:
quoted
Side question:
Is there a good reason for this to be in shrink_active_list()
as opposed to __isolate_lru_page()?

         if (unlikely(!page_evictable(page, NULL))) {
                 putback_lru_page(page);
                 continue;
         }

Maybe we want to minimize the amount of code under the lru lock or
avoid duplicate logic in the isolate_page functions.
I guess the quick test is meant to avoid the expensive page_referenced()
call that follows it. But that should mostly be a one-shot cost: the
unevictable pages are unlikely to cycle through the active/inactive lists
again and again.
Please read what putback_lru_page does.

It moves the page onto the unevictable list, so that
it will not end up in this scan again.
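To make Rik's point concrete, here is a minimal toy model in C. It is not kernel code: `struct toy_page`, `toy_scan_active()` and the two-list enum are hypothetical stand-ins, and a single boolean is assumed to play the role of page_evictable(). It shows that once an unevictable page is parked on its own list (as putback_lru_page() does), later scans skip it and never pay for the expensive referenced check on it again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a page sits either on the active list or on
 * the unevictable list. */
enum toy_lru { TOY_ACTIVE, TOY_UNEVICTABLE };

struct toy_page {
	bool evictable;          /* stands in for page_evictable() */
	enum toy_lru lru;        /* list the page currently sits on */
	int referenced_checks;   /* how often the "expensive" check ran */
};

/* One shrink_active_list()-like pass over the pages on TOY_ACTIVE.
 * Unevictable pages are moved aside before the expensive check runs.
 * Returns the number of pages that received the expensive check. */
static int toy_scan_active(struct toy_page *pages, size_t n)
{
	int checked = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];

		if (p->lru != TOY_ACTIVE)
			continue;                 /* not on the list being scanned */
		if (!p->evictable) {
			p->lru = TOY_UNEVICTABLE; /* park it; later scans skip it */
			continue;
		}
		p->referenced_checks++;           /* the page_referenced() stand-in */
		checked++;
	}
	return checked;
}
```

Running two passes over three pages, one of them unevictable, checks two pages each time, and the unevictable page is never checked at all.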
Yes it does. I said 'mostly' because there is a small hole in which an
unevictable page may be scanned but still not moved to the unevictable
list: when a page is mapped in two places, the first pte has the
referenced bit set and the _second_ VMA has the VM_LOCKED bit set, then
page_referenced() will return 1 and shrink_page_list() will move the page
onto the active list instead of the unevictable list. Shall we fix this
rare case?
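The hole depends on the order in which the mappings are walked. The sketch below is a simplified model, assuming (as in the page_referenced_one() hunk quoted later in the thread) that reaching a VM_LOCKED vma ends the walk without counting that vma; `struct toy_vma` and `toy_page_referenced()` are hypothetical names. A nonzero result stands for the case in which shrink_page_list() would activate the page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified view of one mapping of a page. */
struct toy_vma {
	bool locked;     /* stands in for vma->vm_flags & VM_LOCKED */
	bool pte_young;  /* referenced (accessed) bit of this pte */
};

/* page_referenced()-like walk: count young ptes in walk order, but stop
 * at the first VM_LOCKED vma without counting it ("break early"). */
static int toy_page_referenced(const struct toy_vma *vmas, size_t n)
{
	int referenced = 0;

	for (size_t i = 0; i < n; i++) {
		if (vmas[i].locked)
			break;           /* early exit; locked mapping not counted */
		if (vmas[i].pte_young)
			referenced++;
	}
	return referenced;
}
```

With the young, unlocked mapping visited first the result is 1, so the page is activated; with the VM_LOCKED mapping visited first the walk stops at once and returns 0.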
I don't think it's a big deal.
Maybe; otherwise I would have brought up this issue long ago :)
quoted
As you mentioned, it's a rare case, so only a few pages would end up on
the active list instead of the unevictable list.
Yes.
quoted
The next time the pages are scanned, we can try again to move them onto
the unevictable list.
Will PG_mlocked be set by then? Otherwise the situation is not likely
to change, and the VM_LOCKED pages may circulate between the active and
inactive lists indefinitely.
PG_mlocked is not important in that case.
The important thing is the VM_LOCKED vma.
I think the annotation below can help you understand my point. :)
Hmm, it looks like pages under a VM_LOCKED vma are guaranteed to have
PG_mlocked set, and so will be caught by page_evictable(). Is that right?
No. Sorry for not making my point clear.
I meant the following.
The next time the page is scanned:

shrink_page_list
 ->
               referenced = page_referenced(page, 1,
                                               sc->mem_cgroup, &vm_flags);
               /* In active use or really unfreeable?  Activate it. */
               if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
                                       referenced && page_mapping_inuse(page))
                       goto activate_locked;
quoted
-> try_to_unmap
    ~~~~~~~~~~~~ this line won't be reached if the page is found to be
    referenced in the above lines?
Indeed! In fact, I was worried about that.
It looks like a livelock problem.
But I think the race window is very small, so there hasn't been any report until now.
Let's Cc Lee.

If we have to fix it, how about this?
This version has less overhead than yours, since
shrink_page_list() is called less often than page_referenced().
diff --git a/mm/rmap.c b/mm/rmap.c
index ed63894..283266c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -358,6 +358,7 @@ static int page_referenced_one(struct page *page,
         */
        if (vma->vm_flags & VM_LOCKED) {
                *mapcount = 1;  /* break early from loop */
+               *vm_flags |= VM_LOCKED;
                goto out_unmap;
        }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d224b28..d156e1d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -632,7 +632,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                                                sc->mem_cgroup, &vm_flags);
                /* In active use or really unfreeable?  Activate it. */
                if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
-                                       referenced && page_mapping_inuse(page))
+                                       referenced && page_mapping_inuse(page)
+                                       && !(vm_flags & VM_LOCKED))
                        goto activate_locked;
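What the patch changes can be modeled in a few lines, under the same simplifying assumptions as a toy walk over the mappings. All names are hypothetical, `TOY_VM_LOCKED` stands in for VM_LOCKED, and the sc->order and page_mapping_inuse() conditions are assumed to be true. The walk now reports through `vm_flags` that it stopped at a locked vma, and the activation test declines to activate in that case, so the page falls through toward try_to_unmap().

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_VM_LOCKED 0x1UL  /* hypothetical stand-in for VM_LOCKED */

struct toy_vma {
	bool locked;     /* vma->vm_flags & VM_LOCKED */
	bool pte_young;  /* referenced bit of this pte */
};

enum toy_verdict { TOY_ACTIVATE, TOY_TRY_UNMAP };

/* page_referenced()-like walk that, as in the rmap.c hunk, records in
 * *vm_flags that it broke out early at a VM_LOCKED vma. */
static int toy_page_referenced(const struct toy_vma *vmas, size_t n,
			       unsigned long *vm_flags)
{
	int referenced = 0;

	*vm_flags = 0;
	for (size_t i = 0; i < n; i++) {
		if (vmas[i].locked) {
			*vm_flags |= TOY_VM_LOCKED;  /* the added line */
			break;
		}
		if (vmas[i].pte_young)
			referenced++;
	}
	return referenced;
}

/* shrink_page_list()-like decision, with or without the vmscan.c hunk. */
static enum toy_verdict toy_shrink_decision(const struct toy_vma *vmas,
					    size_t n, bool apply_fix)
{
	unsigned long vm_flags;
	int referenced = toy_page_referenced(vmas, n, &vm_flags);
	bool locked = apply_fix && (vm_flags & TOY_VM_LOCKED);

	if (referenced && !locked)
		return TOY_ACTIVATE;   /* goto activate_locked */
	return TOY_TRY_UNMAP;          /* continue toward try_to_unmap() */
}
```

For Wu's case (young unlocked mapping first, VM_LOCKED mapping second) the verdict flips from activate to try_to_unmap once the fix is applied, while a referenced page with no locked mapping is still activated.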



Thanks,
Fengguang
quoted
      -> try_to_unmap_xxx
              -> if (vma->vm_flags & VM_LOCKED)
              -> try_to_mlock_page
                      -> TestSetPageMlocked
                      -> putback_lru_page

So in the end, the page will be placed on the unevictable list.
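Minchan's call chain and Wu's worry about endless circulation can both be illustrated with a small simulation. This is a sketch under assumptions, not the real reclaim path. The referenced check is assumed to test-and-clear the young bit of each unlocked mapping it visits, try_to_unmap() is reduced to "a VM_LOCKED mapping sends the page to the unevictable list", and all names are hypothetical. If the unlocked mapping is not touched again, the page reaches the unevictable list on the second scan; if it is touched before every scan, it keeps being activated.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
	bool locked;     /* stands in for vma->vm_flags & VM_LOCKED */
	bool pte_young;  /* accessed bit of this mapping's pte */
};

enum toy_result { TOY_ACTIVATED, TOY_UNEVICTABLE, TOY_RECLAIMED };

/* page_referenced()-like walk: test-and-clear young bits, stop at a
 * locked vma without counting it. */
static int toy_referenced(struct toy_vma *vmas, size_t n)
{
	int referenced = 0;

	for (size_t i = 0; i < n; i++) {
		if (vmas[i].locked)
			break;
		if (vmas[i].pte_young) {
			vmas[i].pte_young = false;  /* the check clears the bit */
			referenced++;
		}
	}
	return referenced;
}

/* try_to_unmap()-like step: a VM_LOCKED mapping means mlock the page and
 * put it back on the unevictable list; otherwise it could be reclaimed. */
static enum toy_result toy_try_to_unmap(const struct toy_vma *vmas, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].locked)
			return TOY_UNEVICTABLE;
	return TOY_RECLAIMED;
}

/* One shrink_page_list() visit: the referenced test comes first, so
 * try_to_unmap() is not reached for a referenced page. */
static enum toy_result toy_shrink_one(struct toy_vma *vmas, size_t n)
{
	if (toy_referenced(vmas, n))
		return TOY_ACTIVATED;       /* goto activate_locked */
	return toy_try_to_unmap(vmas, n);
}

/* Returns the scan on which the page lands on the unevictable list, or 0
 * if it is still being activated after max_scans. Mapping 0 is unlocked,
 * mapping 1 is VM_LOCKED, as in the case Wu describes. */
static int toy_scans_until_unevictable(bool retouched, int max_scans)
{
	struct toy_vma vmas[2] = { { false, true }, { true, false } };

	for (int scan = 1; scan <= max_scans; scan++) {
		if (toy_shrink_one(vmas, 2) == TOY_UNEVICTABLE)
			return scan;
		if (retouched)
			vmas[0].pte_young = true;  /* unlocked mapping used again */
	}
	return 0;
}
```

For example, toy_scans_until_unevictable(false, 10) returns 2, while toy_scans_until_unevictable(true, 10) returns 0.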
quoted
Then I was worrying about a non-problem. Sorry for the confusion!

Thanks,
Fengguang
quoted
----

/*
 * called from munlock()/munmap() path with page supposedly on the LRU.
 *
 * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
 * [in try_to_munlock()] and then attempt to isolate the page.  We must
 * isolate the page to keep others from messing with its unevictable
 * and mlocked state while trying to munlock.  However, we pre-clear the
 * mlocked state anyway as we might lose the isolation race and we might
 * not get another chance to clear PageMlocked.  If we successfully
 * isolate the page and try_to_munlock() detects other VM_LOCKED vmas
 * mapping the page, it will restore the PageMlocked state, unless the page
 * is mapped in a non-linear vma.  So, we go ahead and SetPageMlocked(),
 * perhaps redundantly.
 * If we lose the isolation race, and the page is mapped by other VM_LOCKED
 * vmas, we'll detect this in vmscan--via try_to_munlock() or try_to_unmap()
 * either of which will restore the PageMlocked state by calling
 * mlock_vma_page() above, if it can grab the vma's mmap sem.
 */
static void munlock_vma_page(struct page *page)
{
...
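The ordering the comment describes can be sketched as follows: pre-clear the flag, then isolate, then let try_to_munlock() restore the flag if another VM_LOCKED vma still maps the page. This is a hypothetical single-threaded model: `struct toy_page`, `toy_munlock_vma_page()` and the array of the other vmas' lock states are illustrative names. The non-linear vma exception and the mmap_sem condition are not modeled, and "losing the isolation race" is represented simply as the page not being on the LRU.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool mlocked;      /* stands in for PageMlocked */
	bool on_lru;       /* the page can only be isolated while on the LRU */
	bool unevictable;  /* which list putback would choose */
};

/* Hypothetical: does any other vma still map the page with VM_LOCKED? */
static bool toy_other_locked_vma(const bool *other_vma_locked, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (other_vma_locked[i])
			return true;
	return false;
}

/* Returns true if the page was isolated (we won the race). */
static bool toy_munlock_vma_page(struct toy_page *page,
				 const bool *other_vma_locked, size_t n)
{
	page->mlocked = false;          /* pre-clear: may be our only chance */
	if (!page->on_lru)
		return false;           /* lost the race; vmscan restores later */
	page->on_lru = false;           /* isolated: nobody else touches it */
	if (toy_other_locked_vma(other_vma_locked, n))
		page->mlocked = true;   /* try_to_munlock() restores the flag */
	page->unevictable = page->mlocked;  /* putback picks the list */
	page->on_lru = true;            /* putback_lru_page() */
	return true;
}
```

If the race is lost, the flag stays cleared in this model, which is exactly the state the comment expects vmscan to repair later via try_to_munlock() or try_to_unmap().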

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>