Re: [RFC] respect the referenced bit of KVM guest pages?
From: Minchan Kim <hidden>
Date: 2009-08-18 11:00:49
Also in:
lkml
On Tue, Aug 18, 2009 at 7:00 PM, Wu Fengguang [off-list ref] wrote:
On Tue, Aug 18, 2009 at 05:52:47PM +0800, Minchan Kim wrote:
On Tue, 18 Aug 2009 17:31:19 +0800 Wu Fengguang [off-list ref] wrote:
On Tue, Aug 18, 2009 at 12:17:34PM +0800, Minchan Kim wrote:
On Tue, 18 Aug 2009 10:34:38 +0800 Wu Fengguang [off-list ref] wrote:
Minchan,

On Mon, Aug 17, 2009 at 10:33:54PM +0800, Minchan Kim wrote:
On Sun, Aug 16, 2009 at 8:29 PM, Wu Fengguang [off-list ref] wrote:
On Sun, Aug 16, 2009 at 01:15:02PM +0800, Wu Fengguang wrote:
On Sun, Aug 16, 2009 at 11:53:00AM +0800, Rik van Riel wrote:
Wu Fengguang wrote:
On Fri, Aug 07, 2009 at 05:09:55AM +0800, Jeff Dike wrote:
Side question - Is there a good reason for this to be in shrink_active_list() as opposed to __isolate_lru_page()?

	if (unlikely(!page_evictable(page, NULL))) {
		putback_lru_page(page);
		continue;
	}

Maybe we want to minimize the amount of code under the lru lock or avoid duplicate logic in the isolate_page functions.

I guess the quick test is meant to avoid the expensive page_referenced() call that follows it. But that should be mostly a one-shot cost - the unevictable pages are unlikely to cycle through the active/inactive lists again and again.

Please read what putback_lru_page does. It moves the page onto the unevictable list, so that it will not end up in this scan again.

Yes it does. I said 'mostly' because there is a small hole through which an unevictable page may be scanned but still not moved to the unevictable list: when a page is mapped in two places, the first pte has the referenced bit set, and the _second_ VMA has the VM_LOCKED bit set, then page_referenced() will return 1 and shrink_page_list() will move the page to the active list instead of the unevictable list. Shall we fix this rare case?

I think it's not a big deal.

Maybe; otherwise I should have brought up this issue long ago :)
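To make the hole concrete, here is a toy user-space model (not kernel code) of the rmap walk as it is described in this thread. page_referenced() visits the VMAs mapping the page in walk order, counts young ptes, and stops early at the first VM_LOCKED VMA without telling the caller why it stopped. The names struct toy_vma, TOY_VM_LOCKED and toy_page_referenced() are hypothetical stand-ins, and the sketch ignores pte locking and the clearing of the young bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM_LOCKED vma flag. */
#define TOY_VM_LOCKED 0x1UL

/* Hypothetical, heavily simplified stand-in for one vma mapping the page. */
struct toy_vma {
	unsigned long vm_flags;
	bool pte_young;		/* referenced bit of the pte in this vma */
};

/*
 * Toy model of the pre-patch rmap walk: vmas are visited in walk order,
 * young ptes are counted, and the walk stops at the first VM_LOCKED vma
 * without reporting to the caller that a locked vma was seen.
 */
static int toy_page_referenced(const struct toy_vma *vmas, size_t nr)
{
	int referenced = 0;

	assert(vmas != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (vmas[i].vm_flags & TOY_VM_LOCKED)
			break;		/* "break early from loop" */
		if (vmas[i].pte_young)
			referenced++;
	}
	return referenced;
}
```

With `{ { 0, true }, { TOY_VM_LOCKED, false } }` the model returns 1, so the caller sees a referenced page and activates it. With the two VMAs in the opposite walk order it returns 0, and the page would go on to try_to_unmap().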
As you mentioned, it's a rare case, so only a few pages would end up on the active list instead of the unevictable list.

Yes.
The next time the pages are scanned, we can try to move them to the unevictable list again.

Will PG_mlocked be set by then? Otherwise the situation is not likely to change, and the VM_LOCKED pages may circulate in the active/inactive lists countless times.

PG_mlocked is not important in that case. The important thing is the VM_LOCKED vma. I think the annotation below can help you understand my point. :)

Hmm, it looks like pages under a VM_LOCKED vma are guaranteed to have PG_mlocked set, and so will be caught by page_evictable(). Is that right?

No. I am sorry for not making my point clear. What I meant is the following. The next time the page is scanned:

shrink_page_list
	-> referenced = page_referenced(page, 1, sc->mem_cgroup, &vm_flags);
	/* In active use or really unfreeable? Activate it. */
	if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
			referenced && page_mapping_inuse(page))
		goto activate_locked;
	-> try_to_unmap
	   ~~~~~~~~~~~~
This line won't be reached if the page is found to be referenced in the lines above?
Indeed! In fact, I was worried about that. It looks like a potential livelock problem. But I think the race window is very small, so there hasn't been any report until now. Let's Cc Lee. If we have to fix it, how about this? This version has less overhead than yours, since shrink_page_list() is called less often than page_referenced().
diff --git a/mm/rmap.c b/mm/rmap.c
index ed63894..283266c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -358,6 +358,7 @@ static int page_referenced_one(struct page *page,
 	 */
 	if (vma->vm_flags & VM_LOCKED) {
 		*mapcount = 1;	/* break early from loop */
+		*vm_flags |= VM_LOCKED;
 		goto out_unmap;
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d224b28..d156e1d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -632,7 +632,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 						sc->mem_cgroup, &vm_flags);
 		/* In active use or really unfreeable? Activate it. */
 		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
-					referenced && page_mapping_inuse(page))
+					referenced && page_mapping_inuse(page)
+					&& !(vm_flags & VM_LOCKED))
 			goto activate_locked;
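The effect of the two hunks can be sketched with another toy user-space model (not kernel code). The rmap.c hunk makes the walk report the VM_LOCKED hit through vm_flags. The vmscan.c hunk then refuses to activate such a page, so it falls through to try_to_unmap(), which, as described in this thread, mlocks it and moves it to the unevictable list. enum toy_fate, toy_shrink_one() and TOY_VM_LOCKED are hypothetical names. The model assumes sc->order <= PAGE_ALLOC_COSTLY_ORDER and page_mapping_inuse() are both true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VM_LOCKED vma flag. */
#define TOY_VM_LOCKED 0x1UL

/* Hypothetical labels for where shrink_page_list() sends the page. */
enum toy_fate {
	TOY_ACTIVATE,		/* goto activate_locked */
	TOY_UNEVICTABLE,	/* try_to_unmap() finds VM_LOCKED, page culled */
	TOY_CONTINUE_RECLAIM,	/* rest of the reclaim path */
};

/*
 * Toy model of the activation test before and after the proposed patch.
 * 'referenced' is page_referenced()'s return value, 'hit_locked_vma' says
 * whether the rmap walk stopped at a VM_LOCKED vma, and 'patched' selects
 * the new behaviour.
 */
static enum toy_fate toy_shrink_one(int referenced, bool hit_locked_vma,
				    bool patched)
{
	unsigned long vm_flags = 0;
	bool activate;

	assert(referenced >= 0);

	/* The rmap.c hunk: report the VM_LOCKED hit through vm_flags. */
	if (patched && hit_locked_vma)
		vm_flags |= TOY_VM_LOCKED;

	activate = referenced > 0;
	/* The vmscan.c hunk: never activate a page seen in a VM_LOCKED vma. */
	if (patched)
		activate = activate && !(vm_flags & TOY_VM_LOCKED);
	if (activate)
		return TOY_ACTIVATE;

	/* Per the thread, try_to_unmap() then mlocks the page. */
	return hit_locked_vma ? TOY_UNEVICTABLE : TOY_CONTINUE_RECLAIM;
}
```

In this model, `toy_shrink_one(1, true, false)` reproduces the hole (the page is activated again), and `toy_shrink_one(1, true, true)` sends the same page to the unevictable list. Referenced pages without a locked mapping are still activated as before.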
Thanks,
Fengguang
	-> try_to_unmap_xxx
	-> if (vma->vm_flags & VM_LOCKED)
	-> try_to_mlock_page
	-> TestSetPageMlocked
	-> putback_lru_page

So in the end, the page will be placed on the unevictable list.
Then I was worrying about a non-problem. Sorry for the confusion!

Thanks,
Fengguang
----
/*
 * called from munlock()/munmap() path with page supposedly on the LRU.
 *
 * Note: unlike mlock_vma_page(), we can't just clear the PageMlocked
 * [in try_to_munlock()] and then attempt to isolate the page. We must
 * isolate the page to keep others from messing with its unevictable
 * and mlocked state while trying to munlock. However, we pre-clear the
 * mlocked state anyway as we might lose the isolation race and we might
 * not get another chance to clear PageMlocked. If we successfully
 * isolate the page and try_to_munlock() detects other VM_LOCKED vmas
 * mapping the page, it will restore the PageMlocked state, unless the page
 * is mapped in a non-linear vma. So, we go ahead and SetPageMlocked(),
 * perhaps redundantly.
 * If we lose the isolation race, and the page is mapped by other VM_LOCKED
 * vmas, we'll detect this in vmscan--via try_to_munlock() or try_to_unmap()
 * either of which will restore the PageMlocked state by calling
 * mlock_vma_page() above, if it can grab the vma's mmap sem.
 */
static void munlock_vma_page(struct page *page)
{
...

--
Kind regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>