Re: [PATCHv6 07/37] filemap: allocate huge page in page_cache_read(), if allowed
From: Matthew Wilcox <willy@infradead.org>
Date: 2017-02-09 21:18:35
Also in:
linux-ext4, linux-fsdevel, linux-mm, lkml
On Thu, Jan 26, 2017 at 02:57:49PM +0300, Kirill A. Shutemov wrote:
Later we can add logic to accumulate information from shadow entires to return to caller (average eviction time?).
I would say minimum rather than average. That will become the refault time of the entire page, so minimum would probably have us making better decisions?
+ /* Wipe shadow entires */
+ radix_tree_for_each_slot(slot, &mapping->page_tree, &iter,
+ page->index) {
+ if (iter.index >= page->index + hpage_nr_pages(page))
+ break;
p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
- if (!radix_tree_exceptional_entry(p))
+ if (!p)
+ continue;Just FYI, this can't happen. You're holding the tree lock so nobody else gets to remove things from the tree. radix_tree_for_each_slot() only gives you the full slots; it skips the empty ones for you. I'm OK if you want to leave it in out of an abundance of caution.
+ __radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL, + workingset_update_node, mapping);
I may add an update_node argument to radix_tree_join at some point, so you can use it here. Or maybe we don't need to do that, and what you have here works just fine.
mapping->nrexceptional--;
... because adjusting the exceptional count is going to be a pain.
+ error = __radix_tree_insert(&mapping->page_tree, + page->index, compound_order(page), page); + /* This shouldn't happen */ + if (WARN_ON_ONCE(error)) + return error;
A lesser man would have just ignored the return value from __radix_tree_insert. I salute you.
quoted hunk ↗ jump to hunk
@@ -2078,18 +2155,34 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask) { struct address_space *mapping = file->f_mapping; struct page *page; + pgoff_t hoffset; int ret; do { - page = __page_cache_alloc(gfp_mask|__GFP_COLD); + page = page_cache_alloc_huge(mapping, offset, gfp_mask); +no_huge: + if (!page) + page = __page_cache_alloc(gfp_mask|__GFP_COLD); if (!page) return -ENOMEM; - ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL); - if (ret == 0) + if (PageTransHuge(page)) + hoffset = round_down(offset, HPAGE_PMD_NR); + else + hoffset = offset; + + ret = add_to_page_cache_lru(page, mapping, hoffset, + gfp_mask & GFP_KERNEL); + + if (ret == -EEXIST && PageTransHuge(page)) { + put_page(page); + page = NULL; + goto no_huge; + } else if (ret == 0) { ret = mapping->a_ops->readpage(file, page); - else if (ret == -EEXIST) + } else if (ret == -EEXIST) { ret = 0; /* losing race to add is OK */ + } put_page(page);
If the filesystem returns AOP_TRUNCATED_PAGE, you'll go round this loop
again trying the huge page again, even if the huge page didn't work
the first time. I would tend to think that if the huge page failed the
first time, we shouldn't try it again, so I propose this:
struct address_space *mapping = file->f_mapping;
struct page *page;
pgoff_t index;
int ret;
bool try_huge = true;
do {
if (try_huge) {
page = page_cache_alloc_huge(gfp_mask|__GFP_COLD);
if (page)
index = round_down(offset, HPAGE_PMD_NR);
else
try_huge = false;
}
if (!try_huge) {
page = __page_cache_alloc(gfp_mask|__GFP_COLD);
index = offset;
}
if (!page)
return -ENOMEM;
ret = add_to_page_cache_lru(page, mapping, index,
gfp_mask & GFP_KERNEL);
if (ret < 0) {
if (try_huge) {
try_huge = false;
ret = AOP_TRUNCATED_PAGE;
} else if (ret == -EEXIST)
ret = 0; /* losing race to add is OK */
} else {
ret = mapping->a_ops->readpage(file, page);
}
put_page(page);
} while (ret == AOP_TRUNCATED_PAGE);
But ... maybe it's OK to retry the huge page. I mean, not many
filesystems return AOP_TRUNCATED_PAGE, and they only do so rarely.
Anyway, I'm fine with the patch going in as-is. I just wanted to type out
my review notes.
Reviewed-by: Matthew Wilcox <redacted>