Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
From: Oscar Salvador <osalvador@suse.de>
Date: 2022-02-04 13:56:58
Also in:
linux-iommu, linux-mm, lkml
On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
From: Zi Yan <ziy@nvidia.com> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging pageblocks with different migratetypes. It might unnecessarily convert extra pageblocks at the beginning and at the end of the range. Change alloc_contig_range() to work at pageblock granularity. It is done by restoring pageblock types and split >pageblock_order free pages after isolating at MAX_ORDER-1 granularity and migrating pages away at pageblock granularity. The reason for this process is that during isolation, some pages, either free or in-use, might have >pageblock sizes and isolating part of them can cause free accounting issues. Restoring the migratetypes of the pageblocks not in the interesting range later is much easier.
Hi Zi Yan, Due to time constraints I only glanced over, so some comments below about stuff that caught my eye:
+static inline void split_free_page_into_pageblocks(struct page *free_page,
+ int order, struct zone *zone)
+{
+ unsigned long pfn;
+
+ spin_lock(&zone->lock);
+ del_page_from_free_list(free_page, zone, order);
+ for (pfn = page_to_pfn(free_page);
+ pfn < page_to_pfn(free_page) + (1UL << order);It migt make sense to have a end_pfn variable so that does not have to be constantly evaluated. Or maybe the compiler is clever enough to only evualuate it once.
+ pfn += pageblock_nr_pages) {
+ int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+ __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
+ mt, FPI_NONE);
+ }
+ spin_unlock(&zone->lock);It is possible that free_page's order is already pageblock_order, so I would add a one-liner upfront to catch that case and return, otherwise we do the delete_from_freelist-and-free_it_again dance.
+ /* Save the migratepages of the pageblocks before start and after end */ + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages + + (isolate_end - alloc_end) / pageblock_nr_pages; + saved_mt = + kmalloc_array(num_pageblock_to_save, + sizeof(unsigned char), GFP_KERNEL); + if (!saved_mt) + return -ENOMEM; + + num = save_migratetypes(saved_mt, isolate_start, alloc_start); + + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
I really hope we can put all this magic within start_isolate_page_range, and the counterparts in undo_isolate_page_range. Also, I kinda dislike the &saved_mt thing. I thought about some other approaches but nothing that wasn't too specific for this case, and I guess we want that function to be as generic as possible.
+ /*
+ * Split free page spanning [alloc_end, isolate_end) and put the
+ * pageblocks in the right migratetype list
+ */
+ for (outer_end = alloc_end; outer_end < isolate_end;) {
+ unsigned long begin_pfn = outer_end;
+
+ order = 0;
+ while (!PageBuddy(pfn_to_page(outer_end))) {
+ if (++order >= MAX_ORDER) {
+ outer_end = begin_pfn;
+ break;
+ }
+ outer_end &= ~0UL << order;
+ }
+
+ if (outer_end != begin_pfn) {
+ order = buddy_order(pfn_to_page(outer_end));
+
+ /*
+ * split the free page has start page and put the pageblocks
+ * in the right migratetype list
+ */
+ VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);How could this possibily happen?
+ {
+ struct page *free_page = pfn_to_page(outer_end);
+
+ split_free_page_into_pageblocks(free_page, order, cc.zone);
+ }
+ outer_end += 1UL << order;
+ } else
+ outer_end = begin_pfn + 1;
}I think there are cases could optimize for. If the page has already been split in pageblock by the outer_start loop, we could skip this outer_end logic altogether. E.g: An order-10 page is split in two pageblocks. There's nothing else to be done, right? We could skip this. -- Oscar Salvador SUSE Labs