Re: [PATCH 05/12] powerpc: add pte_free_defer() for pgtables sharing page
From: Hugh Dickins <hughd@google.com>
Date: 2023-06-02 06:38:42
Also in:
linux-s390, lkml, sparclinux
On Thu, 1 Jun 2023, Gerald Schaefer wrote:
On Mon, 29 May 2023 07:36:40 -0700 (PDT) Hugh Dickins [off-list ref] wrote:quoted
On Mon, 29 May 2023, Matthew Wilcox wrote:quoted
On Sun, May 28, 2023 at 11:20:21PM -0700, Hugh Dickins wrote:quoted
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) +{ + struct page *page; + + page = virt_to_page(pgtable); + call_rcu(&page->rcu_head, pte_free_now); +}This can't be safe (on ppc). IIRC you might have up to 16x4k page tables sharing one 64kB page. So if you have two page tables from the same page being defer-freed simultaneously, you'll reuse the rcu_head and I cannot imagine things go well from that point.Oh yes, of course, thanks for catching that so quickly. So my s390 and sparc implementations will be equally broken.quoted
I have no idea how to solve this problem.I do: I'll have to go back to the more complicated implementation we actually ran with on powerpc - I was thinking those complications just related to deposit/withdraw matters, forgetting the one-rcu_head issue. It uses large (0x10000) increments of the page refcount, avoiding call_rcu() when already active. It's not a complication I had wanted to explain or test for now, but we shall have to. Should apply equally well to sparc, but s390 more of a problem, since s390 already has its own refcount cleverness.Yes, we have 2 pagetables in one 4K page, which could result in same rcu_head reuse. It might be possible to use the cleverness from our page_table_free() function, e.g. to only do the call_rcu() once, for the case where both 2K pagetable fragments become unused, similar to how we decide when to actually call __free_page().
Yes, I expect that it will be possible to mesh in with s390's cleverness there; but I may not be clever enough to do so myself - it was easier to get right by going my own way - except that the multiply-used rcu_head soon showed that I'd not got it right at all :-(
However, it might be much worse, and page->rcu_head from a pagetable page cannot be used at all for s390, because we also use page->lru to keep our list of free 2K pagetable fragments. I always get confused by struct page unions, so not completely sure, but it seems to me that page->rcu_head would overlay with page->lru, right?
However, I believe you are right that it's worse. I'm glad to hear that you get confused by the struct page unions, me too, I preferred the old pre-union days when we could see at a glance which fields overlaid. (Perhaps I'm nostalgically exaggerating that "see at a glance" ease.) But I think I remember the discussions when rcu_head, and compound_head at lru.next, came in: with the agreement that rcu_head.next would at least be 2-aligned to avoid PageTail - ah, it's even commented in the fundamental include/linux/types.h. Sigh. I don't at this moment know what to do for s390: it is frustrating to be held up by just the one architecture. But big thanks to you, Gerald, for bringing this to light. Hugh