Thread (42 messages) 42 messages, 6 authors, 2023-06-08

Re: [PATCH 05/12] powerpc: add pte_free_defer() for pgtables sharing page

From: Hugh Dickins <hughd@google.com>
Date: 2023-06-02 06:38:42
Also in: linux-s390, lkml, sparclinux

On Thu, 1 Jun 2023, Gerald Schaefer wrote:
On Mon, 29 May 2023 07:36:40 -0700 (PDT)
Hugh Dickins [off-list ref] wrote:
quoted
On Mon, 29 May 2023, Matthew Wilcox wrote:
quoted
On Sun, May 28, 2023 at 11:20:21PM -0700, Hugh Dickins wrote:  
quoted
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	call_rcu(&page->rcu_head, pte_free_now);
+}  
This can't be safe (on ppc).  IIRC you might have up to 16x4k page
tables sharing one 64kB page.  So if you have two page tables from the
same page being defer-freed simultaneously, you'll reuse the rcu_head
and I cannot imagine things go well from that point.  
Oh yes, of course, thanks for catching that so quickly.
So my s390 and sparc implementations will be equally broken.
quoted
I have no idea how to solve this problem.  
I do: I'll have to go back to the more complicated implementation we
actually ran with on powerpc - I was thinking those complications just
related to deposit/withdraw matters, forgetting the one-rcu_head issue.

It uses large (0x10000) increments of the page refcount, avoiding
call_rcu() when already active.

It's not a complication I had wanted to explain or test for now,
but we shall have to.  Should apply equally well to sparc, but s390
more of a problem, since s390 already has its own refcount cleverness.
Yes, we have 2 pagetables in one 4K page, which could result in same
rcu_head reuse. It might be possible to use the cleverness from our
page_table_free() function, e.g. to only do the call_rcu() once, for
the case where both 2K pagetable fragments become unused, similar to
how we decide when to actually call __free_page().
Yes, I expect that it will be possible to mesh in with s390's cleverness
there; but I may not be clever enough to do so myself - it was easier to
get right by going my own way - except that the multiply-used rcu_head
soon showed that I'd not got it right at all :-(
However, it might be much worse, and page->rcu_head from a pagetable
page cannot be used at all for s390, because we also use page->lru
to keep our list of free 2K pagetable fragments. I always get confused
by struct page unions, so not completely sure, but it seems to me that
page->rcu_head would overlay with page->lru, right?
However, I believe you are right that it's worse.  I'm glad to hear
that you get confused by the struct page unions, me too, I preferred the
old pre-union days when we could see at a glance which fields overlaid.
(Perhaps I'm nostalgically exaggerating that "see at a glance" ease.)

But I think I remember the discussions when rcu_head, and compound_head
at lru.next, came in: with the agreement that rcu_head.next would at
least be 2-aligned to avoid PageTail - ah, it's even commented in the
fundamental include/linux/types.h.

Sigh.  I don't at this moment know what to do for s390:
it is frustrating to be held up by just the one architecture.
But big thanks to you, Gerald, for bringing this to light.

Hugh
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help