Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page

[PATCH v2 00/12] mm: free retracted page table by RCU · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 02/12] mm/pgtable: add PAE safety to __pte_offset_map() · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 03/12] arm: adjust_pte() use pte_offset_map_nolock() · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 04/12] powerpc: assert_pte_locked() use pte_offset_map_nolock() · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-20
Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page · Jason Gunthorpe <jgg@ziepe.ca> · 2023-06-20
Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-20
Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page · Jason Gunthorpe <jgg@ziepe.ca> · 2023-06-20
Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-22
Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page · Jason Gunthorpe <jgg@ziepe.ca> · 2023-06-27
Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-27
[PATCH v2 06/12] sparc: add pte_free_defer() for pte_t *pgtable_t · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-20
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-06-28
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-29
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Jason Gunthorpe <jgg@ziepe.ca> · 2023-06-29
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-06-29
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-30
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-07-02
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Alexander Gordeev <agordeev@linux.ibm.com> · 2023-07-04
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-07-04
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-07-04
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-07-04
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-07-05
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-07-06
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-07-06
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-07-06
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Jason Gunthorpe <jgg@ziepe.ca> · 2023-07-10
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Alexander Gordeev <agordeev@linux.ibm.com> · 2023-07-05
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-07-06
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-07-07
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-07-03
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Alexander Gordeev <agordeev@linux.ibm.com> · 2023-06-29
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Gerald Schaefer <gerald.schaefer@linux.ibm.com> · 2023-06-29
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Claudio Imbrenda <imbrenda@linux.ibm.com> · 2023-06-30
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-30
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Claudio Imbrenda <imbrenda@linux.ibm.com> · 2023-06-30
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Hugh Dickins <hughd@google.com> · 2023-06-30
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Claudio Imbrenda <imbrenda@linux.ibm.com> · 2023-07-03
Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page · Jason Gunthorpe <jgg@ziepe.ca> · 2023-07-03
[PATCH v2 08/12] mm/pgtable: add pte_free_defer() for pgtable as page · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH mm 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() · Hugh Dickins <hughd@google.com> · 2023-06-20
[PATCH v2 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() · Hugh Dickins <hughd@google.com> · 2023-06-20

From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Date: 2023-06-29 15:46:10
Also in: linux-mm, linux-s390, lkml, sparclinux

On Thu, 29 Jun 2023 15:59:07 +0200
Alexander Gordeev [off-list ref] wrote:

On Wed, Jun 28, 2023 at 09:16:24PM +0200, Gerald Schaefer wrote:

quoted

On Tue, 20 Jun 2023 00:51:19 -0700 (PDT)
Hugh Dickins [off-list ref] wrote:

Hi Gerald, Hugh!

...

quoted

@@ -407,6 +445,88 @@ void __tlb_remove_table(void *_table)
 	__free_page(page);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now0(struct rcu_head *head);
+static void pte_free_now1(struct rcu_head *head);  

What about pte_free_lower() / pte_free_upper()?

I actually like the 0/1 better, I always get confused what exactly we
mean with "lower / upper" in our code and comments. Is it the first
or second half? With 0/1 it is immediately clear to me.

...

quoted

+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	unsigned int bit, mask;
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	if (mm_alloc_pgste(mm)) {
+		/*
+		 * TODO: Do we need gmap_unlink(mm, pgtable, addr), like in
+		 * page_table_free_rcu()?
+		 * If yes -> need addr parameter here, like in pte_free_tlb().
+		 */
+		call_rcu(&page->rcu_head, pte_free_pgste);
+		return;
+}
+	bit = ((unsigned long)pgtable & ~PAGE_MASK) / (PTRS_PER_PTE * sizeof(pte_t));
+
+	spin_lock_bh(&mm->context.lock);
+	mask = atomic_xor_bits(&page->_refcount, 0x15U << (bit + 24));

This  makes the bit logic increasingly complicated to me.

I think it is well in line with existing code in page_table_free[_rcu].
Only instead of doing xor with 0x11U, it does xor with 0x15U to also
switch on the H bit while at it.

What if instead we set the rule "one bit at a time only"?
That means an atomic group bit flip is only allowed between
pairs of bits, namely:

bit flip	initiated from
-----------	----------------------------------------
P      <- A	page_table_free(), page_table_free_rcu()
     H <- A	pte_free_defer()
P <- H		pte_free_half()

In the current model P bit could be on together with H
bit simultaneously. That actually brings in equation
nothing.

P bit has to be set at the latest when __tlb_remove_table() gets
called, because there it is checked / cleared. It might be possible
to not set it in pte_free_defer() already, but only later in
pte_free_half() RCU callback, before calling __tlb_remove_table().
But that would not be in line any more with existing code, where it
is already set before scheduling the RCU callback.

Therefore, I would rather stick to the current approach, unless
you see some bug in it.

Besides, this check in page_table_alloc() (while still
correct) makes one (well, me) wonder "what about HH bits?":

	mask = (mask | (mask >> 4)) & 0x03U;
	if (mask != 0x03U) {
		...
	}

Without adding fragments back to the list, it is not necessary
to check any H bits page_table_alloc(), or so I hope. Actually,
I like that aspect most, i.e. we have as little impact on current
code as possible.

And H bits are only relevant for preventing double use of rcu_head,
which is what they were designed for, and only the new code has
to care about them.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help