Thread (47 messages) 47 messages, 5 authors, 2023-07-10

Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page

From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Date: 2023-06-29 15:46:10
Also in: linux-mm, linux-s390, lkml, sparclinux

On Thu, 29 Jun 2023 15:59:07 +0200
Alexander Gordeev [off-list ref] wrote:
On Wed, Jun 28, 2023 at 09:16:24PM +0200, Gerald Schaefer wrote:
quoted
On Tue, 20 Jun 2023 00:51:19 -0700 (PDT)
Hugh Dickins [off-list ref] wrote:  
Hi Gerald, Hugh!

...
quoted
@@ -407,6 +445,88 @@ void __tlb_remove_table(void *_table)
 	__free_page(page);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now0(struct rcu_head *head);
+static void pte_free_now1(struct rcu_head *head);  
What about pte_free_lower() / pte_free_upper()?
I actually like the 0/1 better, I always get confused what exactly we
mean with "lower / upper" in our code and comments. Is it the first
or second half? With 0/1 it is immediately clear to me.
...
quoted
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	unsigned int bit, mask;
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	if (mm_alloc_pgste(mm)) {
+		/*
+		 * TODO: Do we need gmap_unlink(mm, pgtable, addr), like in
+		 * page_table_free_rcu()?
+		 * If yes -> need addr parameter here, like in pte_free_tlb().
+		 */
+		call_rcu(&page->rcu_head, pte_free_pgste);
+		return;
+}
+	bit = ((unsigned long)pgtable & ~PAGE_MASK) / (PTRS_PER_PTE * sizeof(pte_t));
+
+	spin_lock_bh(&mm->context.lock);
+	mask = atomic_xor_bits(&page->_refcount, 0x15U << (bit + 24));  
This  makes the bit logic increasingly complicated to me.
I think it is well in line with existing code in page_table_free[_rcu].
Only instead of doing xor with 0x11U, it does xor with 0x15U to also
switch on the H bit while at it.
What if instead we set the rule "one bit at a time only"?
That means an atomic group bit flip is only allowed between
pairs of bits, namely:

bit flip	initiated from
-----------	----------------------------------------
P      <- A	page_table_free(), page_table_free_rcu()
     H <- A	pte_free_defer()
P <- H		pte_free_half()

In the current model P bit could be on together with H
bit simultaneously. That actually brings in equation
nothing.
P bit has to be set at the latest when __tlb_remove_table() gets
called, because there it is checked / cleared. It might be possible
to not set it in pte_free_defer() already, but only later in
pte_free_half() RCU callback, before calling __tlb_remove_table().
But that would not be in line any more with existing code, where it
is already set before scheduling the RCU callback.

Therefore, I would rather stick to the current approach, unless
you see some bug in it.
Besides, this check in page_table_alloc() (while still
correct) makes one (well, me) wonder "what about HH bits?":

	mask = (mask | (mask >> 4)) & 0x03U;
	if (mask != 0x03U) {
		...
	}
Without adding fragments back to the list, it is not necessary
to check any H bits page_table_alloc(), or so I hope. Actually,
I like that aspect most, i.e. we have as little impact on current
code as possible.

And H bits are only relevant for preventing double use of rcu_head,
which is what they were designed for, and only the new code has
to care about them.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help