Thread (47 messages) 47 messages, 5 authors, 2023-07-10

Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page

From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Date: 2023-07-03 16:12:15
Also in: linux-mm, linux-s390, lkml, sparclinux

On Thu, 29 Jun 2023 23:00:07 -0700 (PDT)
Hugh Dickins [off-list ref] wrote:
On Thu, 29 Jun 2023, Gerald Schaefer wrote:
quoted
On Thu, 29 Jun 2023 12:22:24 -0300
Jason Gunthorpe [off-list ref] wrote:  
quoted
On Wed, Jun 28, 2023 at 10:08:08PM -0700, Hugh Dickins wrote:  
quoted
On Wed, 28 Jun 2023, Gerald Schaefer wrote:    
quoted
As discussed in the other thread, we would rather go with less complexity,
possibly switching to an approach w/o the list and fragment re-use in the
future. For now, as a first step in that direction, we can try with not
adding fragments back only for pte_free_defer(). Here is an adjusted
version of your patch, copying most of your pte_free_defer() logic and
also description, tested with LTP and all three of your patch series applied:    
Thanks, Gerald: I don't mind abandoning my 13/12 SLAB_TYPESAFE_BY_RCU
patch (posted with fewer Cc's to the s390 list last week), and switching
to your simpler who-cares-if-we-sometimes-don't-make-maximal-use-of-page
patch.

But I didn't get deep enough into it today to confirm it - and disappointed
that you've found it necessary to play with pt_frag_refcount in addition to
_refcount and HH bits.  No real problem with that, but my instinct says it
should be simpler.    
Yes, I also found it a bit awkward, but it seemed "good and simple enough",
to have something to go forward with, while my instinct was in line with yours.
  
quoted
Is there any reason it should be any different at all from what PPC is
doing?

I still think the right thing to do here is make the PPC code common
(with Hugh's proposed RCU modification) and just use it in both
arches....  
With the current approach, we would not add back fragments _only_ for
the new pte_free_defer() path, while keeping our cleverness for the other
paths. Not having a good overview of the negative impact wrt potential
memory waste, I would rather take small steps, if possible.

If we later switch to never adding back fragments, of course we should
try to be in line with PPC implementation.  
I find myself half-agreeing with everyone.

I agree with Gerald that s390 should keep close to what it is already
doing (except for adding pte_free_defer()): that changing its strategy
and implementation to be much more like powerpc, is a job for some other
occasion (and would depend on gathering data about how well each does).

But I agree with Jason that the powerpc solution we ended up with cut
out a lot of unnecessary complication: it shifts the RCU delay from
when pte_free_defer() is called, to when the shared page comes to be
freed; which may be a lot later, and might not be welcome in a common
path, but is quite okay for the uncommon pte_free_defer().
Ok, I guess I must admit that I completely ignored the latest progress in
the powerpc thread, and therefore was not up-to-date. Still had the older
approach in mind, where you also checked for pt_frag_refcount to avoid
double call_rcu().

The new approach sounds very reasonable, and I also like your latest
s390 patch from a first glance. Need to get more up-to-date with PageActive
and maybe also powerpc approach, and give this some proper review tomorrow.
And I agree with Alexander that pte_free_lower() and pte_free_upper()
are better names than pte_free_now0() and pte_free_now1(): I was going
to make that change, except all those functions disappear if we follow
Jason's advice and switch the call_rcu() to when freeing the page.

(Lower and upper seem unambiguous to me: Gerald, does your confusion
come just from the way they are shown the wrong way round in the PP AA
diagram?  I corrected that in my patch, but you reverted it in yours.)
Ah yes, that could well be, and unfortunately I did not notice that you
fixed that in the comment. I only saw that you "fixed" the bit numbering
from 01234567 to 76543210, which I think is wrong on big-endian s390,
and therefore I simply removed that complete hunk.

But thanks a lot for pointing to that! We will certainly want to fix that
comment in a later patch, to reduce some or maybe all of the (at least
my) upper/lower confusion.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help