Thread: 33 messages, 6 authors, 2021-10-26

Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()

From: Nadav Amit <hidden>
Date: 2021-10-26 19:06:46
Also in: lkml

On Oct 26, 2021, at 11:44 AM, Dave Hansen wrote:

On 10/26/21 10:44 AM, Nadav Amit wrote:
"If software on one logical processor writes to a page while software on
another logical processor concurrently clears the R/W flag in the
paging-structure entry that maps the page, execution on some processors may
result in the entry’s dirty flag being set (due to the write on the first
logical processor) and the entry’s R/W flag being clear (due to the update
to the entry on the second logical processor). This will never occur on a
processor that supports control-flow enforcement technology (CET)."

So I guess that this optimization can only be enabled when CET is enabled.

:(
I still wonder whether the SDM comment applies to present bit vs dirty
bit atomicity as well.
I think it's implicit.  From "4.8 ACCESSED AND DIRTY FLAGS":

	"Whenever there is a write to a linear address, the processor
	 sets the dirty flag (if it is not already set) in the paging-
	 structure entry"

There can't be a "write to a linear address" without a Present=1 PTE.
If it were a Dirty=1, Present=1 PTE, there's no race because there might
not be a write to the PTE at all.

There's also this from the "4.10.4.3 Optional Invalidation" section:

	"no TLB entry or paging-structure cache entry is created with
	 information from a paging-structure entry in which the P flag
	 is 0."

That means that we don't have to worry about the TLB doing something
bonkers like caching a Dirty=1 bit from a Present=0 PTE.

Is that what you were worried about?
Thanks Dave, but no - that is not my concern.

To make it very clear - consider the following scenario, in which
a volatile pointer p is mapped using a certain PTE, which is RW
(i.e., *p is writable):

  CPU0				CPU1
  ----				----
  x = *p
  [ PTE cached in TLB; 
    PTE is not dirty ]
				clear_pte(PTE)
  *p = x
  [ needs to set dirty ]

Note that there is no TLB flush in this scenario. The question
is whether the write access to *p would succeed, setting the
dirty bit on the clear, non-present entry.

I was under the impression that the hardware AD-assist would
recheck the PTE atomically as it sets the dirty bit. But, as I
said, I am not sure anymore whether this is defined architecturally
(or at least would work in practice on all CPUs modulo the 
Knights Landing thingy).
