Thread: 20 messages, 4 authors, 2020-06-08

Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

From: Michal Suchánek <hidden>
Date: 2020-06-01 12:09:13
Also in: nvdimm

On Mon, Jun 01, 2020 at 05:31:50PM +0530, Aneesh Kumar K.V wrote:
On 6/1/20 3:39 PM, Jan Kara wrote:
On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote:
On 5/29/20 3:22 PM, Jan Kara wrote:
On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
Thanks Michal. I also missed Jeff in this email thread.
And I think you'll also need some of the sched maintainers for the prctl
bits...
On 5/29/20 3:03 PM, Michal Suchánek wrote:
Adding Jan

On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
With POWER10, the architecture adds new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.

This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. The kernel config option is added to allow the user
to control whether MAP_SYNC should be enabled by default or not.

Signed-off-by: Aneesh Kumar K.V <redacted>
...
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d5a9a363e81e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
    static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
I'm not sure CONFIG is really the right approach here. For a distro that would
basically mean disabling MAP_SYNC for all PPC kernels unless the application
explicitly uses the right prctl. Shouldn't we rather initialize
default_map_sync_mask on boot based on whether the CPU we run on requires
the new flush instructions or not? Otherwise the patch looks sensible.
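
To illustrate the boot-time alternative suggested above, here is a minimal userspace sketch, not kernel code. The names model_init_map_sync_default and MODEL_DISABLE_MAP_SYNC_MASK, and the bit value chosen for the mask, are hypothetical stand-ins for illustration only. How the kernel would actually learn that the CPU needs the new flush instructions is left open here and is represented by a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit; the real MMF_DISABLE_MAP_SYNC_MASK value is not assumed. */
#define MODEL_DISABLE_MAP_SYNC_MASK (1UL << 0)

/*
 * Model of choosing the default mask once at boot instead of at build time.
 * cpu_needs_new_flush stands in for whatever CPU feature test would be used.
 */
static void model_init_map_sync_default(unsigned long *mask,
                                        bool cpu_needs_new_flush)
{
    assert(mask != NULL);
    /* Disable MAP_SYNC by default only on CPUs that need the new flushes. */
    *mask = cpu_needs_new_flush ? MODEL_DISABLE_MAP_SYNC_MASK : 0UL;
}
```

With this shape a single distro kernel could keep MAP_SYNC enabled by default on older CPUs and require the opt-in only where it matters.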
Yes, that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
But on a virtualized platform there is no easy way to detect that. We could
ideally hook this into the nvdimm driver where we look at the new compat
string ibm,persistent-memory-v2 and then disable MAP_SYNC
if we find a device with the specific value.
Hum, couldn't we set some flag for nvdimm devices with
"ibm,persistent-memory-v2" property and then check it during mmap(2) time
and when the device has this property and the mmap(2) caller doesn't have
the prctl set, we'd disallow MAP_SYNC? That should make things mostly
seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
applications need to be aware of new instructions so this isn't that much
additional burden...
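
The per-device check described above can be sketched as a small userspace model. This is an assumption-laden illustration, not the patch: struct model_pmem_dev, struct model_task and model_map_sync_allowed are hypothetical names, and the two booleans stand in for the device flag derived from the "ibm,persistent-memory-v2" property and for the per-process prctl state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not kernel structures. */
struct model_pmem_dev {
    bool needs_new_flush;  /* e.g. set for "ibm,persistent-memory-v2" devices */
};

struct model_task {
    bool map_sync_opt_in;  /* set once the task has issued the prctl */
};

/* Returns true when a MAP_SYNC mapping request should be honoured. */
static bool model_map_sync_allowed(const struct model_pmem_dev *dev,
                                   const struct model_task *task)
{
    assert(dev != NULL && task != NULL);

    /* Devices that work with the old instructions are never restricted. */
    if (!dev->needs_new_flush)
        return true;

    /* Otherwise the caller must have declared it knows the new instructions. */
    return task->map_sync_opt_in;
}
```

Only the combination of a device that needs the new flushes and a task that has not opted in is refused, which is what keeps existing applications on older devices unaffected.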
I am not sure applications would want to add that much detail/knowledge
about a platform to their code. I was expecting applications to do

#ifdef __ppc64__
        prctl(MAP_SYNC_ENABLE, 1, 0, 0, 0);
#endif
        a = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
                        MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);


For that code all the complexity that we add w.r.t ibm,persistent-memory-v2
is not useful. Do you see a value in making all of this device-specific rather
than conditional on __ppc64__?
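
Whichever way the restriction is keyed, an application has to cope with the kernel refusing MAP_SYNC. The sketch below shows one way to do that, under stated assumptions: map_pmem_or_fallback is a hypothetical helper, the prctl call is left out because MAP_SYNC_ENABLE exists only in this proposed patch, and no particular errno is assumed for a refusal.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallback definitions for older libc headers; values match the Linux UAPI. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: try a MAP_SYNC mapping first and fall back to a
 * plain MAP_SHARED mapping if the kernel refuses it. *used_sync reports
 * which attempt succeeded. Returns MAP_FAILED if both attempts fail.
 */
static void *map_pmem_or_fallback(int fd, size_t len, int *used_sync)
{
    void *p;

    assert(used_sync != NULL);

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *used_sync = 1;
        return p;
    }

    /* Without MAP_SYNC the caller must use fsync()/msync() for durability. */
    *used_sync = 0;
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

A caller would check used_sync to decide between flushing from userspace and calling msync() or fsync() on the mapping.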
If the vpmem devices continue to work with the old instructions on
POWER10 then it makes sense to make this per-device.

Also, adding a message to the kernel log when the application does not do
the prctl would be helpful for people migrating old code to POWER10.

Thanks

Michal