Thread: 140 messages, 21 authors, 2018-12-04

Re: [PATCH 10/17] prmem: documentation

From: Nadav Amit <hidden>
Date: 2018-10-30 23:18:47
Also in: linux-doc, linux-integrity, lkml

From: Andy Lutomirski
Sent: October 30, 2018 at 6:51:17 PM GMT
To: Matthew Wilcox <willy@infradead.org>, nadav.amit@gmail.com
Cc: Kees Cook <redacted>, Peter Zijlstra <peterz@infradead.org>, Igor Stoppa <redacted>, Mimi Zohar <redacted>, Dave Chinner <david@fromorbit.com>, James Morris <jmorris@namei.org>, Michal Hocko <mhocko@kernel.org>, Kernel Hardening <redacted>, linux-integrity <redacted>, linux-security-module <redacted>, Igor Stoppa <redacted>, Dave Hansen <dave.hansen@linux.intel.com>, Jonathan Corbet <corbet@lwn.net>, Laura Abbott <redacted>, Randy Dunlap <redacted>, Mike Rapoport <redacted>, open list:DOCUMENTATION <redacted>, LKML <redacted>, Thomas Gleixner <redacted>
Subject: Re: [PATCH 10/17] prmem: documentation



On Oct 30, 2018, at 10:58 AM, Matthew Wilcox [off-list ref] wrote:

On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote:
On Oct 30, 2018, at 9:37 AM, Kees Cook [off-list ref] wrote:
I support the addition of a rare-write mechanism to the upstream kernel.
And I think that there is only one sane way to implement it: using an
mm_struct. That mm_struct, just like any sane mm_struct, should only
differ from init_mm in that it has extra mappings in the *user* region.
I'd like to understand this approach a little better.  In a syscall path,
we run with the user task's mm.  What you're proposing is that when we
want to modify rare data, we switch to rare_mm which contains a
writable mapping to all the kernel data which is rare-write.

So the API might look something like this:

  void *p = rare_alloc(...);    /* writable pointer */
  p->a = x;
  q = rare_protect(p);        /* read-only pointer */

To subsequently modify q,

  p = rare_modify(q);
  p->a = y;
  rare_protect(p);
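
A rough userspace sketch of that alloc/protect/modify sequence, for illustration only. It assumes a single page whose protection is toggled with mprotect(), which is not the mm-switching scheme discussed in this thread; the demo_* names and struct item are hypothetical stand-ins, not kernel API. Unlike the proposal, the writable and read-only pointers here are the same address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical payload type, used only for this illustration. */
struct item { int a; };

static size_t demo_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Stand-in for rare_alloc(): one zeroed, writable page. */
static void *demo_rare_alloc(void)
{
    void *p = mmap(NULL, demo_page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Stand-in for rare_protect(): make the page read-only. */
static const void *demo_rare_protect(void *p)
{
    assert(p != NULL);
    return mprotect(p, demo_page_size(), PROT_READ) == 0 ? p : NULL;
}

/* Stand-in for rare_modify(): reopen the page for writing. */
static void *demo_rare_modify(const void *q)
{
    assert(q != NULL);
    void *p = (void *)q; /* same address in this simplified model */
    return mprotect(p, demo_page_size(), PROT_READ | PROT_WRITE) == 0 ? p : NULL;
}
```

Usage follows the sequence above: write through the pointer returned by demo_rare_alloc() or demo_rare_modify(), and read through the one returned by demo_rare_protect(). While the page is open, every thread can write to it, which is one of the issues a narrower write primitive tries to avoid.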
How about:

rare_write(&q->a, y);

Or, for big writes:

rare_write_copy(&q, local_q);

This avoids a whole ton of issues. In practice, actually running with a
special mm requires preemption disabled as well as some other stuff, which
Nadav carefully dealt with.
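
To make the shape of such a copy primitive concrete, here is a minimal userspace model, not the kernel implementation. It assumes the same memory is mapped twice, once read-only and once writable, loosely mirroring the idea of a separate mapping that only the write helper uses. The struct rare_region type and rare_region_init() are hypothetical, and this rare_write_copy() takes an extra region argument that the call above does not have. The preemption and mm-switching concerns mentioned above have no counterpart here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace model; none of this is kernel API. */
struct rare_region {
    const unsigned char *ro; /* view that ordinary code reads through */
    unsigned char *rw;       /* writable alias of the same pages */
    size_t size;
};

/* Map one temporary file twice: once writable, once read-only. */
static int rare_region_init(struct rare_region *r, size_t size)
{
    assert(r != NULL && size > 0);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(f);
        return -1;
    }
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *ro = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    fclose(f); /* the mappings keep the file contents alive */
    if (rw == MAP_FAILED || ro == MAP_FAILED)
        return -1;
    r->ro = ro;
    r->rw = rw;
    r->size = size;
    return 0;
}

/* Copy len bytes to dst, where dst is an address inside the read-only view. */
static int rare_write_copy(struct rare_region *r, const void *dst,
                           const void *src, size_t len)
{
    uintptr_t base = (uintptr_t)r->ro;
    uintptr_t d = (uintptr_t)dst;

    /* Reject destinations that are not fully inside the region. */
    if (d < base || len > r->size || d - base > r->size - len)
        return -1;
    memcpy(r->rw + (d - base), src, len);
    return 0;
}
```

Callers only ever hold pointers into the read-only view; the writable alias stays private to rare_write_copy(), so there is no window in which the caller itself holds a writable pointer.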

Also, can we maybe focus on getting something merged for statically
allocated data first?

Finally, one issue: rare_alloc() is going to utterly suck performance-wise
due to the global IPI when the region gets zapped out of the direct map or
otherwise made RO. This is the same issue that makes all existing XPFO
efforts so painful. We need to either optimize the crap out of it somehow
or we need to make sure it’s not called except during rare events like
device enumeration.

Nadav, want to resubmit your series? IIRC the only thing wrong with it was
that it was a big change and we wanted a simpler fix to backport. But
that’s all done now, and I, at least, rather liked your code. :)
I guess since it was based on your ideas…

Anyhow, the only open issue that I have with v2 is Peter’s wish that I would
make kgdb use of text_poke() less disgusting. I still don’t know exactly
how to deal with it.

Perhaps it (fixing kgdb) can be postponed? In that case I can just resend
v2.