Re: [PATCH 10/17] prmem: documentation
From: Andy Lutomirski <luto@amacapital.net>
Date: 2018-10-30 21:02:23
Also in:
linux-doc, linux-integrity, lkml
On Oct 30, 2018, at 1:43 PM, Igor Stoppa [off-list ref] wrote:
On 30/10/2018 21:20, Matthew Wilcox wrote:
On Tue, Oct 30, 2018 at 12:28:41PM -0600, Tycho Andersen wrote:
On Tue, Oct 30, 2018 at 10:58:14AM -0700, Matthew Wilcox wrote:
On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote:
On Oct 30, 2018, at 9:37 AM, Kees Cook [off-list ref] wrote:

I support the addition of a rare-write mechanism to the upstream kernel. And I think that there is only one sane way to implement it: using an mm_struct. That mm_struct, just like any sane mm_struct, should only differ from init_mm in that it has extra mappings in the *user* region.

I'd like to understand this approach a little better. In a syscall path, we run with the user task's mm. What you're proposing is that when we want to modify rare data, we switch to rare_mm which contains a writable mapping to all the kernel data which is rare-write. So the API might look something like this:

    void *p = rare_alloc(...);    /* writable pointer */
    p->a = x;
    q = rare_protect(p);          /* read-only pointer */

With pools and memory allocated from vmap_areas, I was able to say protect(pool) and that would do a swipe on all the pages currently in use. In the SELinux policyDB, for example, one doesn't really want to individually protect each allocation. The loading phase happens usually at boot, when the system can be assumed to be sane (one might even preload a bare-bone set of rules from initramfs and then replace it later on, with the full blown set). There is no need to process each of these tens of thousands allocations and initialization as write-rare.

Would it be possible to do the same here?

I don’t see why not, although getting the API right will be a tad complicated.

To subsequently modify q,

    p = rare_modify(q);
    q->a = y;

Do you mean

    p->a = y;

here? I assume the intent is that q isn't writable ever, but that's the one we have in the structure at rest.

Yes, that was my intent, thanks. To handle the list case that Igor has pointed out, you might want to do something like this:

    list_for_each_entry(x, &xs, entry) {
        struct foo *writable = rare_modify(entry);

Would this mapping be impossible to spoof by other cores?

Indeed. Only the core with the special mm loaded could see it.

But I dislike allowing regular writes in the protected region. We really only need four write primitives:

1. Just write one value. Call at any time (except NMI).

2. Just copy some bytes. Same as (1) but any number of bytes.

3,4: Same as 1 and 2 but must be called inside a special rare write region. This is purely an optimization.

Actually getting a modifiable pointer should be disallowed for two reasons:

1. Some architectures may want to use a special write-different-address-space operation. Heck, x86 could, too: make the actual offset be a secret and shove the offset into FSBASE or similar. Then %fs-prefixed writes would do the rare writes.

2. Alternatively, x86 could set the U bit. Then the actual writes would use the uaccess helpers, giving extra protection via SMAP.

We don’t really want a situation where an unchecked pointer in the rare write region completely defeats the mechanism.
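
To make the shape of the four primitives concrete, here is a rough userspace sketch, not kernel code. It assumes a single protected region backed by a temporary file that is mapped twice: once read-only (the address everyone holds at rest) and once writable (an alias kept private to the write helpers). Every name in it (rare_init, rare_begin, rare_end, rare_copy, rare_write_u64 and the _in_region variants) is hypothetical and exists only for illustration. A userspace alias is visible to every thread, so this does not reproduce the per-core visibility a separate mm would give; it only shows an interface that never hands out a modifiable pointer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
#define RARE_SIZE 4096u

static const unsigned char *rare_ro; /* read-only view, used at rest */
static unsigned char *rare_rw;       /* writable alias, never exported */
static int rare_depth;               /* nesting depth of rare-write regions */

/* Map the same backing pages twice: once read-only, once writable. */
static int rare_init(void)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, RARE_SIZE) != 0)
        return -1;
    void *ro = mmap(NULL, RARE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    void *rw = mmap(NULL, RARE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f); /* the mappings stay valid after the descriptor is closed */
    if (ro == MAP_FAILED || rw == MAP_FAILED)
        return -1;
    rare_ro = ro;
    rare_rw = rw;
    return 0;
}

/* Translate a read-only address to its alias; reject anything out of range. */
static unsigned char *rare_alias(const void *dst, size_t len)
{
    uintptr_t off = (uintptr_t)dst - (uintptr_t)rare_ro;
    if (off > RARE_SIZE || len > RARE_SIZE - off)
        return NULL;
    return rare_rw + off;
}

/* Stand-ins for entering and leaving the special rare write region. */
static void rare_begin(void) { rare_depth++; }
static void rare_end(void) { assert(rare_depth > 0); rare_depth--; }

/* Primitive 4: copy bytes; only valid inside a rare write region. */
static int rare_copy_in_region(const void *dst, const void *src, size_t len)
{
    if (rare_depth <= 0)
        return -1;
    unsigned char *w = rare_alias(dst, len);
    if (!w)
        return -1;
    memcpy(w, src, len);
    return 0;
}

/* Primitive 3: write one value; only valid inside a rare write region. */
static int rare_write_u64_in_region(const uint64_t *dst, uint64_t val)
{
    return rare_copy_in_region(dst, &val, sizeof(val));
}

/* Primitive 2: copy bytes; callable at any time. */
static int rare_copy(const void *dst, const void *src, size_t len)
{
    rare_begin();
    int ret = rare_copy_in_region(dst, src, len);
    rare_end();
    return ret;
}

/* Primitive 1: write one value; callable at any time. */
static int rare_write_u64(const uint64_t *dst, uint64_t val)
{
    return rare_copy(dst, &val, sizeof(val));
}
```

Callers only ever pass the read-only address, so a stray pointer outside the region is rejected by the bounds check instead of being written through. In this sketch the _in_region variants merely check a counter; the point of having them is that a real implementation could pay the cost of entering the region once and then batch several writes.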