Re: [PATCH 10/17] prmem: documentation
From: Andy Lutomirski <luto@amacapital.net>
Date: 2018-11-13 18:36:09
Also in:
linux-doc, linux-integrity, lkml
On Tue, Nov 13, 2018 at 10:26 AM Igor Stoppa [off-list ref] wrote:
On 13/11/2018 19:16, Andy Lutomirski wrote:quoted
should be entirely abstracted away by an appropriate API, so neither SELinux nor IMA need to be aware that there's an mm_struct involved.Yes, that is fine. In my proposal I was thinking about tying it to the core/thread that performs the actual write. The high level API could be something like: wr_memcpy(void *src, void *dst, uint_t size)quoted
It's also entirely possible that some architectures won't even use an mm_struct behind the scenes -- x86, for example, could have avoided it if there were a kernel equivalent of PKRU. Sadly, there isn't.The mm_struct - or whatever is the means to do the write on that architecture - can be kept hidden from the API. But the reason why I was proposing to have one mm_struct per writer is that, iiuc, the secondary mapping is created in the secondary mm_struct for each writer using it. So the updating of IMA measurements would have, theoretically, also write access to the SELinux AVC. Which I was trying to avoid. And similarly any other write rare updater. Is this correct?
If you call a wr_memcpy() function with the signature you suggested, then you can overwrite any memory of this type. Having a different mm_struct under the hood makes no difference. As far as I'm concerned, for *dynamically allocated* rare-writable memory, you might as well allocate all the paging structures at the same time, so the mm_struct will always contain the mappings. If there are serious bugs in wr_memcpy() that cause it to write to the wrong place, we have bigger problems. I can imagine that we'd want a *typed* wr_memcpy()-like API some day, but that can wait for v2. And it still doesn't obviously need multiple mm_structs.
quoted
quoted
2) Iiuc, the purpose of the 2 pages being remapped is that the target of the patch might spill across the page boundary, however if I deal with the modification of generic data, I shouldn't (shouldn't I?) assume that the data will not span across multiple pages.The reason for the particular architecture of text_poke() is to avoid memory allocation to get it working. i think that prmem/rare_write should have each rare-writable kernel address map to a unique user address, possibly just by offsetting everything by a constant. For rare_write, you don't actually need it to work as such until fairly late in boot, since the rare_writable data will just be writable early on.Yes, that is true. I think it's safe to assume, from an attack pattern, that as long as user space is not started, the system can be considered ok. Even user-space code run from initrd should be ok, since it can be bundled (and signed) as a single binary with the kernel. Modules loaded from a regular filesystem are a bit more risky, because an attack might inject a rogue key in the key-ring and use it to load malicious modules.
If a malicious module is loaded, the game is over.
quoted
quoted
If the data spans across multiple pages, in unknown amount, I suppose that I should not keep interrupts disabled for an unknown time, as it would hurt preemption. What I thought, in my initial patch-set, was to iterate over each page that must be written to, in a loop, re-enabling interrupts in-between iterations, to give pending interrupts a chance to be served. This would mean that the data being written to would not be consistent, but it's a problem that would have to be addressed anyways, since it can be still read by other cores, while the write is ongoing.This probably makes sense, except that enabling and disabling interrupts means you also need to restore the original mm_struct (most likely), which is slow. I don't think there's a generic way to check whether in interrupt is pending without turning interrupts on.The only "excuse" I have is that write_rare is opt-in and is "rare". Maybe the enabling/disabling of interrupts - and the consequent switch of mm_struct - could be somehow tied to the latency configuration? If preemption is disabled, the expectations on the system latency are anyway more relaxed. But I'm not sure how it would work against I/O.
I think it's entirely reasonable for the API to internally break up very large memcpys.