Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages
From: Andy Lutomirski <luto@kernel.org>
Date: 2018-12-06 20:26:37
Also in: linux-mm, lkml
On Thu, Dec 6, 2018 at 12:20 PM Edgecombe, Rick P wrote:
> On Thu, 2018-12-06 at 11:19 -0800, Andy Lutomirski wrote:
>> On Thu, Dec 6, 2018 at 11:01 AM Tycho Andersen wrote:
>>> On Thu, Dec 06, 2018 at 10:53:50AM -0800, Andy Lutomirski wrote:
>>>>> If we are going to unmap the linear alias, why not do it at vmalloc() time rather than vfree() time?
>>>>
>>>> That’s not totally nuts. Do we ever have code that expects __va() to work on module data? Perhaps crypto code trying to encrypt static data because our APIs don’t understand virtual addresses. I guess if highmem is ever used for modules, then we should be fine.
>>>>
>>>> RO instead of not present might be safer.
>>>>
>>>> But I do like the idea of renaming Rick's flag to something like VM_XPFO or VM_NO_DIRECT_MAP and making it do all of this.
>>>
>>> Yeah, doing it for everything automatically seemed like it was/is going to be a lot of work to debug all the corner cases where things expect memory to be mapped but don't explicitly say it. And in particular, the XPFO series only does it for user memory, whereas an additional flag like this would work for extra paranoid allocations of kernel memory too.
>>
>> I just read the code, and it looks like vmalloc() is already using highmem (__GFP_HIGHMEM) if available, so, on big x86_32 systems, for example, we already don't have modules in the direct map.
>>
>> So I say we go for it. This should be quite simple to implement -- the pageattr code already has almost all the needed logic on x86. The only arch support we should need is a pair of functions: one to remove a vmalloc address range from the direct map (if it was present in the first place) and one to put it back. On x86, this should only be a few lines of code.
>>
>> What do you all think? This should solve most of the problems we have.
>>
>> If we really wanted to optimize this, we'd make it so that module_alloc() allocates memory the normal way, then, later on, we call some function that, all at once, removes the memory from the direct map and applies the right permissions to the vmalloc alias (or just makes the vmalloc alias not-present so we can add permissions later without flushing), and flushes the TLB. And we arrange for vunmap to zap the vmalloc range, then put the memory back into the direct map, then free the pages back to the page allocator, with the flush in the appropriate place.
>>
>> I don't see why the page allocator needs to know about any of this. It's already okay with the permissions being changed out from under it on x86, and it seems fine.
>>
>> Rick, do you want to give some variant of this a try?
>
> Hi,
>
> Sorry, I've been having email troubles today.
>
> I found some cases where vmap with PAGE_KERNEL_RO happens, which would not set NP/RO in the directmap, so it would be somewhat inconsistent whether the directmap of vmalloc range allocations was readable or not. I couldn't see any places where that would cause problems today, though.
>
> I was ready to assume that no TLBs cache NP entries, because I don't see how usages where a page fault is used to load something could work without lots of flushes. Or the architecture just fixes up the spurious faults, I suppose. I'm only well-educated on the x86 MMU.
>
> If that's the case, then all archs with directmap permissions could share a single vmalloc special-permission flush implementation that works like Andy described originally. It could be controlled with an ARCH_HAS_DIRECT_MAP_PERMS. We would just need something like set_pages_np and set_pages_rw on any archs with directmap permissions. So it seems simpler to me (and it's what I have been doing), unless I'm missing the problem.
Hmm. The only reason I proposed anything fancier was that I was thinking of minimizing flushes, but I think I'm being silly. This sequence ought to work optimally:

 - vmalloc(..., VM_HAS_DIRECT_MAP_PERMS);  /* no flushes */
 - Write some data via vmalloc's return address.
 - Use some set_memory_whatever() functions to update permissions, which will flush, hopefully just once.
 - Run the module code!
 - vunmap -- this will do a single flush that will fix everything.

This does require that set_pages_np() or set_memory_np() or whatever exists, and that it's safe to do that, then flush, and then set_pages_rw(). So maybe you want set_pages_np_noflush() and set_pages_rw_noflush() to make it totally clear what's supposed to happen.

--Andy