Thread (54 messages) 54 messages, 9 authors, 2018-12-07

Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages

From: Will Deacon <hidden>
Date: 2018-12-06 11:10:41
Also in: linux-mm, lkml

On Thu, Dec 06, 2018 at 08:29:03AM +0100, Ard Biesheuvel wrote:
On Thu, 6 Dec 2018 at 00:16, Andy Lutomirski [off-list ref] wrote:
quoted
On Wed, Dec 5, 2018 at 3:41 AM Will Deacon [off-list ref] wrote:
quoted
On Tue, Dec 04, 2018 at 12:09:49PM -0800, Andy Lutomirski wrote:
quoted
On Tue, Dec 4, 2018 at 12:02 PM Edgecombe, Rick P
[off-list ref] wrote:
quoted
On Tue, 2018-12-04 at 16:03 +0000, Will Deacon wrote:
quoted
On Mon, Dec 03, 2018 at 05:43:11PM -0800, Nadav Amit wrote:
quoted
quoted
On Nov 27, 2018, at 4:07 PM, Rick Edgecombe [off-list ref]
wrote:

Since vfree will lazily flush the TLB, but not lazily free the underlying
pages,
it often leaves stale TLB entries to freed pages that could get re-used.
This is
undesirable for cases where the memory being freed has special permissions
such
as executable.
So I am trying to finish my patch-set for preventing transient W+X mappings
from taking space, by handling kprobes & ftrace that I missed (thanks again
for
pointing it out).

But all of the sudden, I don’t understand why we have the problem that this
(your) patch-set deals with at all. We already change the mappings to make
the memory wrAcked-by: Ard Biesheuvel [off-list ref]
itable before freeing the memory, so why can’t we make it
quoted
quoted
quoted
quoted
quoted
quoted
non-executable at the same time? Actually, why do we make the module memory,
including its data executable before freeing it???
Yeah, this is really confusing, but I have a suspicion it's a combination
of the various different configurations and hysterical raisins. We can't
rely on module_alloc() allocating from the vmalloc area (see nios2) nor
can we rely on disable_ro_nx() being available at build time.

If we *could* rely on module allocations always using vmalloc(), then
we could pass in Rick's new flag and drop disable_ro_nx() altogether
afaict -- who cares about the memory attributes of a mapping that's about
to disappear anyway?

Is it just nios2 that does something different?
Yea it is really intertwined. I think for x86, set_memory_nx everywhere would
solve it as well, in fact that was what I first thought the solution should be
until this was suggested. It's interesting that from the other thread Masami
Hiramatsu referenced, set_memory_nx was suggested last year and would have
inadvertently blocked this on x86. But, on the other architectures I have since
learned it is a bit different.

It looks like actually most arch's don't re-define set_memory_*, and so all of
the frob_* functions are actually just noops. In which case allocating RWX is
needed to make it work at all, because that is what the allocation is going to
stay at. So in these archs, set_memory_nx won't solve it because it will do
nothing.

On x86 I think you cannot get rid of disable_ro_nx fully because there is the
changing of the permissions on the directmap as well. You don't want some other
caller getting a page that was left RO when freed and then trying to write to
it, if I understand this.
Exactly.
Of course, I forgot about the linear mapping. On arm64, we've just queued
support for reflecting changes to read-only permissions in the linear map
[1]. So, whilst the linear map is always non-executable, we will need to
make parts of it writable again when freeing the module.
quoted
After slightly more thought, I suggest renaming VM_IMMEDIATE_UNMAP to
VM_MAY_ADJUST_PERMS or similar.  It would have the semantics you want,
but it would also call some arch hooks to put back the direct map
permissions before the flush.  Does that seem reasonable?  It would
need to be hooked up that implement set_memory_ro(), but that should
be quite easy.  If nothing else, it could fall back to set_memory_ro()
in the absence of a better implementation.
You mean set_memory_rw() here, right? Although, eliding the TLB invalidation
would open up a window where the vmap mapping is executable and the linear
mapping is writable, which is a bit rubbish.
Right, and Rick pointed out the same issue.  Instead, we should set
the direct map not-present or its ARM equivalent, then do the flush,
then make it RW.  I assume this also works on arm and arm64, although
I don't know for sure.  On x86, the CPU won't cache not-present PTEs.
If we are going to unmap the linear alias, why not do it at vmalloc()
time rather than vfree() time?
Right, that should be pretty straightforward. We're basically saying that
RO in the vmalloc area implies PROT_NONE in the linear map, so we could
just do this in our set_memory_ro() function.

Will
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help