Thread (89 messages) 89 messages, 18 authors, 2017-05-13

[kernel-hardening] Re: [PATCH v9 1/4] syscalls: Verify address limit before returning to user-mode

From: Kees Cook <hidden>
Date: 2017-05-09 16:30:07
Also in: linux-api, linux-s390, lkml

On Mon, May 8, 2017 at 11:56 PM, Ingo Molnar [off-list ref] wrote:
* Kees Cook [off-list ref] wrote:
quoted
quoted
There's the option of using GCC plugins now that the infrastructure was
upstreamed from grsecurity. It can be used as part of the regular build
process and as long as the analysis is pretty simple it shouldn't hurt compile
time much.
Well, and that the situation may arise due to memory corruption, not from
poorly-matched set_fs() calls, which static analysis won't help solve. We need
to catch this bad kernel state because it is a very bad state to run in.
[attempting some thread-merging]
Ok, so that's CVE-2010-4258, where an oops with KERNEL_DS set was used to escalate
privileges, due to the kernel's oops handler not cleaning up the KERNEL_DS. The
exploit used another bug, a crash in a network protocol handler, to execute the
oops handler with KERNEL_DS set.
Right, I didn't mean to suggest that vulnerability would be fixed by
this solution. I was trying to show how there can be some pretty
complex interaction with exceptions/interrupts/etc that would make
pure static analysis still miss things.
If memory corruption corrupted the task state into having addr_limit set to
KERNEL_DS then there's already a fair chance that it's game over: it could also
have set *uid to 0, or changed a sensitive PF_ flag, or a number of other
things...

Furthermore, think about it: there's literally an infinite amount of corrupted
task states that could be a security problem and that could be checked after every
system call. Do we want to check every one of them?
Right, but this "slippery slope" argument isn't the best way to reject
security changes. Let me take a step back and describe the threat, and
where we should likely spend time:

The primary threat with addr_limit getting changed is that a
narrowly-scoped attack (traditionally stack exhaustion or
adjacent-stack large-index writes) could be leveraged into opening the
entire kernel to writes (by allowing all syscalls with a
copy_to_user() call to suddenly be able to write to kernel memory).
So, really, the flaw is having addr_limit at all. Removing set_fs()
should, I think, allow this to become a const (or at least should get
us a lot closer).

The main path to corrupting addr_limit has been via stack corruption.
On architectures with CONFIG_THREAD_INFO_IN_TASK, this risk is greatly
reduced already, but it's not universally available yet. (And as long
as we're talking about stack attacks, CONFIG_VMAP_STACK makes
cross-stack overflows go away, and cross-stack indexing harder, but
that's not really about addr_limit since currently nothing with
VMAP_STACK doesn't already have THREAD_INFO_IN_TASK.)

So, left with a still exploitable target in memory that allows such an
expansion of attack method, I still think it's worth keeping this
patch series, but if we can drop set_fs() I could probably be
convinced the benefit of the series doesn't exceed the cost on
THREAD_INFO_IN_TASK-architectures (x86, arm64, s390). But that means
at least currently keeping it on arm, for example. If we can make
addr_limit const, well, we don't need the series at all.

-Kees

-- 
Kees Cook
Pixel Security
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help