Thread: 66 messages, 16 authors, 2018-08-08

Re: [PATCH v4 00/17] khwasan: kernel hardware assisted address sanitizer

From: Dmitry Vyukov <dvyukov@google.com>
Date: 2018-08-08 16:54:19
Also in: linux-arm-kernel, linux-kbuild, linux-mm, lkml

On Wed, Aug 8, 2018 at 6:27 PM, Will Deacon [off-list ref] wrote:
Thanks for tracking these cases down and going through each of them. The
obvious follow-up question is: how do we ensure that we keep on top of
this in mainline? Are you going to repeat your experiment at every kernel
release or every -rc or something else? I really can't see how we can
maintain this in the long run, especially given that the coverage we have
is only dynamic -- do you have an idea of how much coverage you're actually
getting for, say, a defconfig+modules build?

I'd really like to enable pointer tagging in the kernel, I'm just still
failing to see how we can do it in a controlled manner where we can reason
about the semantic changes using something other than a best-effort,
case-by-case basis which is likely to be fragile and error-prone.
Unfortunately, if that's all we have, then this gets relegated to a
debug feature, which sort of defeats the point in my opinion.
Well, in some cases there is no other way than resorting to dynamic testing.
How do we ensure that the kernel does not dereference NULL pointers, does
not access objects after free or out of bounds? Nohow. And, yes, it's a
constant maintenance burden resolved via dynamic testing.
... and the advantage of NULL pointer issues is that you're likely to see
them as a synchronous exception at runtime, regardless of architecture and
regardless of Kconfig options. With pointer tagging, that's certainly not
the case, and so I don't think we can just treat issues there like we do for
NULL pointers.
Well, let's take use-after-frees, out-of-bounds accesses, info leaks, data
races (a good example), deadlocks and just logical bugs...
Ok, but it was you that brought up NULL pointers, so there's some goalpost
moving here.
I moved it only because our views on bugs seem to be somewhat
different. I would put it all including NULL derefs into the same
bucket of bugs. But the point I wanted to make holds if we take NULL
derefs out of the equation too, so I took them out so that we don't
concentrate on "synchronous exceptions" only.
And as with NULL pointers, all of the issues you mention above
apply to other architectures and the majority of their configurations, so my
concerns about this feature remain.
If you want to enable khwasan in "production" and since enabling it
could potentially change the behaviour of existing code paths, the
run-time validation space doubles as we'd need to get the same code
coverage with and without the feature being enabled.
This is true for just about any change in configs, sysctls or just a
different workload. Any of this can enable new code, existing code
working differently, or just working with data in new states. And we
have tens of thousands of bugs, so blindly deploying anything new to
production without proper testing is a bad idea. It's not specific to
HWASAN in any way. And when you enable HWASAN you actually do mean to
retest everything as hard as possible.
I suppose I'm trying to understand whether we have to resort to testing, or
whether we can do better. I'm really uncomfortable with testing as our only
means of getting this right because this is a non-standard, arm64-specific
option and I don't think it will get very much testing in mainline at all.
Rather, we'll get spurious bug reports from forks of -stable many releases
later and we'll actually be worse-off for it.
And in the end we do not seem to have any action points here, right?
Right now, it feels like this series trades one set of bugs for another,
so I'd like to get to a position where this new set of bugs is genuinely
more manageable (i.e. detectable, fixable, preventable) than the old set.
Unfortunately, the only suggestion seems to be "testing", which I really
don't find convincing :(

Could we do things like:

  - Set up a dedicated arm64 test farm, running mainline and with a public
    frontend, aimed at getting maximum coverage of the kernel with KHWASAN
    enabled?
FWIW we could try to set up a syzbot instance with qemu/arm64
emulation. We have run such a combination a few times, but I am not sure how
stable it will be wrt flaky timeouts/stalls/etc. If it works, it will
give instant coverage of about 1MLOC.
  - Have an implementation of KHWASAN for other architectures? (Is this even
    possible?)

  - Have a compiler plugin to clear out the tag for pointer arithmetic?
    Could we WARN if two pointers are compared with different tags?
    Could we manipulate the tag on cast-to-pointer so that a mismatch would
    be a qualifier to say that the pointer was created via a cast?

  - ...

?

Will