Thread: 29 messages, 5 authors, 2019-05-03

Re: [PATCH 0/9] 52-bit kernel + user VAs

From: Steve Capper <hidden>
Date: 2019-02-28 11:45:38

On Thu, Feb 28, 2019 at 12:22:09PM +0100, Ard Biesheuvel wrote:
On Thu, 28 Feb 2019 at 11:36, Steve Capper [off-list ref] wrote:
quoted
On Tue, Feb 26, 2019 at 09:17:49PM +0100, Ard Biesheuvel wrote:
quoted
On Tue, 26 Feb 2019 at 18:30, Steve Capper [off-list ref] wrote:
quoted
On Tue, Feb 19, 2019 at 05:18:18PM +0100, Ard Biesheuvel wrote:
quoted
On Tue, 19 Feb 2019 at 14:56, Steve Capper [off-list ref] wrote:
quoted
On Tue, Feb 19, 2019 at 02:15:26PM +0100, Ard Biesheuvel wrote:
quoted
On Tue, 19 Feb 2019 at 14:01, Will Deacon [off-list ref] wrote:
quoted
On Tue, Feb 19, 2019 at 01:51:51PM +0100, Ard Biesheuvel wrote:
quoted
On Tue, 19 Feb 2019 at 13:48, Will Deacon [off-list ref] wrote:
quoted
On Tue, Feb 19, 2019 at 01:13:32PM +0100, Ard Biesheuvel wrote:
quoted
On Mon, 18 Feb 2019 at 18:05, Steve Capper [off-list ref] wrote:
quoted
This patch series adds support for 52-bit kernel VAs using some of the
machinery already introduced by the 52-bit userspace VA code in 5.0.

As 52-bit virtual address support is an optional hardware feature,
software support for 52-bit kernel VAs needs to be deduced at early boot
time. If HW support is not available, the kernel falls back to 48-bit.
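
A minimal sketch of the boot-time decision described above, not the code from the series. It assumes the feature is advertised through the VARange field of ID_AA64MMFR2_EL1 (bits [19:16], non-zero when 52-bit VAs are implemented). The helper name pick_va_bits() is hypothetical, and it takes the raw register value as an argument so it can be exercised outside the kernel.

```c
#include <stdint.h>

/* ID_AA64MMFR2_EL1.VARange: bits [19:16]. 0b0000 means 48-bit VAs only,
 * 0b0001 means 52-bit VAs are supported (with the 64KB granule). */
#define MMFR2_VARANGE_SHIFT 16
#define MMFR2_VARANGE_MASK  0xfULL

/* Hypothetical helper: choose the VA size from the raw register value. */
static unsigned int pick_va_bits(uint64_t id_aa64mmfr2)
{
    uint64_t varange = (id_aa64mmfr2 >> MMFR2_VARANGE_SHIFT) & MMFR2_VARANGE_MASK;

    /* Fall back to 48 bits when the hardware does not report the feature. */
    return varange != 0 ? 52 : 48;
}
```

In the kernel the register value would come from an early system register read; here it is simply passed in.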

A significant proportion of this series focuses on "de-constifying"
VA_BITS related constants.

In order to allow for a KASAN shadow that changes size at boot time, one
must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
start address. Also, it is highly desirable to maintain the same
function addresses in the kernel .text between VA sizes. Both of these
requirements force us to flip the kernel address space halves s.t.
the direct linear map occupies the lower addresses.

One obvious omission is 52-bit kernel VA + 48-bit userspace VA which I
can add with some more #ifdef'ery if needed.
Hi Steve,

Apologies if I am bringing up things that have been addressed
internally already. We discussed the 52-bit kernel VA work at
Plumbers at some point, and IIUC, KASAN is the complicating factor
when it comes to having compile time constants for VA_BITS_MIN,
VA_BITS_MAX and PAGE_OFFSET, right?

To clarify what I mean, please refer to the diagram below, which
describes a hybrid 48/52 kernel VA arrangement that does not rely on
runtime variable quantities. (VA_BITS_MIN == 48, VA_BITS_MAX == 52)

+------------------- (~0) -------------------------+
|                                                |
|            PCI IO / fixmap spaces              |
|                                                |
+------------------------------------------------+
|                                                |
|             kernel/vmalloc space               |
|                                                |
+------------------------------------------------+
|                                                |
|                module space                    |
|                                                |
+------------------------------------------------+
|                                                |
|                BPF space                       |
|                                                |
+------------------------------------------------+
|                                                |
|                                                |
|   vmemmap space (size based on VA_BITS_MAX)    |
|                                                |
|                                                |
+-- linear/vmalloc split based on VA_BITS_MIN -- +
|                                                |
|    linear mapping (48 bit addressable region)  |
|                                                |
+------------------------------------------------+
|                                                |
|    linear mapping (52 bit addressable region)  |
|                                                |
+------ PAGE_OFFSET based on VA_BITS_MAX --------+

Since KASAN is what is preventing this, would it be acceptable for
KASAN to only be supported when you use a true 48 bit or a true 52 bit
configuration, and disable it for the 48/52 hybrid configuration?

Just thinking out loud (and in ASCII art :-))
TBH, if we end up having support for 52-bit kernel VA, I'd be inclined to
drop the 48/52 configuration altogether. But Catalin's on holiday at the
moment, and may have a different opinion ;)
But that implies that you cannot have an image that supports 52-bit
kernel VAs but can still boot on hardware that does not implement
support for it. If that is acceptable, then none of this hoop jumping
that Steve is doing in these patches is necessary to begin with,
right?
Sorry, I misunderstood what you meant by a "48/52 hybrid configuration". I
thought you were referring to the configuration where userspace is 52-bit
and the kernel is 48-bit, which is something I think we can drop if we gain
support for 52-bit kernel.

Now that I understand what you mean, I think disabling KASAN would be fine
as long as it's a runtime thing and the kernel continues to work in every
other respect.
No, it would be a limitation of the 52-bit config which also supports
48-bit-VA-only-h/w that the address space is laid out in such a way
that there is simply no room for the KASAN shadow region, since it
would have to live in the 48-bit addressable area, but be big enough
to cover 52 bits of VA, which is impossible.

For the vmemmap space, we could live with sizing it statically to
cover a 52-bit VA linear region, but the KASAN shadow region is simply
too big.
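
The size argument can be checked with a few lines of arithmetic. This is a sketch under the assumption that KASAN uses one shadow byte per 8 bytes of address space (a scale shift of 3), which matches the "(52 - 3) bits" figure used later in the thread; the helper names are made up for illustration.

```c
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3    /* assumed: one shadow byte per 8 bytes */

/* Bytes of shadow needed to cover a kernel VA range of 2^va_bits bytes. */
static uint64_t shadow_size(unsigned int va_bits)
{
    return 1ULL << (va_bits - SHADOW_SCALE_SHIFT);
}

/* Bytes of kernel VA space reachable with va_bits of addressing. */
static uint64_t kernel_va_size(unsigned int va_bits)
{
    return 1ULL << va_bits;
}

/* Can a shadow covering 2^cover_bits fit in a 2^avail_bits region at all? */
static int shadow_fits(unsigned int cover_bits, unsigned int avail_bits)
{
    return shadow_size(cover_bits) <= kernel_va_size(avail_bits);
}
```

A shadow covering 52 bits needs 2^49 bytes, which is already twice the 2^48 bytes addressable on 48-bit hardware, so shadow_fits(52, 48) is false before anything else has been placed.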

So if KASAN support in that configuration is a requirement, then I
agree with Steve's approach, but it does imply that quite a number of
formerly compile-time constants now get turned into runtime variables.

Steve, do you have any idea what the impact of that is?
Hi Guys,

The KASAN region only really necessitates two things: 1) that we think
about the end address of the region (which is invariant) rather than the
start address; and that 2) we flip the kernel VA space. IIUC both these
changes have a negligible perf impact.

As for VA_BITS_ACTUAL, we need this in a few places: KVM mapping
support, and the big one phys_to/from_virt. For phys_to/from_virt the
logic is changed s.t. we use a variable lookup for translation but this
is folded into a new variable physvirt_offset (before the patch we used
a single variable read too).
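
A rough sketch of the idea, assuming physvirt_offset is computed once at boot as the difference between the start of physical memory and the chosen start of the linear map. Only the name physvirt_offset comes from the thread; the helper names and the setup function are hypothetical.

```c
#include <stdint.h>

/* Runtime variable, written once during early boot. */
static uint64_t physvirt_offset;

/* Hypothetical setup step: linear_start depends on the VA size that the
 * hardware turned out to support. The subtraction wraps modulo 2^64. */
static void init_physvirt_offset(uint64_t phys_start, uint64_t linear_start)
{
    physvirt_offset = phys_start - linear_start;
}

/* Each translation is a single variable read plus one add or subtract. */
static uint64_t phys_to_virt_sketch(uint64_t pa)
{
    return pa - physvirt_offset;
}

static uint64_t virt_to_phys_sketch(uint64_t va)
{
    return va + physvirt_offset;
}
```

The point of folding everything into one variable is that the translation cost stays the same whether the linear map starts at the 48-bit or the 52-bit address.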

Again IIUC there should be a minimal perf impact (unless one tries to do
cat /sys/kernel/debug/kernel_page_tables with KASAN enabled - but that
can be optimised later).

I didn't have the patience for ASCII art ;-), but I have a picture of
what I think it looks like here:
https://s3.amazonaws.com/connect.linaro.org/yvr18/presentations/yvr18-119.pdf
What I've tried to do is have most parts of the kernel VA space
invariant between 48/52 bits. If it's helpful I can type this up into a
document/commit log message?

For this series I have tried to introduce VA_BITS_MIN in its own patch
and also VA_BITS_ACTUAL into its own patch to make it easier to follow.
Hi Ard,

Apologies for my late reply, I had been staring at this for a while.
quoted
OK, perhaps I am just rephrasing what you essentially implemented
already, but let me try to explain a bit better what I mean:

- we flip the VA space in the way you suggest
- we limit the size of the top half of the address space to 47 bits
- KASAN region grows downwards from (~0) << 47
- we define PAGE_OFFSET as (~0) << 52, regardless of whether the h/w
supports LVA or not
- however, we tweak the phys/virt translation so that memory appears
in the 48-bit addressable part of the linear region on non-LVA
hardware

The latter basically means that the KASAN shadow region will intersect
the linear region, but whether we map memory or shadow pages there
depends on the h/w config at runtime.

The heart of the matter is probably the different placement of the
memory inside the linear region, depending on whether the h/w is LVA
capable or not, which is also reflected in your physvirt_offset. I am
just trying to figure out why we need VA_BITS_ACTUAL to be a runtime
variable.
Currently the direct linear map between configurations does not overlap,
we have:

FFF00000_00000000 - Direct linear map start (52-bit)
FFF80000_00000000 - Direct linear map end (52-bit)
FFFF0000_00000000 - Direct linear map start (48-bit)
FFFF8000_00000000 - Direct linear map end (48-bit)
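
The four addresses above follow from the flipped layout, where the linear map occupies the lower half of the kernel VA range. A small sketch that reproduces them (function names are illustrative, not from the series):

```c
#include <stdint.h>

/* Start of the kernel VA range for a given VA size: (~0) << va_bits. */
static uint64_t linear_map_start(unsigned int va_bits)
{
    return UINT64_MAX << va_bits;
}

/* The linear map covers the lower half of the kernel range, so it ends
 * at (~0) << (va_bits - 1). */
static uint64_t linear_map_end(unsigned int va_bits)
{
    return UINT64_MAX << (va_bits - 1);
}
```

With these definitions the 52-bit map ends at FFF80000_00000000, well below the 48-bit start at FFFF0000_00000000, which is the non-overlap being discussed.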

We *can* make PAGE_OFFSET a constant for both 48/52 bit VA_BITS, if we
offset it. vmemmap can then be adjusted on early boot to ensure that
everything points to the right place. However we will get overlap for
52-bit configurations between KASAN and the direct linear map.

The question is: are we okay with quite a large overlap?

The KASAN region begins on 0xFFFDA000_00000000 for 52-bit. If we wish to
employ a "full" 47-bit direct linear map on 48-bit systems we need a
PAGE_OFFSET of 0xFFF78000_00000000 in order to make the direct linear
map end addresses "match up" between 48/52 bit configurations.

This doesn't leave us with a lot of room for 52-bit configurations
though, if KASAN is enabled.
OK, so with actual numbers, what I had in mind was


FFF00000_00000000  start of 52-bit addressable linear region | PAGE_OFFSET

FFFD8000_00000000  start of KASAN shadow region | KASAN_SHADOW_OFFSET

FFFF0000_00000000  start of 48-bit addressable linear region

FFFF6000_00000000  start of used KASAN shadow region (48-bit VA)
                   (KASAN_SHADOW_OFFSET + (F0000_00000000 >> 3))

FFFF8000_00000000  start of vmemmap area - end of KASAN shadow region

FFFF8200_00000000  end of vmemmap area - start of bpf/module/etc area


The trick is that the full (52 - 3) bits KASAN shadow space overlaps
with the 48-bit linear region, but since you don't need KASAN shadow
pages for memory that does not exist, the region FFFF0000_00000000 -
FFFF6000_00000000 can be used for mapping the memory in case the h/w
is 48-bit only.

So in this case, PAGE_OFFSET and KASAN_SHADOW_OFFSET remain compile
time constants, and as long as we don't attempt to map anything
outside of the 48-bit addressable area on h/w that does not support
it, the fact that those quantities are outside the 48-bit range does
not really matter.
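
The shadow addresses in the table can be reproduced with the arithmetic it implies: take the offset of an address from PAGE_OFFSET, shift it right by 3, and add the start of the shadow region. This is only a sketch of the numbers in this proposal; SHADOW_BASE is a name chosen here for the value the table labels KASAN_SHADOW_OFFSET, and the kernel's own shadow macro is not necessarily written this way.

```c
#include <stdint.h>

#define PAGE_OFFSET_52  0xFFF0000000000000ULL  /* from the table above */
#define SHADOW_BASE     0xFFFD800000000000ULL  /* start of shadow region */
#define SHADOW_SHIFT    3                      /* 8 bytes per shadow byte */

/* Shadow address for a kernel VA, following the table's arithmetic. */
static uint64_t shadow_addr(uint64_t va)
{
    return SHADOW_BASE + ((va - PAGE_OFFSET_52) >> SHADOW_SHIFT);
}
```

For the first 48-bit addressable address, FFFF0000_00000000, this gives FFFF6000_00000000, so on 48-bit hardware only the shadow from there up to FFFF8000_00000000 is ever used and the range below it is free for the linear map.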
Thanks Ard,
I'll elaborate more on what I'm worrying about :-).

The 48/52 bit linear regions above do not overlap and this creates the
following issue.
OK, I see what you mean (I think). In my proposal, the linear regions
*do* overlap.

In my example, the vmemmap region is only sized to cover 51 bits of
linear region, but this is not sufficient, since the 52-bit linear
region is actually bigger than that.
Ahhhh, okay, nice (sorry I didn't parse your numbers correctly before).
So based on a linear region that goes from

FFF0_0000_0000_0000 ... FFFF_8000_0000_0000

we would end up with a vmemmap region

FFFF_8000_0000_0000 ... FFFF_83E0_0000_0000

covering the entire combined linear region.  This is a fair chunk of
the vmalloc space for 48-bit configuration, but I don't think that is
anything to worry about.
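
The vmemmap bounds can be checked as follows. The sketch assumes 64KB pages and a 64-byte struct page; neither value is stated in the thread, but they are consistent with the numbers quoted in it.

```c
#include <stdint.h>

#define PAGE_SHIFT_64K    16   /* assumed: 64KB pages */
#define STRUCT_PAGE_BYTES 64   /* assumed: sizeof(struct page) */

/* Bytes of vmemmap needed to describe the linear region [start, end). */
static uint64_t vmemmap_size(uint64_t linear_start, uint64_t linear_end)
{
    uint64_t nr_pages = (linear_end - linear_start) >> PAGE_SHIFT_64K;

    return nr_pages * STRUCT_PAGE_BYTES;
}
```

For the combined region FFF0_0000_0000_0000 ... FFFF_8000_0000_0000 this yields 0x3E0_0000_0000 bytes, which placed at FFFF_8000_0000_0000 ends at FFFF_83E0_0000_0000. A 51-bit region gives 0x200_0000_0000 bytes, matching the earlier FFFF8200_00000000 end address.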
quoted
To go from a struct page * to a linear address we do the following:
lva = (page - VMEMMAP_START) * PAGE_SIZE / sizeof(struct page) + PAGE_OFFSET
OK, so given the above correction, we can take

VMEMMAP_START := FFFF_8000_0000_0000
PAGE_OFFSET := FFF0_0000_0000_0000

and everything still adds up afaict, and struct pages in the 48-bit VA
region are covered from FFFF_83C0_0000_0000 and up.
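
A sketch of the translation with these constants, again assuming 64KB pages and a 64-byte struct page (assumptions, not values given in the thread). Addresses are handled as plain integers rather than struct page pointers, and the function names are illustrative.

```c
#include <stdint.h>

#define VMEMMAP_START_SK  0xFFFF800000000000ULL
#define PAGE_OFFSET_SK    0xFFF0000000000000ULL
#define PAGE_SIZE_SK      0x10000ULL   /* assumed: 64KB pages */
#define STRUCT_PAGE_BYTES 64ULL        /* assumed: sizeof(struct page) */

/* struct page address -> linear address, all compile-time constants. */
static uint64_t page_to_lva(uint64_t page_addr)
{
    uint64_t pfn_index = (page_addr - VMEMMAP_START_SK) / STRUCT_PAGE_BYTES;

    return pfn_index * PAGE_SIZE_SK + PAGE_OFFSET_SK;
}

/* Inverse: linear address -> struct page address. */
static uint64_t lva_to_page(uint64_t lva)
{
    uint64_t pfn_index = (lva - PAGE_OFFSET_SK) / PAGE_SIZE_SK;

    return pfn_index * STRUCT_PAGE_BYTES + VMEMMAP_START_SK;
}
```

With these values the struct page at FFFF_83C0_0000_0000 maps to FFFF_0000_0000_0000, the first 48-bit addressable linear address.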
quoted
(Before my series) all the constants are fixed at compile time and thus
translation is very quick. My understanding is that you would like
PAGE_OFFSET to be constant to preserve the optimised nature of this
transform? (if not, please shout :-) )
Yes, the main idea is to have compile time constants for PAGE_OFFSET,
VA_BITS, etc
quoted
The problem is that a 52-bit PAGE_OFFSET = 0xFFF00000_00000000 will
never be able to give us an lva within a 48-bit addressable range. At
best we will get an lva of FFF80000_00000000.
You are assuming that we have to split the address space down the
middle, but I don't think that is necessary at all.
Agreed, some minor tweaks are needed to some helper functions to allow
for this.

Many thanks Ard, I'll give this a go.

Cheers,
-- 
Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel