Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings

From: Lorenzo Stoakes <hidden>
Date: 2025-02-19 19:21:14
Also in: linux-kselftest, linux-mm, lkml

On Wed, Feb 19, 2025 at 10:52:04AM -0800, Kalesh Singh wrote:

On Wed, Feb 19, 2025 at 1:17 AM Lorenzo Stoakes
[off-list ref] wrote:

On Wed, Feb 19, 2025 at 10:15:47AM +0100, David Hildenbrand wrote:

On 19.02.25 10:03, Lorenzo Stoakes wrote:

On Wed, Feb 19, 2025 at 12:25:51AM -0800, Kalesh Singh wrote:

On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes
[off-list ref] wrote:

The guard regions feature was initially implemented to support anonymous
mappings only, excluding shmem.

This was done so as to introduce the feature carefully and incrementally
and to be conservative when considering the various caveats and corner
cases that are applicable to file-backed mappings but not to anonymous
ones.

Now that this feature has landed in 6.13, it is time to revisit this and to
extend this functionality to file-backed and shmem mappings.

In order to make this maximally useful, and since one may map file-backed
mappings read-only (for instance ELF images), we also remove the
restriction on read-only mappings and permit the establishment of guard
regions in any non-hugetlb, non-mlock()'d mapping.
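
For illustration only, a minimal userspace sketch of installing and removing a guard region, assuming a Linux 6.13+ kernel. Python's mmap module does not define the guard advice constants, so the values 102 and 103 below are written out by hand on the assumption that they match the uapi headers; the helper names try_install_guard and remove_guard are hypothetical. The sketch uses a private anonymous mapping, which is the case already supported in 6.13.

```python
import mmap

# ASSUMPTION: values copied by hand from the Linux uapi headers (6.13+);
# Python's mmap module does not export these constants.
MADV_GUARD_INSTALL = 102
MADV_GUARD_REMOVE = 103

PAGE = mmap.PAGESIZE


def try_install_guard(mapping, offset, length):
    """Ask the kernel to turn [offset, offset + length) into a guard region.

    Returns True on success and False if the platform or kernel rejects the
    request (for example, a kernel that predates guard regions).
    """
    if not hasattr(mapping, "madvise"):
        return False
    try:
        mapping.madvise(MADV_GUARD_INSTALL, offset, length)
    except (OSError, ValueError):
        return False
    return True


def remove_guard(mapping, offset, length):
    """Undo try_install_guard() so the range behaves as ordinary memory."""
    mapping.madvise(MADV_GUARD_REMOVE, offset, length)
```

On a four-page mapping created with mmap.mmap(-1, 4 * PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS), calling try_install_guard(m, PAGE, PAGE) would guard the second page only. Touching that page afterwards delivers a fatal signal, so a caller should remove the guard before reusing the range. No new VMA is created, which is why nothing changes in /proc/$pid/maps.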

Hi Lorenzo,

Thank you for your work on this.

You're welcome.

Have we thought about how guard regions are represented in /proc/*/[s]maps?

This is off-topic here but... Yes, extensively. No, they do not appear
there.

I thought you had attended LPC and my talk where I mentioned this
purposefully as a drawback?

I went out of my way to advertise this limitation at the LPC talk, in the
original series, etc. so it's a little disappointing that this is being
brought up so late, but nobody else has raised objections to this issue so
I think in general it's not a limitation that matters in practice.

Sorry for raising this now, yes at the time I believe we discussed
reducing the vma slab memory usage for the PROT_NONE mappings. I
didn't imagine that apps could have dependencies on the mapped ELF
ranges in /proc/self/[s]maps until recent breakages from a similar
feature. Android itself doesn't depend on this but what I've seen is
banking apps and apps that have obfuscation to prevent reverse
engineering (the particulars of such obfuscation are a black box).

Ack ok fair enough, sorry, but obviously you can understand it's
frustrating when I went to great lengths to advertise this not only at the
talk but in the original series.

Really important to have these discussions early. Not that we can really do
much about this, as inherently this feature cannot give you what you need.

Is it _only_ banking apps that do this? And do they exclusively read
/proc/$pid/maps? I mean there's nothing we can do about that, sorry. If
that's immutable, then unless you do your own very, very, very slow custom
android maps implementation (that will absolutely break the /proc/$pid/maps
scalability efforts atm) this is just a no-go.

In the field, I've found that many applications read the ranges from
/proc/self/[s]maps to determine what they can access (usually related
to obfuscation techniques). If they don't know of the guard regions it
would cause them to crash; I think that we'll need similar entries to
PROT_NONE (---p) for these, and generally to maintain consistency
between the behavior and what is being said from /proc/*/[s]maps.

No, we cannot have these, sorry.

Firstly /proc/$pid/[s]maps describes VMAs. The entire purpose of this
feature is to avoid having to accumulate VMAs for regions which are not
intended to be accessible.

Secondly, there is no practical means for this to be accomplished in
/proc/$pid/maps in _any_ way - as no metadata relating to a VMA indicates
they have guard regions.

This is intentional, because setting such metadata is simply not practical
- why? Because when you try to split the VMA, how do you know which bit
gets the metadata and which doesn't? You can't without _reading page
tables_.
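
To illustrate the splitting problem, here is a toy model in plain Python. It is not kernel code: the Vma class, the has_guard flag, and both split helpers are hypothetical, and a set of page numbers stands in for the page tables. The point is only that VMA metadata alone cannot say which half of a split should keep the flag.

```python
from dataclasses import dataclass


@dataclass
class Vma:
    """Toy stand-in for a VMA: a page range plus one hypothetical flag."""
    start: int        # first page number (inclusive)
    end: int          # last page number (exclusive)
    has_guard: bool   # hypothetical "guard regions somewhere in here" flag


def split_without_page_tables(vma, at):
    """Split using only VMA metadata.

    Nothing in the VMA records where the guard pages are, so the only safe
    choice is for both halves to inherit the flag, which may be wrong.
    """
    return Vma(vma.start, at, vma.has_guard), Vma(at, vma.end, vma.has_guard)


def split_with_page_tables(vma, at, guard_pages):
    """Split while consulting per-page state (the toy 'page table').

    This gives the right answer, but only by paying for a walk over
    per-page state on every split.
    """
    def flag(lo, hi):
        return any(lo <= page < hi for page in guard_pages)

    return (Vma(vma.start, at, flag(vma.start, at)),
            Vma(at, vma.end, flag(at, vma.end)))


if __name__ == "__main__":
    guard_pages = {2}                      # only page 2 is a guard page
    vma = Vma(0, 8, has_guard=True)
    print(split_without_page_tables(vma, 4))
    print(split_with_page_tables(vma, 4, guard_pages))
```

In the first split the upper half [4, 8) stays flagged even though it contains no guard page; only the second variant, which reads the per-page state, gets it right.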

Yeah the splitting becomes complicated with any vm flags for this...
meaning any attempt to expose this in /proc/*/maps would have to
unconditionally walk the page tables :(

It's not really complicated, it's _impossible_ unless you made literally
all VMA code walk page tables for every single operation. Which we are
emphatically not going to do :)

And no, /proc/$pid/maps is _never_ going to walk page tables. For obvious
performance reasons.

/proc/$pid/smaps _does_ read page tables, but we can't start pretending
VMAs exist when they don't, this would be completely inaccurate, would
break assumptions for things like mremap (which require a single VMA) and
would be unworkable.

The best that _could_ be achieved is to have a marker in /proc/$pid/smaps
saying 'hey this region has guard regions somewhere'.

And then simply expose it in /proc/$pid/pagemap, which is a better interface
for this pte-level information inside of VMAs. We should still have a spare
bit for that purpose in the pagemap entries.

Ah yeah thanks David forgot about that!

This is also a possibility if that'd solve your problems Kalesh?

I'm not sure what is the correct interface to advertise these. Maybe
smaps as you suggested since we already walk the page tables there?
and pagemap bit for the exact pages as well? It won't solve this
particular issue, as 1000s of in field apps do look at this through
/proc/*/maps. But maybe we have to live with that...

I mean why are we even considering this if you can't change this anywhere?
Confused by that.

I'm afraid upstream can't radically change interfaces to suit this
scenario.

We also can't change smaps in the way you want, it _has_ to still output
per-VMA information.

The change proposed there would be a flag or something indicating that the
VMA has guard regions _SOMEWHERE_ in it.

Since this doesn't solve your problem, adds complexity, and nobody else
seems to need it, I would suggest this is not worthwhile and I'd rather not
do this.

Therefore for your needs there are literally only two choices here:

1. Add a bit to /proc/$pid/pagemap OR
2. Add a new interface.

I am not in favour of a new interface here, if we can just extend pagemap.

What you'd have to do is:

1. Find virtual ranges via /proc/$pid/maps
2. Iterate through /proc/$pid/pagemap to retrieve state for all ranges.

Since anything that would retrieve guard region state would need to walk
page tables, any approach would be slow and I don't think this would be any
less slow than any other interface.

This way you'd be able to find all guard regions all the time.
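
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not an existing interface: at the time of this thread no guard-region bit exists in pagemap, so HYPOTHETICAL_PM_GUARD (bit 58 here) is a placeholder for whichever spare bit would be assigned, and the function names are made up for the example. Check the pagemap documentation for your kernel before relying on any bit position.

```python
import mmap
import struct

PAGEMAP_ENTRY_BYTES = 8    # pagemap holds one 64-bit entry per virtual page
# ASSUMPTION: placeholder bit; no guard-region bit is defined in this thread.
HYPOTHETICAL_PM_GUARD = 1 << 58


def parse_maps_line(line):
    """Return (start, end, perms) from one /proc/<pid>/maps line."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    return int(start_s, 16), int(end_s, 16), fields[1]


def pagemap_offset(vaddr, page_size):
    """Byte offset in /proc/<pid>/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def decode_entries(raw):
    """Unpack raw pagemap bytes into a list of native-endian 64-bit ints."""
    count = len(raw) // PAGEMAP_ENTRY_BYTES
    return list(struct.unpack(f"={count}Q", raw[:count * PAGEMAP_ENTRY_BYTES]))


def guard_pages(pid="self", page_size=mmap.PAGESIZE,
                guard_bit=HYPOTHETICAL_PM_GUARD, chunk_pages=4096):
    """Yield the address of every page whose pagemap entry has guard_bit set.

    Step 1 reads the VMA ranges from maps; step 2 reads the pagemap entries
    for each range in chunks, so huge mappings are not read in one go.
    """
    with open(f"/proc/{pid}/maps") as maps, \
         open(f"/proc/{pid}/pagemap", "rb", buffering=0) as pagemap:
        for line in maps:
            start, end, _perms = parse_maps_line(line)
            addr = start
            while addr < end:
                npages = min(chunk_pages, (end - addr) // page_size)
                try:
                    pagemap.seek(pagemap_offset(addr, page_size))
                    raw = pagemap.read(npages * PAGEMAP_ENTRY_BYTES)
                except OSError:
                    break
                entries = decode_entries(raw)
                if not entries:
                    break  # range not covered by pagemap; skip it
                for i, entry in enumerate(entries):
                    if entry & guard_bit:
                        yield addr + i * page_size
                addr += len(entries) * page_size
```

A caller would iterate with something like `for addr in guard_pages(): print(hex(addr))`. The cost is one pagemap entry read per mapped page, which is the page-table walk described above.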

This is just the trade-off for this feature unfortunately - its whole
design ethos is to allow modification of -faulting- behaviour without
having to modify -VMA- behaviour.

But if it's banking apps whose code you can't control (surprised you don't
lock down these interfaces), I mean is this even useful to you?

If your requirement is 'you have to change /proc/$pid/maps to show guard
regions' I mean the answer is that we can't.

We can argue that such apps are broken since they may trip on the
SIGBUS off the end of the file -- usually this isn't the case for the
ELF segment mappings.

Or tearing of the maps interface, or things getting unmapped or or
or... It's really not a sane thing to do.

This is still useful for other cases, I just wanted to get some ideas
if this can be extended to further use cases.

Well I'm glad that you guys find it useful for _something_ ;)

Again this wasn't written only for you (it is broadly a good feature for
upstream), but I did have your use case in mind, so I'm a little
disappointed that it doesn't help, as I like to solve problems.

But I'm glad it solves at least some for you...

Thanks,
Kalesh

This bit will be fought over haha

--
Cheers,

David / dhildenb
