Re: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)
From: Sean Christopherson <hidden>
Date: 2019-05-23 23:40:49
Also in:
lkml, selinux
On Thu, May 23, 2019 at 08:38:17AM -0700, Andy Lutomirski wrote:
On Thu, May 23, 2019 at 7:17 AM Sean Christopherson [off-list ref] wrote:quoted
On Thu, May 23, 2019 at 01:26:28PM +0300, Jarkko Sakkinen wrote:quoted
On Wed, May 22, 2019 at 07:35:17PM -0700, Sean Christopherson wrote:quoted
But actually, there's no need to disallow mmap() after ECREATE since the LSM checks also apply to mmap(), e.g. FILE__EXECUTE would be needed to mmap() any enclave pages PROT_EXEC. I guess my past self thought mmap() bypassed LSM checks? The real problem is that mmap()'ng an existing enclave would require FILE__WRITE and FILE__EXECUTE, which puts us back at square one.I'm lost with the constraints we want to set.As is today, SELinux policies would require enclave loaders to have FILE__WRITE and FILE__EXECUTE permissions on /dev/sgx/enclave. Presumably other LSMs have similar requirements. Requiring all processes to have FILE__{WRITE,EXECUTE} permissions means the permissions don't add much value, e.g. they can't be used to distinguish between an enclave that is being loaded from an unmodified file and an enclave that is being generated on the fly, e.g. Graphene. Looking back at Andy's mail, he was talking about requiring FILE__EXECUTE to run an enclave, so perhaps it's only FILE__WRITE that we're trying to special case.I thought about this some more, and I have a new proposal that helps address the ELRANGE alignment issue and the permission issue at the cost of some extra verbosity. Maybe you all can poke holes in it :) The basic idea is to make everything more explicit from a user's perspective. Here's how it works: Opening /dev/sgx/enclave gives an enclave_fd that, by design, doesn't give EXECUTE or WRITE. mmap() on the enclave_fd only works if you pass PROT_NONE and gives the correct alignment. The resulting VMA cannot be mprotected or mremapped. It can't be mmapped at all until
I assume you're thinking of clearing all VM_MAY* flags in sgx_mmap()?
after ECREATE because the alignment isn't known before that.
I don't follow. The alignment is known because userspace knows the size of its enclave. The initial unknown is the address, but that becomes known once the initial mmap() completes.
Associated with the enclave are a bunch (up to 7) "enclave segment
I assume 7 = R, W, X, RW, RX, WX and RWX?
inodes". These are anon_inodes that are created automagically. An enclave segment is a group of pages, not necessary contiguous, with an upper bound on the memory permissions. Each enclave page belongs to a segment. When you do EADD, you tell the driver what segment you're adding to. [0] This means that EADD gets an extra argument that is a permission mask for the page -- in addition to the initial SECINFO, you also pass to EADD something to the effect of "I promise never to map this with permissions greater than RX". Then we just need some way to mmap a region from an enclave segment. This could be done by having a way to get an fd for an enclave segment or it could be done by having a new ioctl SGX_IOC_MAP_SEGMENT. User code would use this operation to replace, MAP_FIXED-style, ranges from the big PROT_NONE mapping with the relevant pages from the enclave segment. The resulting vma would only have VM_MAYWRITE if the segment is W, only have VM_MAYEXEC if the segment is X, and only have VM_MAYREAD if the segment is R. Depending on implementation details, the VMAs might need to restrict mremap() to avoid mapping pages that aren't part of the segment in question.
If my above assumptions regarding VM_MAY* and the "7 segments" are correct, IIUC you're proposing that an LSM could have policies for each of the anon inodes, e.g. grant/deny RWX vs. RW vs RX. Am I in the ballpark?
It's plausible that this whole thing works without the magic segment inodes under the hood, but figuring that out would need a careful look at how all the core mm bits and LSM bits work together. To get all the LSM stuff to work, SELinux will need some way to automatically assign an appropriate label to the segment inodes. I assume that such a mechanism already exists and gets used for things like sockets, but I haven't actually confirmed this.
I (obviously) don't fully understand your proposal, but I don't think we
want to hook inodes, e.g. AppArmor doesn't implement inode_permission()
but does implement file_mprotect() and mmap_file(), which feel like the
natural hooks for this sort of thing. I also think it's overkill, e.g.
AppArmor doesn't have a concept of EXECMOD, EXECMEM, EXECHEAP, etc.., so
I don't think we need to go beyond detecting W+X scenarios.
Starting with your original idea of tracking "safe to execute" and
Cedric's of propagating the permissions from the source VMA, but tweaked
with your new idea of clearing VM_MAY* and a custom MAP_FIXED/mprotect().
Add SGX_IOC_MPROTECT (or SGX_IOC_MAP_REGION?) that works as follows:
1. Track VM_MAY{READ,WRITE,EXEC} flags for each enclave page.
2. SGX_IOC_ADD_REGION, i.e. EADD, initializes the VM_MAY* flags for each
enclave page based on the source VMA.
3. sgx_mmap() only works with PROT_NONE, skips alignment stuff if
MAP_FIXED, and clears VM_MAY{READ,WRITE,EXEC}.
4. mprotect() on /dev/sgx/enclave doesn't work because the VMA doesn't
have any VM_MAY{READ,WRITE,EXEC} capabilities.
5. Deny mremap() post-ECREATE as the address and size of the enclave
are fixed at ECREATE (in hardware).
6. SGX_IOC_MPROTECT works like normal mprotect(), except the VM_MAY*
flags are pulled from the enclave pages, and its call to
security_file_mprotect() is VM_READ|VM_EXEC by default. The LSM call
sets VM_WRITE iff the enclave page has both VM_MAYWRITE and
VM_MAYEXEC. The idea here is to require READ and EXECUTE to run an
enclave, and only require WRITE on /dev/sgx/enclave when the enclave
can execute modified memory.
To support SGX2 down the road, which will want to convert a page to
executable on the fly, we could add:
7. SGX_IOC_EXTEND_PERMISSIONS enables userspace to extend the VM_MAY*
flags for an enclave page, e.g. to make a page executable.
SGX_IOC_MPROTECT is still required to actually map the page.
Notably, adding a RW page to the enclave, e.g. to grow its heap,
doesn't require WRITE, whereas adding a RWX page, e.g. for dynamic
loading, would require WRITE. This can only extend! E.g. userspace
can't circumvent the WRITE requirement by clearing VM_MAYWRITE.
Note, FILE__WRITE on /dev/sgx/enclave is essentially equivalent to
FILE__EXECMOD. Using FILE__WRITE in this way means there are no changes
to SELinux (triggering FILE__EXECMOD would be awkward), and AppArmor also
picks up extra protections for enclaves.
[0] There needs to be some vaguely intelligent semantics if you EADD the *same* address more than once. A simple solution would be to disallow it if the segments don't match.
I don't see any reason to allow duplicate EADD as it serves no purpose, e.g. doing so changes the enclave's measurement and that's it.