Thread (127 messages) 127 messages, 11 authors, 2019-06-04

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

From: Xing, Cedric <hidden>
Date: 2019-05-30 05:38:11
Also in: lkml, selinux

From: Christopherson, Sean J
Sent: Tuesday, May 28, 2019 2:41 PM

On Tue, May 28, 2019 at 01:48:02PM -0700, Andy Lutomirski wrote:
quoted
On Tue, May 28, 2019 at 1:24 PM Sean Christopherson
[off-list ref] wrote:
quoted
Actually, I think we do have everything we need from an LSM perspective.
LSMs just need to understand that sgx_enclave_load() with a NULL vma
implies a transition from RW.  For example, SELinux would interpret
sgx_enclave_load(NULL, RX) as requiring FILE__EXECMOD.
You lost me here.  What operation triggers this callback?  And
wouldn't sgx_enclave_load(NULL, RX) sometimes be a transition from RO
or just some fresh executable zero bytes?
An explicit ioctl() after EACCEPTCOPY to update the allowed permissions.
For all intents and purposes, the EAUG'd page must start RW.  Maybe a better way to phrase
it is that at some point the page must be writable to have any value whatsover.
EACCEPTCOPY explicitly requires the page to be at least RW.  EACCEPT technically doesn't
require RW, but a RO or RX zero page is useless.  Userspace could still EACCEPT with RO or
RX, but SGX would assume a minimum of RW for the purposes of the LSM check.
Why is an explicit ioctl() necessary after EACCEPTCOPY? Or why is mprotect() not sufficient?

I tend to agree on Andy's MAXPERM model where MAXPERM never changes once established.
quoted
quoted
As Cedric mentioned earlier, the host process doesn't necessarily
know which pages will end up RW vs RX, i.e. sgx_enclave_load(NULL,
RX) already has to be invoked at runtime, and when that happens, the
kernel can take the opportunity to change the VMAs from MAY_RW to MAY_RX.

For simplicity in the kernel and clarity in userspace, it makes
sense to require an explicit ioctl() to add the to-be-EAUG'd range.
That just leaves us wanting an ioctl() to set the post-EACCEPT{COPY} permissions.

E.g.:

    ioctl(<prefix>_ADD_REGION, { NULL }) /* NULL == EAUG, MAY_RW */

    mprotect(addr, size, RW);
    ...

    EACCEPTCOPY -> EAUG /* page fault handler */

    ioctl(<prefix>_ACTIVATE_REGION, { addr, size, RX}) /* MAY_RX */

    mprotect(addr, size, RX);
In the maxperm model, this mprotect() will fail unless MAXPERM
contains RX, which could only happen if MAXPERM=RWX.  So, regardless
of how it's actually mapped to SELinux policy, MAXPERM=RWX is
functionally like EXECMOD and actual RWX PTEs are functionally like
EXECMEM.
Yep, same idea, except in the proposed flow ACTIVATE_REGION.
quoted
quoted
    ...

And making ACTIVATE_REGION a single-shot per page eliminates the
need for the MAXPERMS concept (see below).
quoted
If we keep only one MAXPERM, wouldn't this be the current behavior
of mmap()/mprotect()?

To be a bit more clear, system admin sets MAXPERM upper bound in
the form of FILE__{READ|WRITE|EXECUTE|EXECMOD} of
/dev/sgx/enclave. Then for a process/enclave, if what it requires
falls below what's allowed on /dev/sgx/enclave, then everything
will just work. Otherwise, it fails in the form of -EPERM returned
from mmap()/mprotect(). Please note that MAXPERM here applies to
"runtime" permissions, while "initial" permissions are taken care
of by security_enclave_{load|init}. "initial" permissions could be
more permissive than "runtime" permissions, e.g., RX is still
required for initial code pages even though system admins could disable dynamically
loaded code pages by *not* giving FILE__{EXECUTE|EXECMOD}. Therefore, the "initial"
quoted
quoted
quoted
mapping would still have to be done by the driver (to bypass LSM),
either via a new ioctl or as part of IOC_EINIT.
Aha!

Starting with Cedric's assertion that initial permissions can be
taken directly from SECINFO:

  - Initial permissions for *EADD* pages are explicitly handled via
    sgx_enclave_load() with the exact SECINFO permissions.

  - Initial permissions for *EAUG* are unconditionally RW.  EACCEPTCOPY
    requires the target EPC page to be RW, and EACCEPT with RO is useless.

  - Runtime permissions break down as follows:
      R   - N/A, subset of RW (EAUG)
      W   - N/A, subset of RW (EAUG) and x86 paging can't do W
      X   - N/A, subset of RX (x86 paging can't do XO)
Sure it can!  You just have a hypervisor that maps a PA bit to EPT
no-read.  Then you can use that PA bit to suppress read.  Also, Linux
already abuses PKRU to simulate XO, although that won't work for
enclaves.
Heh, I intentionally said "x86 paging" to rule out EPT :-)  I'm pretty sure it's a moot
point though, I have a hard time believing an LSM will allow RW->X and not RW->RX.
quoted
quoted
      RW  - Handled by EAUG LSM hook (uses RW unconditionally)
      WX  - N/A, subset of RWX (x86 paging can't do WX)
      RX  - Handled by ACTIVATE_REGION
      RWX - Handled by ACTIVATE_REGION

In other words, if we define the SGX -> LSM calls as follows (minus
the file pointer and other params for brevity):

  - <prefix>_ACTIVATE_REGION(vma, perms) -> sgx_enclave_load(NULL,
perms)
I'm not sure on what security_enclave_load()'s decision would be based.
quoted
quoted
  - <prefix>_ADD_REGION(vma) -> sgx_enclave_load(vma, SECINFO.perms)

  - <prefix>_ADD_REGION(NULL) -> sgx_enclave_load(NULL, RW)

then SGX and LSMs have all the information and hooks needed.  The
catch is that the LSM semantics of sgx_enclave_load(..., RW) would
need to be different than normal shared memory, e.g. FILE__WRITE
should *not* be required, but that's ok since it's an SGX specific
hook.  And if for some reason an LSM wanted to gate access to EAUG
*without* FILE__EXECMOD, it'd have the necessary information to do so.

The userspace changes are fairly minimal:

  - For SGX1, use PROT_NONE for the initial mmap() and refactor ADD_PAGE
    to ADD_REGION.

  - For SGX2, do an explicit ADD_REGION on the ranges to be EAUG'd, and an
    ACTIVATE_REGION to make a region RX or R (no extra ioctl() required to
    keep RW permissions).

Because ACTIVATE_REGION can only be done once per page, to do
*abitrary*
mprotect() transitions, userspace would need to set the
added/activated permissions to be a superset of the transitions,
e.g. RW -> RX would require RWX, but that's a non-issue.
I may be misunderstanding or just be biased to my own proposal, but
this seems potentially more complicated and less flexible than the
MAXPERM model.  One of the main things that made me come up with
MAXPERM is that I wanted to avoid any complicated PTE/VMA modification
or runtime changes.  So, with MAXPERM, we still need to track the
MAXPERM bits per page, but we don't ever need to *change* them or to
worry about what is or is not mapped anywhere at any given time.  With
ACTIVATE_REGION, don't we need to make sure that we don't have a
second VMA pointing at the same pages?  Or am I just confused?
In theory, it's still your MAXPERM model, but with the unnecessary states removed and the
others enforced/handled by the natural SGX transitions instead of explictly in ioctls.
Underneath the hood the SGX driver would still need to track the MAXPERM.
What are the "unnecessary states" removed? 

I'm not sure understand the proposal fully. The whole thing looks to me like the driver is undertaking things that should/would otherwise be done by mmap()/mprotect() syscalls. It also imposes unnecessary restrictions on user mode code, such as mmap(PROT_NONE), ACTIVATE_REGION can be called only once, etc. What'd happen if ACTIVATE_REGION is called with a range spanning multiple/partial VMAs? What'd happen if an enclave was unmapped than mapped again? I'd say the proposal is unintuitive at least.

In theory, if the driver can keep track of MAXPERM for all pages within an enclave, then it could fail mmap() if the requested prot conflicts with any page's MAXPERM within that range. Otherwise, MAXPERM could be copied into VM_MAY* flags then mprotect() will just follow through. Wouldn't that be a much simpler and more intuitive approach?
With SGX1, SECINFO == MAXPERM.  With SGX2, ACTIVATE_REGION == MAXPERM, with the
implication that the previous state is always RW.
quoted
quoted
  - For SGX1 it's a nop since it's impossible to change the EPCM
    permissions, i.e. the page would need to be RWX regardless.
I may still be missing something, but, for SGX1, it's possible at
least in principle for the enclave to request, via ocall or similar,
that the untrusted runtime do mprotect().  It's not even such a bad
idea.  Honestly, enclaves *shouldn't* have anything actually writable
and executable at once because the enclaves don't want to be easily
exploited.
Yes, but the *EPCM* permissions are immutable.  So if an enclave wants to do RW->RX it has
to intialize its pages to RWX.  And because the untrusted runtime is, ahem, untrusted, the
enclave cannot rely on userspace to never map its pages RWX.  In other words, from a
enclave security perspective, an SGX1 enclave+runtime that uses RW->RX is no different
than an enclave that uses RWX.  Using your earlier terminology, an SGX1 enclave *should*
get a dirty looks if maps a page RWX in the EPCM, even if it only intends RW->RX behavior.
quoted
quoted
  - For SGX2, userspace can suck it up and request RWX to do completely
    arbitrary transitions (working as intended), or the kernel can support
    trimming (removing) pages from an enclave, which would allow userspace
    to do "arbitrary" transitions by first removing the page.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help