Re: [PATCH bpf-next v4 3/8] bpf: lsm: provide attachment points for BPF LSM programs

From: Casey Schaufler <casey@schaufler-ca.com>
Date: 2020-02-24 22:29:36
Also in: bpf, lkml, netdev

On 2/24/2020 1:41 PM, Kees Cook wrote:

On Mon, Feb 24, 2020 at 10:45:27AM -0800, Casey Schaufler wrote:

quoted

On 2/24/2020 9:13 AM, KP Singh wrote:

quoted

On 24-Feb 08:32, Casey Schaufler wrote:

quoted

On 2/23/2020 2:08 PM, Alexei Starovoitov wrote:

quoted

On Fri, Feb 21, 2020 at 08:22:59PM -0800, Kees Cook wrote:

quoted

If I'm understanding this correctly, there are two issues:

1- BPF needs to be run last due to fexit trampolines (?)

no.
The placement of nop call can be anywhere.
BPF trampoline is automagically converting nop call into a sequence
of directly invoked BPF programs.
No link list traversals and no indirect calls in run-time.

Then why the insistence that it be last?

I think this came out of the discussion about not being able to
override the other LSMs and introduce a new attack vector with some
arguments discussed at:

  https://lore.kernel.org/bpf/20200109194302.GA85350@google.com/ (local)

Let's say we have SELinux + BPF runnng on the system. BPF should still
respect any decisions made by SELinux. This hasn't got anything to
do with the usage of fexit trampolines.

The discussion sited is more about GPL than anything else.

The LSM rule is that any security module must be able to
accept the decisions of others. SELinux has to accept decisions
made ahead of it. It always has, as LSM checks occur after
"traditional" checks, which may fail. The only reason that you
need to be last in this implementation appears to be that you
refuse to use the general mechanisms. You can't blame SELinux
for that.

Okay, this is why I wanted to try to state things plainly. The "in last
position" appears to be the result of a couple design choices:

-the idea of "not wanting to get in the way of other LSMs", while
 admirable, needs to actually be a non-goal to be "just" a stacked LSM
 (as you're saying here Casey). This position _was_ required for the
 non-privileged LSM case to avoid security implications, but that goal
 not longer exists here either.

Thanks.

-optimally using the zero-cost call-outs (static key + fexit trampolines)
 meant it didn't interact well with the existing stacking mechanism.

Exactly.

So, fine, these appear to be design choices, not *specifically*
requirements. Let's move on, I think there is more to unpack here...

Right.

quoted

2- BPF hooks don't know what may be attached at any given time, so
   ALL LSM hooks need to be universally hooked. THIS turns out to create
   a measurable performance problem in that the cost of the indirect call
   on the (mostly/usually) empty BPF policy is too high.

also no.

AIUI, there was some confusion on Alexei's reply here. I, perhaps,
was not as clear as I needed to be. I think the later discussion on
performance overheads gets more into the point, and gets us closer to
the objections Alexei had. More below...

Agreed.

quoted

  This approach still had the issues of an indirect call and an
  extra check when not used. So this was not truly zero overhead even
  after special casing BPF.

The LSM mechanism is not zero overhead. It never has been. That's why
you can compile it out. You get added value at a price. You get the
ability to use SELinux and KRSI together at a price. If that's unacceptable
you can go the route of seccomp, which doesn't use LSM for many of the
same reasons you're on about.
[...]

quoted

So, trying to avoid the indirect calls is, as you say, an optimization,
but it might be a needed one due to the other limitations.

I'm convinced that avoiding the cost of retpoline in critical path is a
requirement for any new infrastructure in the kernel.

Sorry, I haven't gotten that memo.

I agree with Casey here -- it's a nice goal, but those cost evaluations have
not yet(?[1]) hit the LSM world. I think it's a desirable goal, to be
sure, but this does appear to be an early optimization.

Thanks for helping clarify that.

quoted

[...]
It can do that wholly within KRSI hooks. You don't need to
put KRSI specific code into security.c.

This observation is where I keep coming back to.

Yes, the resulting code is not as fast as it could be. The fact that BPF
triggers the worst-case performance of LSM hooking is the "new" part
here, from what I can see.

I haven't put this oar in the water before, but mightn't it be
possible to configure which LSM hooks can have BPF programs installed
at compile time, and thus address this issue for the "production" case?
I fully expect that such a configuration option would be hideously ugly
both to implement and use. If the impact of unused BPF hooks is so
great a concern, perhaps it would be worthwhile.

I suspect the problem is that folks in the BPF subsystem don't want to
be seen as slowing anything down, even other subsystems, so they don't
want to see this done in the traditional LSM hooking way (which contains
indirect calls).

I get that, and it is a laudable goal, but ...

But the LSM subsystem doesn't want special cases (Casey has worked very
hard to generalize everything there for stacking). It is really hard to
accept adding a new special case when there are still special cases yet
to be worked out even in the LSM code itself[2].

... like Kees says, this isn't the only use case we have to deal with.

quoted

Networking stack converted all such places to conditional calls.
In BPF land we converted indirect calls to direct jumps and direct calls.
It took two years to do so. Adding new indirect calls is not an option.
I'm eagerly waiting for Peter's static_call patches to land to convert
a lot more indirect calls. May be existing LSMs will take advantage
of static_call patches too, but static_call is not an option for BPF.
That's why we introduced BPF trampoline in the last kernel release.

Sorry, but I don't see how BPF is so overwhelmingly special.

My analogy here is that if every tracepoint in the kernel were of the
type:

if (trace_foo_enabled) // <-- Overhead here, solved with static key
   trace_foo(a);  // <-- retpoline overhead, solved with fexit trampolines

This is a helpful distillation; thanks.

static keys (perhaps better described as static branches) make sense to
me; I'm familiar with them being used all over the place[3]. The resulting
"zero performance" branch mechanism is extremely handy.

I had been thinking about the fexit stuff only as a way for BPF to call
into kernel functions directly, and I missed the place where this got
used for calling from the kernel into BPF directly. KP walked me through
the fexit stuff off list. I missed where there NOP stub ("noinline int
bpf_lsm_##NAME(__VA_ARGS__) { return 0; }") was being patched by BPF in
https://lore.kernel.org/lkml/20200220175250.10795-6-kpsingh@chromium.org/ (local)
The key bit being "bpf_trampoline_update(prog)"

quoted

It would be very hard to justify enabling them on a production system,
and the same can be said for BPF and KRSI.

The same can be and has been said about the LSM infrastructure.
If BPF and KRSI are that performance critical you shouldn't be
tying them to LSM, which is known to have overhead. If you can't
accept the LSM overhead, get out of the LSM. Or, help us fix the
LSM infrastructure to make its overhead closer to zero. Whether
you believe it or not, a lot of work has gone into keeping the LSM
overhead as small as possible while remaining sufficiently general
to perform its function.

No. If you're too special to play by LSM rules then you're special
enough to get into the kernel using more direct means.

So, I see the primary conflict here being about the performance
optimizations. AIUI:

- BPF subsystem maintainers do not want any new slowdown associated
  with BPF

Right.

- LSM subsystem maintainers do not want any new special cases in
  LSM stacking

Right.

So, unless James is going to take this over Casey's objections, the path
forward I see here is:

- land a "slow" KRSI (i.e. one that hooks every hook with a stub).
- optimize calling for all LSMs

Does this seem right to everyone?

This would be my first choice. The existing list-of-hooks mechanism is
an "obvious" implementation. I have been thinking about possible ways to
make it better, but as y'all may have guessed, I haven't seen all the cool
new optimization techniques.

-Kees


[1] There is a "known cost to LSM", but as Casey mentions, it's been
generally deemed "acceptable". There have been some recent attempts to
quantify it, but it's not been very clear:
https://lore.kernel.org/linux-security-module/c98000ea-df0e-1ab7-a0e2-b47d913f50c8@tycho.nsa.gov/ (local) (lore is missing half this conversation for some reason)

[2] Casey's work to generalize the LSM interfaces continues and it quite
complex:
https://lore.kernel.org/linux-security-module/20200214234203.7086-1-casey@schaufler-ca.com/ (local)

[3] E.g. HARDENED_USERCOPY uses it:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/usercopy.c?h=v5.5#n258
and so does the heap memory auto-initialization:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/slab.h?h=v5.5#n676

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help