Thread: 58 messages, 13 authors, 2023-07-04

Re: [PATCH v2 bpf-next 00/18] BPF token

From: Djalal Harouni <hidden>
Date: 2023-06-12 12:02:41
Also in: bpf

On Sat, Jun 10, 2023 at 12:57 AM Andrii Nakryiko
[off-list ref] wrote:
On Fri, Jun 9, 2023 at 3:30 PM Djalal Harouni [off-list ref] wrote:
Hi Andrii,

On Thu, Jun 8, 2023 at 1:54 AM Andrii Nakryiko [off-list ref] wrote:
...
creating new BPF objects like BPF programs, BPF maps, etc.
Is there a reason for coupling this only with the userns?
There is no coupling. Without userns it is at least possible to grant
CAP_BPF and other capabilities from init ns. With user namespace that
becomes impossible.
But these are not the same: delegating a full cap vs delegating an fd mask?

One can argue that unprivileged in the init userns is the same as
privileged in a nested userns.
Being able to delegate an fd in the init userns first, then in nested
ones, seems logical...
The "trusted unprivileged" assumed by systemd can be in init userns?
It doesn't have to be systemd, but yes, BPF token can be created only
when you have CAP_SYS_ADMIN in init ns. It's in line with restrictions
on a bunch of other bpf() syscall commands (like GET_FD_BY_ID family
of commands).
I'm more interested in getting fd delegation to work in the init userns
as well...

I don't understand why that is not possible or doable?
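
As a side note on the capability requirement discussed above, here is a
minimal sketch of how a process could inspect its own effective
capabilities. It parses the CapEff line of /proc/<pid>/status and tests the
CAP_SYS_ADMIN (21) and CAP_BPF (39) bits from
include/uapi/linux/capability.h. The helper names are hypothetical. Note
the limitation: CapEff is relative to the task's own user namespace, so
this alone does not tell you whether the capability is held in the init
userns, which is what the token creation check requires.

```python
# Bit numbers from include/uapi/linux/capability.h
CAP_SYS_ADMIN = 21
CAP_BPF = 39


def parse_cap_eff(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """True if capability bit `cap` is set in `mask`."""
    return bool(mask & (1 << cap))


def describe(status_text):
    """Summarize the two capabilities relevant to this thread."""
    mask = parse_cap_eff(status_text)
    return {
        "CAP_SYS_ADMIN": has_cap(mask, CAP_SYS_ADMIN),
        "CAP_BPF": has_cap(mask, CAP_BPF),
    }


def current_caps():
    """Linux only: report capabilities of the calling process.

    The result is relative to the caller's own user namespace.
    """
    with open("/proc/self/status") as f:
        return describe(f.read())
```

Calling current_caps() as root inside a nested userns can report
CAP_SYS_ADMIN as present even though the process has no such capability in
the init userns, which is exactly the gap the token is meant to bridge.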
Previous attempt at addressing this very same problem ([0]) attempted to
utilize authoritative LSM approach, but was conclusively rejected by upstream
LSM maintainers. BPF token concept is not changing anything about LSM
approach, but can be combined with LSM hooks for very fine-grained security
policy. Some ideas about making BPF token more convenient to use with LSM (in
particular custom BPF LSM programs) were briefly described in recent LSF/MM/BPF
2023 presentation ([1]). E.g., an ability to specify user-provided data
(context), which in combination with BPF LSM would allow implementing very
dynamic and fine-grained custom security policies on top of BPF token. In the
interest of minimizing API surface area discussions this is going to be
added in follow-up patches, as it's not essential to the fundamental concept
of delegatable BPF token.

It should be noted that BPF token is conceptually quite similar to the idea of
/dev/bpf device file, proposed by Song a while ago ([2]). The biggest
difference is the idea of using virtual anon_inode file to hold BPF token and
allowing multiple independent instances of them, each with its own set of
restrictions. BPF pinning solves the problem of exposing such BPF token
through file system (BPF FS, in this case) for cases where transferring FDs
over Unix domain sockets is not convenient. And also, crucially, BPF token
approach is not using any special stateful task-scoped flags. Instead, bpf()
What's the use case for transferring over unix domain sockets?
I'm not sure I understand the question. Unix domain socket
(specifically its SCM_RIGHTS ancillary message) allows transferring
files between processes, which is one way to pass a BPF object (like
prog/map/link, and now token). BPF FS is the other one. In practice
it's usually BPF FS, but there is no presumption about how the file
reference is transferred.
Got it.
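
For readers unfamiliar with the mechanism mentioned above, here is a
minimal sketch of passing a file descriptor with SCM_RIGHTS. It uses a
socketpair within one process and a pipe fd as a stand-in for a BPF object
fd; a BPF token fd is an assumption here and is not created. The function
names are hypothetical.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as an SCM_RIGHTS ancillary message."""
    sock.sendmsg(
        [b"x"],  # at least one byte of regular data must accompany it
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))],
    )


def recv_fd(sock):
    """Receive one file descriptor sent with SCM_RIGHTS."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Pass the write end of a pipe across a socketpair and use it."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    try:
        send_fd(sender, w)
        w2 = recv_fd(receiver)  # new descriptor, same open file
        try:
            os.write(w2, b"hello")
            return os.read(r, 5)
        finally:
            os.close(w2)
    finally:
        os.close(r)
        os.close(w)
        sender.close()
        receiver.close()
```

The received descriptor refers to the same open file description as the
one that was sent, so whatever rights the fd carries travel with it. That
is the property the questions below are about.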

IIRC SCM_RIGHTS and SCM_CREDENTIALS are translated into the receiving
userns, no?

I assume this allows setting things up in a hierarchical way...

If I set up the environment to lock things down the line, I find it
strange if a received fd would allow me to do more things than what
was planned when I created the environment: namespaces, mounts, etc

I think you have to add the owning userns context to the fd or
"token", and on the receiving side, if the current userns is the same
as the owning one or nested below it in the userns hierarchy, then
allow the bpf operation, otherwise fail with -EACCES or something
similar...
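
The check suggested above can be sketched as a toy model. This is not
kernel code and not part of the patch set: UserNS, Token and
check_token_use are hypothetical names, and the rule implemented is one
reading of the proposal, namely allow when the caller's userns is the
token's owning userns or a descendant of it, and return -EACCES otherwise.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNS:
    """Toy user namespace: a name and a link to its parent (None for init)."""
    name: str
    parent: Optional["UserNS"] = None


@dataclass
class Token:
    """Toy token that records the userns it was created in."""
    owner_ns: UserNS


def in_userns_hierarchy(owner, current):
    """True if `current` is `owner` itself or nested anywhere below it."""
    ns = current
    while ns is not None:
        if ns is owner:
            return True
        ns = ns.parent
    return False


def check_token_use(token, current_ns):
    """Return 0 if the token may be used from `current_ns`, else -EACCES."""
    if in_userns_hierarchy(token.owner_ns, current_ns):
        return 0
    return -errno.EACCES


# Example hierarchy: init -> a -> a1, and init -> b
init_ns = UserNS("init")
ns_a = UserNS("a", init_ns)
ns_a1 = UserNS("a1", ns_a)
ns_b = UserNS("b", init_ns)
token = Token(owner_ns=ns_a)
```

With this model, check_token_use(token, ns_a) and
check_token_use(token, ns_a1) succeed, while a sibling (ns_b) or an
ancestor (init_ns) is refused. Whether an ancestor should be refused is a
policy choice the proposal leaves open.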

Will BPF token translation happen if you cross the different namespaces?
What does BPF token translation mean specifically? Currently it's a
very simple kernel object with refcnt and a few flags, so there is
nothing to translate?
Please see the comment above about the owning userns context.
If the token is pinned into different bpffs, will the token share the
same context?
So I was planning to allow a user process creating a BPF token to
specify custom user-provided data (context). This is not in this patch
set, but is it what you are asking about?
Exactly, define what you can access inside the container... this would
align with Andy's suggestion "making BPF behave sensibly in that
container seems like it should also be necessary." I do agree on this.

Again I think LSM and bpf+lsm should have the final word on this too...

Regardless, pinning a BPF object in BPF FS basically just bumps a
refcnt and exposes that object in a way that can be looked up through
a file system path (using bpf() syscall's BPF_OBJ_GET command).
The underlying object isn't cloned or copied, it's exactly the same
object with the same shared internal state.
This is the part I also find strange. I can understand pinning a bpf
program, map, etc., but an fd that gives some access rights should be
part of the filesystem from the start; I don't get the extra pinning.
Also, it seems bpffs is per superblock mount, so why not allow a
privileged process to mount bpffs with the corresponding information,
then open the fd, set it up and pass it down the line when executing
the main program? Or even allow unprivileged to open it on bpffs with
some restrictive conditions?

Then it would be the business of the privileged process to bind mount
bpffs in some other places, share it, etc.

Having the fd or "token" that gives access rights pinned in two
separate bpffs mounts seems too much: it crosses namespaces (mount,
userns, etc.) and environments set up by the privileged process...

I would just make it per bpffs mount and that's it, nothing more. If a
program wants to bind mount it somewhere else then it's not a bpf
problem.