Thread: 35 messages, 8 authors, 2017-02-07

Re: [PATCH net] bpf: expose netns inode to bpf programs

From: Alexei Starovoitov <hidden>
Date: 2017-02-03 06:06:01

On Fri, Feb 03, 2017 at 05:33:45PM +1300, Eric W. Biederman wrote:
> The point is that we can make the inode number stable across migration
> and the user space API for namespaces has been designed with that
> possibility in mind.
>
> What you have proposed is the equivalent of reporting a file name, and
> instead of reporting /dir1/file1 /dir2/file1 just reporting file1 for
> both cases.
>
> That is problematic.
>
> It doesn't matter that eBPF and CRIU do not mix.  When we implement
> migration of the namespace file descriptors and can move them from
> one system to another preserving the device number and inode number
> so that criu of other parts of userspace can function better there will
> be a problem.  There is not one unique inode number per namespace and
> the proposed interface in your eBPF programs is broken.
>
> I don't know when inode numbers are going to be the bottleneck we decide
> to make migratable to make CRIU work better but things have been
> designed and maintained very carefully so that we can do that.
>
> Inode numbers are in the namespace of the filesystem they reside in.

I saw that iproute2 is doing:
  if ((st.st_dev == netst.st_dev) &&
      (st.st_ino == netst.st_ino)) {
but proc_alloc_inum() uses a global ida,
so I figured the extra st_dev check in iproute2 must be obsolete.
So the long-term plan is to make /proc namespace-aware?
That's fair. In that case, exposing only the inode will
lead to wrong assumptions.
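
For reference, the iproute2-style check above can be written as a small userspace helper. This is only a sketch: same_file() is a name made up for illustration, and the paths are whatever the caller supplies (for example /proc/self/ns/net and a bind-mounted netns file).

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths resolve to the same file,
 * 0 if they do not, and -1 if either stat() call fails.
 * Both st_dev and st_ino are compared, since an inode number is only
 * meaningful within the filesystem it belongs to.
 */
static int same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;

	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Dropping the st_dev comparison would be safe only for as long as every namespace inode comes from a single filesystem instance.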

>>> But you told Eric that his nack doesn't matter, and maybe it would be
>>> nice to ask him to clarify instead.
>> Fair enough. Eric, thoughts?
> In very short terms exporting just the inode number would require
> implementing a namespace of namespaces, and that is NOT happening.
> We are not going to design our kernel interfaces so badly that we need
> to do that.
>
> At a bare minimum you need to export the device number of the filesystem
> as well as the inode number.

Agree. Will do.
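
A minimal sketch of the (device, inode) pair being asked for, assuming nothing about the eventual interface: struct ns_key and ns_key_equal() are names invented for illustration and are not existing kernel or BPF helper API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifier: device number plus inode number. */
struct ns_key {
	uint64_t dev;	/* device number of the filesystem holding the inode */
	uint64_t ino;	/* inode number, unique only within that filesystem */
};

/* Two keys name the same namespace file only if both fields match. */
static bool ns_key_equal(struct ns_key a, struct ns_key b)
{
	return a.dev == b.dev && a.ino == b.ino;
}
```

Two keys that share ino but differ in dev compare as different, which is the /dir1/file1 versus /dir2/file1 case from earlier in the thread.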

> My expectation would be that now you are starting to look at concepts
> that are namespaced the way you would proceed would be to associate a
> full set of namespaces with your ebpf program.  Those namespaces would
> come from the submitter of your ebpf program.  Namespaced values
> would be in the terms of your associated namespaces.
>
> That keeps things working the way userspace would expect.
>
> The easy way to build such an association is to not allow your
> contextless ebpf programs from being submitted to kernel in anything
> other than the initial set of namespaces.
>
> But please assume all global identifiers are namespaced.  If they aren't
> that needs to be fixed because not having them namespaced will break
> process migration at some point.
>
> In short the fix here is to export both the inode number the device
> number.  That is what it takes to uniquely identify a file.  It would be

Agree. Will respin.

> good if you went farther and limited your contextless ebpf programs to
> only being installed by programs in the initial set of namespaces.

You mean limit this to init_net only? That might break existing users.

> Does that make things clearer?

Yep. Thanks for the feedback.