Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
From: Michael Kerrisk (man-pages) <hidden>
Date: 2016-07-25 14:46:29
Also in:
linux-fsdevel, lkml
Hi Eric,

On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>> Hi Andrey,
>>
>> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages) wrote:
>>>> Hi Andrey,
>>>>
>>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages) wrote:
>>>>>> Hi Andrey,
>>>>>>
>>>>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>>>>>> <snip>
>>>>>> Could you add here an explanation of the API in detail: what do these
>>>>>> FDs refer to, and how do you use them to solve the use case? And could
>>>>>> you add that info to the commit messages, please?
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> A patch for man-pages is attached. It adds the following text to
>>>>> namespaces(7).
>>>>>
>>>>>    Since Linux 4.X, the following ioctl(2) calls are supported for
>>>>>    namespace file descriptors. The correct syntax is:
>>>>>
>>>>>          fd = ioctl(ns_fd, ioctl_type);
>>>>>
>>>>>    where ioctl_type is one of the following:
>>>>>
>>>>>    NS_GET_USERNS
>>>>>           Returns a file descriptor that refers to an owning user
>>>>>           namespace.
>>>>>
>>>>>    NS_GET_PARENT
>>>>>           Returns a file descriptor that refers to a parent namespace.
>>>>>           This ioctl(2) can be used for pid and user namespaces. For
>>>>>           user namespaces, NS_GET_PARENT and NS_GET_USERNS have the
>>>>>           same meaning.
>>>>
>>>> For each of the above, I think it is worth mentioning that the
>>>> close-on-exec flag is set for the returned file descriptor.
>
> Hmm. That is an odd default.
Why do you say that? It's pretty common as the default for various APIs that create new FDs these days. (There's of course a strong argument that the original UNIX default was a design blunder...)
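Whether the returned descriptor really has close-on-exec set can be checked from user space with fcntl(F_GETFD). Below is a minimal sketch, assuming the NS_GET_USERNS constant from <linux/nsfs.h> as merged in Linux 4.9 (the RFC under discussion may differ); owner_fd_is_cloexec is a hypothetical helper, not part of the patch set.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/nsfs.h>   /* NS_GET_USERNS (assumed: merged uapi header) */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the FD that NS_GET_USERNS returns for
 * the namespace at `path` has FD_CLOEXEC set, 0 if it does not, and -1 if
 * open() or the ioctl fails (for example ENOTTY on kernels without these
 * ioctls, or EPERM when the owning user namespace is outside the caller's
 * scope). */
int owner_fd_is_cloexec(const char *path)
{
    int ns = open(path, O_RDONLY | O_CLOEXEC);
    if (ns < 0)
        return -1;

    int uns = ioctl(ns, NS_GET_USERNS);
    close(ns);
    if (uns < 0)
        return -1;

    int flags = fcntl(uns, F_GETFD);   /* descriptor flags, not file status flags */
    close(uns);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

On a kernel that sets the flag, owner_fd_is_cloexec("/proc/self/ns/pid") should return 1 whenever the caller can see the owning user namespace.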
>>>>>    In addition to generic ioctl(2) errors, the following specific ones
>>>>>    can occur:
>>>>>
>>>>>    EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>>>
>>>>>    EPERM  The requested namespace is outside of the current namespace
>>>>>           scope.
>>>>
>>>> Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
>>>> user namespace"?
>
> Having looked at that bit of code I don't think capabilities really
> have a role to play.
Yes, I caught up with that now. I'll wait to see how this plays out in the next patch version.
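The error codes in the man-page text also mark where a walk up a namespace hierarchy stops, which is what the visualization use case needs. Below is a minimal sketch that starts at a pid or user namespace and climbs with NS_GET_PARENT, printing each inode number. It assumes the constants from <linux/nsfs.h> as merged in Linux 4.9, and it treats both ENOENT (as in the RFC text) and EPERM (which merged kernels reportedly return at the initial namespace) as the top of the walk; print_ns_ancestry is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/nsfs.h>   /* NS_GET_PARENT (assumed: merged uapi header) */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper. Prints the inode number of the namespace at `path`
 * and of each ancestor reachable through NS_GET_PARENT, indented by depth.
 * Returns the number of namespaces visited, or -1 if `path` can't be opened. */
int print_ns_ancestry(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int visited = 0;
    for (;;) {
        struct stat st;
        if (fstat(fd, &st) == 0)
            printf("%*sinode %lu\n", 2 * visited, "", (unsigned long) st.st_ino);
        visited++;

        int parent = ioctl(fd, NS_GET_PARENT);
        int err = errno;            /* save before close() can change it */
        close(fd);
        if (parent < 0) {
            /* EPERM: parent outside the caller's scope (or initial namespace
             * on merged kernels); ENOENT: initial namespace per the RFC text;
             * EINVAL: namespace type is not hierarchical. Report anything else. */
            if (err != EPERM && err != ENOENT && err != EINVAL)
                fprintf(stderr, "ioctl(NS_GET_PARENT): errno %d\n", err);
            break;
        }
        fd = parent;
    }
    return visited;
}
```

Called on /proc/self/ns/net, the walk should stop after one namespace, because network namespaces are not hierarchical and NS_GET_PARENT fails with EINVAL.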
>>>>>    ENOENT ns_fd refers to the init namespace.
>>>>
>>>> Thanks for this. But part of the question still remains unanswered.
>>>> How do we (in user space) use the file descriptors to answer any of
>>>> the questions that this patch series was designed to solve? (This
>>>> info should be in the commit message and the man-pages patch.)
>>>
>>> I'm sorry, but I am not sure that I understand what you are asking.
>>> Here are the original questions:
>>>
>>>     Someone else then asked me a question that led me to wonder about
>>>     generally introspecting on the parental relationships between user
>>>     namespaces and the association of other namespace types with user
>>>     namespaces. One use would be visualization, in order to understand
>>>     the running system. Another would be to answer the question I
>>>     already mentioned: what capability does process X have to perform
>>>     operations on a resource governed by namespace Y?
>>>
>>> Here is an example which shows how we can get the owning namespace
>>> inode number by using these ioctls.
>>>
>>>     $ ls -l /proc/13929/ns/pid
>>>     lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'
>>>
>>>     $ ./nsowner /proc/13929/ns/pid
>>>     user:[4026532227]
>>>
>>> The owning user namespace for pid:[4026532228] is user:[4026532227].
>>>
>>> The nsowner tool is compiled from this code:
>>>
>>>     #include <fcntl.h>
>>>     #include <linux/nsfs.h>     /* NS_GET_USERNS */
>>>     #include <stdio.h>
>>>     #include <sys/ioctl.h>
>>>     #include <unistd.h>
>>>
>>>     int main(int argc, char *argv[])
>>>     {
>>>         char buf[128], path[] = "/proc/self/fd/0123456789";
>>>         int ns, uns, ret;
>>>
>>>         if (argc != 2)
>>>             return 1;
>>>
>>>         /* Open the namespace file given on the command line */
>>>         ns = open(argv[1], O_RDONLY);
>>>         if (ns < 0)
>>>             return 1;
>>>
>>>         /* Get an FD for the user namespace that owns it */
>>>         uns = ioctl(ns, NS_GET_USERNS);
>>>         if (uns < 0)
>>>             return 1;
>>>
>>>         /* The FD's /proc/self/fd symlink names the namespace, e.g. user:[N] */
>>>         snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
>>>         ret = readlink(path, buf, sizeof(buf) - 1);
>>>         if (ret < 0)
>>>             return 1;
>>>         buf[ret] = 0;
>>>         printf("%s\n", buf);
>>>         return 0;
>>>     }
>>
>> So, from my point of view, the important piece that was missing from
>> your commit message was the note to use readlink("/proc/self/fd/%d")
>> on the returned FDs. I think that detail needs to be part of the
>> commit message (and also the man page text).
>> I think it would even be helpful to include the above program as part
>> of the commit message: it helps people more quickly grasp the API.
>
> Please, please make the standard way to compare these things fstat.
> That is much less magic than a symlink, and a little more future
> proof. Possibly even kcmp.
As in fstat() to get the st_ino field, right?

Cheers,

Michael
> At some point we will care about migrating a migrating sub-container
> and we may have to have some minor changes.
>
> Eric
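Eric's fstat() suggestion can be applied to the capability question raised earlier in the thread. At the namespace level, that question reduces to whether a given user namespace owns namespace Y or is an ancestor of Y's owner. The sketch below assumes exactly that reduction: it starts from NS_GET_USERNS, climbs with NS_GET_PARENT, and compares namespaces by st_dev and st_ino instead of symlink text. Comparing st_dev as well as st_ino is a precaution in case namespace inodes ever live on more than one device. It answers only the ownership half of the question; whether process X actually holds a capability also depends on its capability sets and effective UID, which this code does not examine. The names same_ns and userns_governs are hypothetical, and kcmp is not shown.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/nsfs.h>   /* NS_GET_USERNS, NS_GET_PARENT (assumed: merged uapi header) */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the two FDs refer to the same namespace (same st_dev and
 * st_ino), 0 if they do not, and -1 if fstat() fails. */
static int same_ns(int a, int b)
{
    struct stat sa, sb;
    if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical helper. Returns 1 if the user namespace at userns_path owns
 * the namespace at ns_path or is an ancestor of its owner, 0 if the walk
 * reaches the top of the hierarchy visible to the caller without a match,
 * and -1 on error (including NS_GET_USERNS failing, e.g. with EPERM). */
int userns_governs(const char *userns_path, const char *ns_path)
{
    int target = open(userns_path, O_RDONLY | O_CLOEXEC);
    if (target < 0)
        return -1;
    int ns = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (ns < 0) {
        close(target);
        return -1;
    }

    int cur = ioctl(ns, NS_GET_USERNS);   /* owner of the namespace at ns_path */
    close(ns);
    if (cur < 0) {
        close(target);
        return -1;
    }

    int result = 0;
    while (cur >= 0) {
        int r = same_ns(cur, target);
        if (r != 0) {                     /* match (1) or fstat() error (-1) */
            result = r;
            close(cur);
            break;
        }
        int parent = ioctl(cur, NS_GET_PARENT);
        close(cur);
        cur = parent;                     /* < 0 ends the walk: top reached, or an error */
    }
    close(target);
    return result;
}
```

For example, userns_governs("/proc/<X>/ns/user", "/proc/<Y>/ns/net"), with <X> and <Y> standing for PIDs, asks whether X's user namespace sits at or above the owner of Y's network namespace.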
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers