On 16 January 2018 at 18:38, Serge E. Hallyn [off-list ref] wrote:
Quoting Jann Horn (jannh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org):
On Tue, Jan 9, 2018 at 7:52 PM, Serge E. Hallyn [off-list ref] wrote:
Update the capabilities(7) manpage with a description of the
new-ish namespaced file capability support.
A note on userspace tools: since the kernel will automatically
convert between v2 and v3 xattrs, and translate nsroot between
v3 xattrs, we can make do with the current getcap(8) and setcap(8)
tools. I.e. a user on the host can create a transient user namespace
with the appropriate mappings and run setcap(8) there. The kernel
will automatically write a v3 xattr with the transient namespace's
root user as nsroot.
Signed-off-by: Serge Hallyn <redacted>
---
man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/man7/capabilities.7 b/man7/capabilities.7
index 166eaaf..76e7e02 100644
--- a/man7/capabilities.7
+++ b/man7/capabilities.7
@@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
then the effective flag must also be specified as enabled
for all other capabilities for which the corresponding permitted or
inheritable flags is enabled.
+.PP
+Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported. These store only
+the capabilities to be applied to the file, with no record of the writer's
+credentials. Therefore only privileged users can be trusted to write them, and
+.BR CAP_SETFCAP
+over the user namespace which mounted the filesystem (usually the initial user
+namespace) is required. This makes it impossible to write file capabilities
+from a user namespaced container, which causes some package updates to fail.
+.PP
+In order to support setting file capabilities in containers, the
+kernel must be able to identify whether the task executing the
+file will be constrained to a subset of the resources over which
+the writer of the file capabilities has privilege. To this end,
+since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
+of the root user in the writer's namespace ("nsroot"). Hence the writer only
+requires
+.IP 1.
+.BR CAP_SETFCAP
+over the file inode, meaning the writing task must have
+.BR CAP_SETFCAP
+over a user namespace into which the inode's owning user ID is mapped.
+.PP
+and
+.IP 2.
+.BR CAP_SETFCAP
+over the writer's own user namespace.
I think that the following would be clearer (but technically
equivalent): "Hence the writer only requires CAP_SETFCAP over the file
inode, meaning that the writing task must have CAP_SETFCAP in its own
user namespace and the UID and GID of the file inode must be mapped in
the writing task's user namespace.".
Looks good to me.
+A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
+whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
+.PP
+Users with the required privilege may use
+.BR setxattr (2)
+to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
+The kernel will automatically convert a VFS_CAP_REVISION_2 to a
+VFS_CAP_REVISION_3 extended attribute with the "nsroot"
+set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
+extended attribute is specified, then the kernel will map the
+specified root user ID (which must be a valid user ID mapped in the caller's
+user namespace) into the initial user namespace.
Really, "into the initial user namespace"? That may be true for the
kernel-internal representation, but the on-disk representation is the
mapping into the user namespace that contains the mount namespace into
which the file system was mounted, right?
Ah, yes, it is.
This would become observable
when a file system is mounted in a different namespace than before, or
when working with FUSE in a namespace.
Yes it would.
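For anyone who wants to look at the stored root ID directly, here is a
minimal sketch (not part of the patch) that decodes a raw
security.capability value. It assumes the little-endian layout and the
constants from <linux/capability.h>: a 32-bit magic_etc word, two
permitted/inheritable pairs and, for VFS_CAP_REVISION_3 only, a trailing
32-bit root ID. The function name and the returned dictionary keys are
made up for illustration.

```python
import struct

# Constants as defined in <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

# Sizes in bytes: magic_etc + 2 * (permitted, inheritable) [+ rootid].
V2_SIZE = 4 + 2 * 8
V3_SIZE = V2_SIZE + 4


def decode_security_capability(raw):
    """Decode a security.capability xattr value (hypothetical helper)."""
    if len(raw) < 4:
        raise ValueError("value too short to hold magic_etc")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK

    if revision == VFS_CAP_REVISION_2 and len(raw) == V2_SIZE:
        version, nsroot = 2, None          # v2 carries no root ID
    elif revision == VFS_CAP_REVISION_3 and len(raw) == V3_SIZE:
        version = 3
        (nsroot,) = struct.unpack_from("<I", raw, V2_SIZE)
    else:
        raise ValueError("unsupported revision or unexpected length")

    # data[0] holds the low 32 bits of each set, data[1] the high 32 bits.
    p_lo, i_lo, p_hi, i_hi = struct.unpack_from("<4I", raw, 4)
    return {
        "revision": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": p_lo | (p_hi << 32),
        "inheritable": i_lo | (i_hi << 32),
        "nsroot": nsroot,
    }
```

On Linux the value can be fetched with os.getxattr(path,
"security.capability"). Since the kernel converts between v2 and v3 and
translates nsroot for the caller, what this returns is the caller's view
of the attribute, which is not necessarily the bytes stored on disk.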
Michael, you said you were reworking it, do you mind working this into
it as well?
Yes, I'll do that. It may be a couple of weeks before I get some more
cycles for this, however.
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/