Thread (15 messages) 15 messages, 2 authors, 2021-08-13

Re: Questions re the new mount_setattr(2) manual page

From: Michael Kerrisk (man-pages) <hidden>
Date: 2021-08-12 05:36:34
Also in: linux-fsdevel, lkml

Hello Christian,

Thanks for the answers.

A couple of small queries still below.

On 8/11/21 12:07 PM, Christian Brauner wrote:
On Tue, Aug 10, 2021 at 11:06:52PM +0200, Michael Kerrisk (man-pages) wrote:
[...]
quoted
quoted
quoted
quoted
      EINVAL The mount that is to be ID mapped is not a
             detached/anonymous mount; that is, the mount is
???
What is a the distinction between "detached" and "anonymous"?
Or do you mean them to be synonymous? If so, then let's use
just one term, and I think "detached" is preferable.
Yes, they are synonymous here. I list both because detached can
potentially be confusing. A detached mount is a mount that has not been
visible in the filesystem. But if you attached it an then unmount it
right after and keep the fd for the mountpoint open it's a detached
mount purely on a natural language level, I'd argue. But it's not a
detached mount from the kernel's view anymore because it has been
exposed in the filesystem and is thus not detached anymore.
But I do prefer "detached" to "anonymous" and that confusion is very
unlikely to occur.
Thanks. I made it "detached". Elsewhere, the page already explains
that a detached mount is one that:

          must have been created by calling open_tree(2) with the
          OPEN_TREE_CLONE flag and it must not already have been
          visible in the filesystem.

Which seems a fine explanation. 

????
But, just a thought... "visible in the filesystem" seems not quite accurate. 
What you really mean I guess is that it must not already have been
/visible in the filesystem hierarchy/previously mounted/something else/,
right?
I suppose that I should have clarified that my main problem was
that you were using the word "filesystem" in a way that I find
unconventional/ambiguous. I mean, I normally take the term
"filesystem" to be "a storage system for folding files".
Here, you are using "filesystem" to mean something else, what 
I might call like "the single directory hierarchy" or "the
filesystem hierarchy" or "the list of mount points".
A detached mount is created via the OPEN_TREE_CLONE flag. It is a
separate new mount so "previously mounted" is not applicable.
A detached mount is _related_ to what the MS_BIND flag gives you with
mount(2). However, they differ conceptually and technically. A MS_BIND
mount(2) is always visible in the fileystem when mount(2) returns, i.e.
it is discoverable by regular path-lookup starting within the
filesystem.

However, a detached mount can be seen as a split of MS_BIND into two
distinct steps:
1. fd_tree = open_tree(OPEN_TREE_CLONE): create a new mount
2. move_mount(fd_tree, <somewhere>):     attach the mount to the filesystem

1. and 2. together give you the equivalent of MS_BIND.
In between 1. and 2. however the mount is detached. For the kernel
"detached" means that an anonymous mount namespace is attached to it
which doen't appear in proc and has a 0 sequence number (Technically,
there's a bit of semantical argument to be made that "attached" and
"detached" are ambiguous as they could also be taken to mean "does or
does not have a parent mount". This ambiguity e.g. appears in
do_move_mount(). That's why the kernel itself calls it an "anonymous
mount". However, an OPEN_TREE_CLONE-detached mount of course doesn't
have a parent mount so it works.).

For userspace it's better to think of detached and attached in terms of
visibility in the filesystem or in a mount namespace. That's more
straightfoward, more relevant, and hits the target in 90% of the cases.

However, the better and clearer picture is to say that a
OPEN_TREE_CLONE-detached mount is a mount that has never been
move_mount()ed. Which in turn can be defined as the detached mount has
never been made visible in a mount namespace. Once that has happened the
mount is irreversibly an attached mount.

I keep thinking that maybe we should just say "anonymous mount"
everywhere. So changing the wording to:
I'm not against the word "detached". To user space, I think it is a
little more meaningful than "anonymous". For the moment, I'll stay with
"detached", but if you insist on "anonymous", I'll probably change it.
[...]
EINVAL The mount that is to be ID mapped is not an anonymous mount;
that is, the mount has already been visible in a mount namespace.
I like that text *a lot* better! Thanks very much for suggesting
wordings. It makes my life much easier. 

I've made the text:

       EINVAL The mount that is to be ID mapped is not a detached
              mount; that is, the mount has not previously been
              visible in a mount namespace.
[...]
The mount must be an anonymous mount; that is, it must have been
created by calling open_tree(2) with the OPEN_TREE_CLONE flag and it
must not already have been visible in a mount namespace, i.e. it must
not have been attached to the filesystem hierarchy with syscalls such
as move_mount() syscall.
And that too! I've made the text:

       •  The mount must be a detached mount; that is, it must have
          been created by calling open_tree(2) with the
          OPEN_TREE_CLONE flag and it must not already have been
          visible in a mount namespace.  (To put things another way:
          the mount must not have been attached to the filesystem
          hierarchy with a system call such as move_mount(2).)
[...]

(I'm using the formulation "with syscalls such as move_mount()" to
future proof this. :)).
Fair enough.
quoted
quoted
quoted
quoted
  EXAMPLES
???
Do you have a (preferably simple) example piece of code
somewhere for setting up an ID mapped mount?
????
I guess the best example is this:
https://github.com/brauner/mount-idmapped/
right?
Ah yes, sorry. I forgot to answer that yesterday. I sent you links via
another medium but I repeat it here.
There are two places. The link you have here is a private repo. But I've
also merged a program alongside the fstests testsuite I merged:
https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/mount-idmapped.c
which should be nicer and has seen reviews by Amir and Christoph.
Thanks.

[...]
quoted
quoted
quoted
quoted
          int fd_tree = open_tree(-EBADF, source,
                       OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC |
                       AT_EMPTY_PATH | (recursive ? AT_RECURSIVE : 0));
???
What is the significance of -EBADF here? As far as I can tell, it
is not meaningful to open_tree()?
I always pass -EBADF for similar reasons to [2]. Feel free to just use -1.
????
But here, both -EBADF and -1 seem to be wrong. This argument 
is a dirfd, and so should either be a file descriptor or the
value AT_FDCWD, right?
[1]: In this code "source" is expected to be absolute. If it's not
     absolute we should fail. This can be achieved by passing -1/-EBADF,
     afaict.
D'oh! Okay. I hadn't considered that use case for an invalid dirfd.
(And now I've done some adjustments to openat(2),which contains a
rationale for the *at() functions.)

So, now I understand your purpose, but still the code is obscure,
since

* You use a magic value (-EBADF) rather than (say) -1.
* There's no explanation (comment about) of the fact that you want
  to prevent relative pathnames.

So, I've changed the code to use -1, not -EBADF, and I've added some
comments to explain that the intent is to prevent relative pathnames.
Okay?

But, there is still the meta question: what's the problem with using
a relative pathname?

[...]
quoted
quoted
quoted
quoted
          ret = move_mount(fd_tree, "", -EBADF, target,
                           MOVE_MOUNT_F_EMPTY_PATH);
???
What is the significance of -EBADF here? As far as I can tell, it
is not meaningful to move_mount()?
See [2].
????
As above, both -EBADF and -1 seem to be wrong. This argument 
is a dirfd, and so should either be a file descriptor or the
value AT_FDCWD, right?
See [1].
I made the same change as above.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help