Re: [PATCH v7 5/5] namei: resolveat(2) syscall
From: Aleksa Sarai <hidden>
Date: 2019-05-26 05:47:18
Also in:
linux-arch, linux-fsdevel, lkml
On 2019-05-25, Linus Torvalds [off-list ref] wrote:
In fact, I think resolveat() as a model is fundamentally wrong for yet another reason: O_CREAT. If you want to _create_ a new file, and you want to still have the path resolution modifiers in place, the resolveat() model is broken, because it only gives you path resolution for the lookup, and then when you do openat(O_CREAT) for the final component, you now don't have any way to limit that last component. Sure, you can probably effectively hack around it with resolveat() on everything but the last component, and then openat(O_CREAT|O_EXCL|O_NOFOLLOW) on the last component, because the O_EXCL|O_NOFOLLOW should mean that it won't do any of the unsafe things. And then (if you didn't actually want the O_EXCL), you handle the race between "somebody else got there first" by re-trying etc. So I suspect the O_CREAT thing could be worked around with extra complexity, but it's an example of how the O_PATH model really screws you over. End result: I really think resolveat() is broken. It absolutely *needs* to be a full-fledged "openat()" system call, just with added path resolution flags.
That's a very good point. I was starting to work on O_CREAT via resolveat(2) and it definitely was much harder than most people would be happy to deal with -- I'm not even sure that I handled all the cases. I'll go for an openat2(2) then. Thinking about it some more -- since it's a new syscall, I could actually implement the O_PATH link-mode as being just the regular mode argument (since openat(2) ignores the mode for non-O_CREAT anyway). The openat(2) wrapper might be more than one-line as a result but it should avoid polluting the resolution flags with mode flags (since openat(O_PATH) needs to have a g+rwx mode for backwards-compatibility).
quoted
openat2(rootfd, "path/to/file", O_PATH, RESOLVE_IN_ROOT, &statbuf);Note that for O_CREAT, it either needs the 'mode' parameter too, or the statbuf would need to be an in-out thing. I think the in-out model might be nice (the varargs model with a conditional 'mode' parameter is horrid, I think), but at some point it's just bike-shedding. Also, I'm not absolutely married to the statbuf, but I do think it might be a useful extension. A *lot* of users need the size of the file for subsequent mmap() calls (or for buffer management for read/write interface) or for things like just headers (ie "Content-length:" in html etc). I'm not sure people actually want a full 'struct stat', but people historically also do st_ino/st_dev for indexing into existing user-space caches (or to check permissions like that it's on the right filesystem, but the resolve flags kind of make that less of an issue). And st_mode to verify that it's a proper regular file etc etc. You can always just leave it as NULL if you don't care, there's almost no downside to adding it, and I do think that people who want a "walk pathname carefully" model almost always would want to also check the end result anyway.
Yeah, I agree -- most folks would want to double-check what they've opened or O_PATH'd. Though I'm still not clear what is the best way of doing the "stat" argument is -- especially given how much fun architecture-specific shenanigans are in fs/stat.c (should I only use cp_new_stat64 or have a separate 64-bit syscall).
quoted
Is there a large amount of overhead or downside to the current open->fstat way of doing things, or is this more of a "if we're going to add more ways of opening we might as well add more useful things"?Right now, system calls are sadly very expensive on a lot of hardware. We used to be very proud of the fact that Linux system calls were fast, but with meltdown and retpoline etc, we're back to "system calls can be several thousand cycles each, just in overhead, on commonly available hardware". Is "several thousand cycles" fatal? Not necessarily. But I think that if we do add a new system call, particularly a fancy one for special security-conscious models, we should look at what people need and use, and want. And performance should always be a concern. I realize that people who come at this from primarily just a security issue background may think that security is the primary goal. But no. Security always needs to realize that the _primary_ goal is to have people _use_ it. Without users, security is entirely pointless. And the users part is partly performance, but mostly "it's convenient".
Yup, I agree. I was hoping to shunt most of the convenience to userspace to avoid ruffling feathers in VFS-land, but I'm more than happy to make the kernel ABI more convenient.
The whole "this is Linux-specific" is a big inconvenience point
I hope that other OSes will take our lead and have a similar interface so that the particular inconvenience can go away eventually (this was one of the arguments for resolveat(2) -- it's a clear "this is a new idea" interface rather than mixing it with other O_* flags).
Talking about securely opening things - another flag that we may want to avoid issues is a "don't open device nodes" flag. Sure, O_NONBLOCK plus checking the st_mode of the result is usually sufficient, but it's actually fairly easy to get that wrong. Things like /dev/tty and /dev/zero often need to be available for various reasons, and have been used to screw careless "open and read" users up that missed a check.
Sure, this could be added -- though I'm sure folks would have disagreements over whether it should be a resolution flag on an open flag.
I also do wonder that if the only actual user-facing interface for the resolution flags is a new system call, should we not make the *default* value be "don't open anything odd at all". So instead of saying RESOLVE_XDEV for "don't cross mount points", maybe the flags should be the other way around, and say "yes, allow mount point crossings", and "yes, explicitly allow device node opening", and "yes, allow DOTDOT" etc.
This seems like a reasonable default, except for the cases of RESOLVE_BENEATH and RESOLVE_IN_ROOT (that would make using AT_FDCWD more cumbersome than necessary). But other than that, I'd be happy to invert all the other flags. -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH <https://www.cyphar.com/>
Attachments
- signature.asc [application/pgp-signature] 833 bytes