Re: [PATCH v7 5/5] namei: resolveat(2) syscall

From: Aleksa Sarai <hidden>
Date: 2019-05-26 05:47:18
Also in: linux-arch, linux-fsdevel, lkml

On 2019-05-25, Linus Torvalds [off-list ref] wrote:

In fact, I think resolveat() as a model is fundamentally wrong for yet
another reason: O_CREAT. If you want to _create_ a new file, and you
want to still have the path resolution modifiers in place, the
resolveat() model is broken, because it only gives you path resolution
for the lookup, and then when you do openat(O_CREAT) for the final
component, you now don't have any way to limit that last component.

Sure,  you can probably effectively hack around it with resolveat() on
everything but the last component, and then
openat(O_CREAT|O_EXCL|O_NOFOLLOW) on the last component, because the
O_EXCL|O_NOFOLLOW should mean that it won't do any of the unsafe
things. And then (if you didn't actually want the O_EXCL), you handle
the race between "somebody else got there first" by re-trying etc. So
I suspect the O_CREAT thing could be worked around with extra
complexity, but it's an example of how the O_PATH model really screws
you over.

End result: I really think resolveat() is broken. It absolutely
*needs* to be a full-fledged "openat()" system call, just with added
path resolution flags.

That's a very good point. I was starting to work on O_CREAT via
resolveat(2) and it definitely was much harder than most people would be
happy to deal with -- I'm not even sure that I handled all the cases.

I'll go for an openat2(2) then. Thinking about it some more -- since
it's a new syscall, I could actually implement the O_PATH link-mode as
being just the regular mode argument (since openat(2) ignores the mode
for non-O_CREAT anyway). The openat(2) wrapper might be more than
one-line as a result but it should avoid polluting the resolution flags
with mode flags (since openat(O_PATH) needs to have a g+rwx mode for
backwards-compatibility).

quoted

  openat2(rootfd, "path/to/file", O_PATH, RESOLVE_IN_ROOT, &statbuf);

Note that for O_CREAT, it either needs the 'mode' parameter too, or
the statbuf would need to be an in-out thing. I think the in-out model
might be nice (the varargs model with a conditional 'mode' parameter
is horrid, I think), but at some point it's just bike-shedding.

Also, I'm not absolutely married to the statbuf, but I do think it
might be a useful extension. A *lot* of users need the size of the
file for subsequent mmap() calls (or for buffer management for
read/write interface) or for things like just headers (ie
"Content-length:" in html etc).

I'm not sure people actually want a full 'struct stat', but people
historically also do st_ino/st_dev for indexing into existing
user-space caches (or to check permissions like that it's on the right
filesystem, but the resolve flags kind of make that less of an issue).
And st_mode to verify that it's a proper regular file etc etc.

You can always just leave it as NULL if you don't care, there's almost
no downside to adding it, and I do think that people who want a "walk
pathname carefully" model almost always would want to also check the
end result anyway.

Yeah, I agree -- most folks would want to double-check what they've
opened or O_PATH'd. Though I'm still not clear what is the best way of
doing the "stat" argument is -- especially given how much fun
architecture-specific shenanigans are in fs/stat.c (should I only use
cp_new_stat64 or have a separate 64-bit syscall).

quoted

Is there a large amount of overhead or downside to the current
open->fstat way of doing things, or is this more of a "if we're going to
add more ways of opening we might as well add more useful things"?

Right now, system calls are sadly very expensive on a lot of hardware.
We used to be very proud of the fact that Linux system calls were
fast, but with meltdown and retpoline etc, we're back to "system calls
can be several thousand cycles each, just in overhead, on commonly
available hardware".

Is "several thousand cycles" fatal? Not necessarily. But I think that
if we do add a new system call, particularly a fancy one for special
security-conscious models, we should look at what people need and use,
and want. And performance should always be a concern.

I realize that people who come at this from primarily just a security
issue background may think that security is the primary goal. But no.
Security always needs to realize that the _primary_ goal is to have
people _use_ it. Without users, security is entirely pointless. And
the users part is partly performance, but mostly "it's convenient".

Yup, I agree. I was hoping to shunt most of the convenience to userspace
to avoid ruffling feathers in VFS-land, but I'm more than happy to make
the kernel ABI more convenient.

The whole "this is Linux-specific" is a big inconvenience point

I hope that other OSes will take our lead and have a similar interface
so that the particular inconvenience can go away eventually (this was
one of the arguments for resolveat(2) -- it's a clear "this is a new
idea" interface rather than mixing it with other O_* flags).

Talking about securely opening things - another flag that we may want
to avoid issues is a "don't open device nodes" flag. Sure, O_NONBLOCK
plus checking the st_mode of the result is usually sufficient, but
it's actually fairly easy to get that wrong. Things like /dev/tty and
/dev/zero often need to be available for various reasons, and have
been used to screw careless "open and read" users up that missed a
check.

Sure, this could be added -- though I'm sure folks would have
disagreements over whether it should be a resolution flag on an open
flag.

I also do wonder that if the only actual user-facing interface for the
resolution flags is a new system call, should we not make the
*default* value be "don't open anything odd at all".

So instead of saying RESOLVE_XDEV for "don't cross mount points",
maybe the flags should be the other way around, and say "yes, allow
mount point crossings", and "yes, explicitly allow device node
opening", and "yes, allow DOTDOT" etc.

This seems like a reasonable default, except for the cases of
RESOLVE_BENEATH and RESOLVE_IN_ROOT (that would make using AT_FDCWD more
cumbersome than necessary). But other than that, I'd be happy to invert
all the other flags.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachments

signature.asc [application/pgp-signature] 833 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help