Re: Chromium sandbox on LoongArch and statx -- seccomp deep argument... | linux-api

Re: Chromium sandbox on LoongArch and statx -- seccomp deep argument inspection again?

From: Icenowy Zheng <hidden>
Date: 2024-02-25 06:51:46
Also in: linux-arch, lkml, loongarch

在 2024-02-24星期六的 19:51 +0800，Huacai Chen写道：

Hi, Xuerui,

On Wed, Feb 21, 2024 at 2:10 PM WANG Xuerui [off-list ref]
wrote:

quoted

Hi,

Recently, we -- community LoongArch porters -- have noticed a
problem
where the Chromium sandbox apparently wants to deny statx [^1] so
it
could properly inspect arguments after the sandboxed process later
falls
back to fstat. The reasoning behind the change was not clear in the
patch; but we found out it's basically because there's currently
not a
"fd-only" version of statx, so that the sandbox has no way to
ensure the
path argument is empty without being able to peek into the
sandboxed
process's memory. For architectures able to do newfstatat though,
the
glibc falls back to newfstatat after getting -ENOSYS for statx,
then the
respective SIGSYS handler [^2] takes care of inspecting the path
argument, transforming allowed newfstatat's into fstat instead
which is
allowed and has the same type of return value.

But, as loongarch is the first architecture to not have fstat nor
newfstatat, the LoongArch glibc does not attempt falling back at
all
when it gets -ENOSYS for statx -- and you see the problem there!

Actually, back when the loongarch port was under review, people
were
aware of the same problem with sandboxing clone3 [^3], so clone was
eventually kept. Unfortunately it seemed at that time no one had
noticed
statx, so besides restoring fstat/newfstatat to loongarch uapi (and
postponing the problem further), it seems inevitable that we would
need
to tackle seccomp deep argument inspection; this is obviously a
decision
that shouldn't be taken lightly, so I'm posting this to restart the
conversation to figure out a way forward together. We basically
could do
one of below:

- just restore fstat and be done with it;
- add a flag to statx so we can do the equivalent of just fstat(fd,
&out) with statx, and ensuring an error happens if path is not
empty in
that case;
- tackle the long-standing problem of seccomp deep argument
inspection (!).

From my point of view, I prefer to "restore fstat", because we need
to
use the Chrome sandbox everyday (even though it hasn't been upstream
by now). But I also hope "seccomp deep argument inspection" can be
solved in the future.

My idea is this problem needs syscalls to be designed with deep
argument inspection in mind; syscalls before this should be considered
as historical error and get fixed by resotring old syscalls.


Huacai

quoted

Obviously, the simplest solution would be to "just restore fstat".
But
again, in my opinion this is not quite a solution but a workaround
-- we
have good reasons to keep just statx (mainly because its feature
set is
a strict superset of those of fstat/newfstatat), and we're not
quite in
a hurry to resolve this. Because one of the prerequisites for a new
Chromium port is "inclusion in Debian stable" -- as the loong64
port
[^4] is progressing and the next Debian release would not happen
until
2025, we still have time for a more "elegant" solution.

Alternatively, we could also introduce a new flag for statx, maybe
named
AT_STATX_NO_PATH or something like that, so that statx would ignore
the
path altogether or error on non-empty paths if called with flags
containing AT_EMPTY_PATH | AT_STATX_NO_PATH. This way a seccomp
policy
could allow statx calls only with such flags (that are passed via
register and accessible) and maintain the same level of safety as
with
fstat. But there is also disadvantage to this approach: the flag
would
be useful only for sandboxes, because in other cases there's no
need to
avoid reading from &path. This is also more of a workaround to
avoid the
deep argument inspection problem.

Lastly, should we decide to go the hardest way, according to a
previous
mail [^5] (about clone3) and the LPC 2019 discussion [^6] [^7], we
probably would try the metadata-annotation-based and piece-by-piece
approach, as it's expected to provide the most benefit and involve
less
code changes. The implementation, as I surmise, will involve
modifying
the generic syscall entrypoint for early copying of user data, and
corresponding changes to seccomp plumbing so this information is
properly exposed. I don't have a roadmap for non-generic-entry
arches
right now, and I also haven't started designing the new seccomp ABI
for
that. As a matter of fact, members of the LoongArch community
(myself
included) are still fresh to this area of expertise, so a bit more
of
your feedback will be appreciated.

Thanks to Miao Wang from AOSC for doing much of the investigation.

[^1]:
https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[^2]:
https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355
[^3]:
https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/ (local)
[^4]: https://wiki.debian.org/Ports/loong64
[^5]:
https://lwn.net/ml/linux-kernel/201905301122.88FD40B3@keescook/
[^6]: https://lwn.net/Articles/799557/
[^7]:
https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf

--
WANG "xen0n" Xuerui

Linux/LoongArch mailing list:https://lore.kernel.org/loongarch/

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help