Thread: 50 messages, 8 authors, 2012-02-21

Re: [PATCH v8 3/8] seccomp: add system call filtering using BPF

From: Will Drewry <wad@chromium.org>
Date: 2012-02-16 21:31:14
Also in: linux-arch, lkml

On Thu, Feb 16, 2012 at 3:17 PM, H. Peter Anvin [off-list ref] wrote:
On 02/16/2012 12:25 PM, Will Drewry wrote:

I agree :)  BPF being a 32-bit creature introduced some edge cases.  I
had started with a
    union { u32 args32[6]; u64 args64[6]; }

This was somewhat derailed by CONFIG_COMPAT behavior where
syscall_get_arguments() always writes to arguments of register width --
not bad, just irritating (since a copy isn't strictly necessary nor
actually done in the patch).  Also, Indan pointed out that while BPF
programs expect constants in the machine-local endian layout, any
consumers would need to change how they accessed the arguments across
big/little endian machines since a load of the low-order bits would
vary.
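
To make the endianness concern concrete, here is a minimal C sketch (not
code from the patch). It assumes the arguments are stored as native
64-bit values and shows that the byte offset of an argument's low-order
32 bits depends on the host byte order. The helper names
low_half_offset() and load32() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: byte offset (0 or 4) of the low-order 32 bits
 * inside a native-endian 64-bit value on the machine running this code.
 */
static size_t low_half_offset(void)
{
    const uint64_t probe = 0x1122334455667788ULL;
    uint32_t first_word;

    /* Read the first four bytes of the 64-bit value as stored in memory. */
    memcpy(&first_word, &probe, sizeof(first_word));
    return first_word == 0x55667788u ? 0 : 4;
}

/*
 * Hypothetical helper: load 32 bits at a byte offset from a base
 * pointer, roughly what a 32-bit load at a fixed offset would see.
 */
static uint32_t load32(const void *base, size_t off)
{
    uint32_t v;

    memcpy(&v, (const unsigned char *)base + off, sizeof(v));
    return v;
}
```

A consumer that hard-codes offset 0 for "the low bits of argument 0"
would read the low half on a little-endian machine and the high half on
a big-endian one, which is the portability problem described above.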

In a second pass, I attempted to resolve this like aio_abi.h:
    union {
      struct {
        u32 ENDIAN_SWAP(lo32, hi32);
      };
      u64 arg64;
    } args[6];
It wasn't clear that this actually made matters better (though it did
mean syscall_get_arguments() could write directly to arg64).  Using
offsetof() in the user program would be fine, but any offsets set
another way would be invalid.  At that point, I moved to Indan's
proposal to stabilize low order and high order offsets -- what is in
the patch series.  Now a BPF program can reliably index into the low
bits of an argument and into the high bits without endianness changing
the filter program structure.
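
A rough sketch of what stable low/high offsets could look like, assuming
each argument occupies eight bytes with the low half at the lower
offset. The real offsets are whatever the patch series defines; the
macros ARG_LO_OFF/ARG_HI_OFF and the function load_arg_word() are
hypothetical names used only for illustration. The halves are extracted
with shifts rather than by reading memory, so the result does not depend
on host byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define NARGS 6

/* Assumed layout: argument n at byte 8*n, low half first, high half next. */
#define ARG_LO_OFF(n) ((size_t)(n) * 8)
#define ARG_HI_OFF(n) ((size_t)(n) * 8 + 4)

/*
 * Return, on demand, the 32-bit word a filter asked for at byte offset
 * `off`. Nothing is copied into a buffer up front. Returns 0 on success
 * and -1 for an unaligned or out-of-range offset.
 */
static int load_arg_word(const uint64_t args[NARGS], size_t off, uint32_t *out)
{
    if (off % 4 != 0 || off >= (size_t)NARGS * 8)
        return -1;

    uint64_t arg = args[off / 8];

    /* Offsets divisible by 8 select the low half, the others the high half. */
    *out = (off % 8 == 0) ? (uint32_t)arg : (uint32_t)(arg >> 32);
    return 0;
}
```

With this shape, a filter that loads ARG_LO_OFF(1) always sees the low
32 bits of argument 1, on both little- and big-endian machines.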

I don't feel strongly about any given data layout, and this one seems
to balance the 32-bit-ness of BPF and the impact that has on
endianness.  I'm happy to hear alternatives that might be more
aesthetically pleasing :)

I would have to say I think native endian is probably the sane thing still,
out of several bad alternatives.  Certainly splitting the high and low
halves of arguments is insane.

I'll push the bits around and see how well it plays out in sample/test
code.  Right now, the patch never even populates the data itself - it
just returns four bytes at the requested offset on-demand, so
kernel-side it's pretty simple to do it whatever way seems the least
hideous for the ABI.

The other thing that you really need in addition to system call number is
ABI identifier, since a syscall number may mean different things for
different entry points.  For example, on x86-64 system call number 4 is
write() if called via int $0x80 but stat() if called via syscall64. This is
a local property of the system call, not a global per process.

Looks like Markus just replied to this part.  I can certainly populate
a compat bit if the current approach is overconstrained, but I much
prefer to avoid making every user of seccomp need to know about the
subtleties of the calling conventions.
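
For illustration only, a small C sketch of the ABI point raised above:
the same number can name different system calls depending on the entry
point, so a check for write() has to look at the (ABI, number) pair. The
enum entry_abi and the function is_write() are hypothetical and are not
part of the patch; the numbers used are the x86 ones mentioned in this
thread (4 is write() via int $0x80) plus the 64-bit write() number, 1.

```c
/* Hypothetical ABI identifier for the entry point a call came through. */
enum entry_abi {
    ABI_I386_INT80,     /* 32-bit entry via int $0x80 */
    ABI_X86_64_SYSCALL  /* 64-bit entry via the syscall instruction */
};

/*
 * Return 1 if (abi, nr) names write(), 0 otherwise. Checking nr alone
 * would treat 64-bit stat() (number 4) as if it were write().
 */
static int is_write(enum entry_abi abi, int nr)
{
    switch (abi) {
    case ABI_I386_INT80:
        return nr == 4;
    case ABI_X86_64_SYSCALL:
        return nr == 1;
    }
    return 0;
}
```

A filter that only compared the number against 4 would allow write() for
32-bit callers and stat() for 64-bit callers, which is why some per-call
ABI indication (such as a compat bit) matters.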

thanks!
will