Thread (12 messages) 12 messages, 5 authors, 2025-06-06

Re: [PATCH] net: randomize layout of struct net_device

From: Eric Dumazet <edumazet@google.com>
Date: 2025-06-06 15:43:08
Also in: linux-kernel-mentees, lkml

On Fri, Jun 6, 2025 at 7:55 AM Pranav Tyagi [off-list ref] wrote:
On Tue, Jun 3, 2025 at 12:36 AM Andrew Lunn [off-list ref] wrote:
quoted
On Mon, Jun 02, 2025 at 11:03:18AM -0700, Kees Cook wrote:
quoted
On Mon, Jun 02, 2025 at 04:46:14PM +0200, Andrew Lunn wrote:
quoted
On Mon, Jun 02, 2025 at 07:29:32PM +0530, Pranav Tyagi wrote:
quoted
Add __randomize_layout to struct net_device to support structure layout
randomization if CONFIG_RANDSTRUCT is enabled else the macro expands to
do nothing. This enhances kernel protection by making it harder to
predict the memory layout of this structure.

Link: https://github.com/KSPP/linux/issues/188
I would note that the TODO item in this Issue is "evaluate struct
net_device".
quoted
A dumb question i hope.

As you can see from this comment, some time and effort has been put
into the order of members in this structure so that those which are
accessed on the TX fast path are in the same cache line, and those on
the RX fast path are in the same cache line, and RX and TX fast paths
are in different cache lines, etc.
This is pretty well exactly one of the right questions to ask, and
should be detailed in the commit message. Mainly: a) how do we know it
will not break anything? b) why is net_device a struct that is likely
to be targeted by an attacker?
For a), i doubt anything will break. The fact the structure has been
optimised for performance implies that members have been moved around,
and there are no comments in the structure saying don't move this,
otherwise bad things will happen.

There is a:

        u8                      priv[] ____cacheline_aligned
                                       __counted_by(priv_len);

at the end, but i assume RANDSTRUCT knows about them and won't move it.

As for b), i've no idea, not my area. There are a number of pointers
to structures contains ops. Maybe if you can take over those pointers,
point to something you can control, you can take control of the
Program Counter?
quoted
quoted
Does CONFIG_RANDSTRUCT understand this? It is safe to move members
around within a cache line. And it is safe to move whole cache lines
around. But it would be bad if the randomisation moved members between
cache lines, mixing up RX and TX fast path members, or spreading fast
path members over more cache lines, etc.
No, it'll move stuff all around. It's very much a security vs
performance trade-off, but the systems being built with it are happy to
take the hit.
It would be interesting to look back at the work optimising this
stricture to get a ball park figure how big a hit this is?

I also think some benchmark numbers would be interesting. I would
consider two different systems:

1) A small ARM/MIPS/RISC-V with 1G interfaces. The low amount of L1
cache on these systems mean that cache misses are important. So
spreading out the fast path members will be bad.

2) Desktop/Server class hardware, lots of cores, lots of cache, 10G,
40G or 100G interfaces. For these systems, i expect cache line
bouncing is more of an issue, so Rx and Tx fast path members want to
be kept in separate cache lines.
quoted
The basic details are in security/Kconfig.hardening in the "choice" following
the CC_HAS_RANDSTRUCT entry.
So i see two settings here. It looks like RANDSTRUCT_PERFORMANCE
should have minimal performance impact, so maybe this should be
mentioned in the commit message, and the benchmarks performed both on
full randomisation and with the performance setting.

I would also suggest a comment is added to the top of
Documentation/networking/net_cachelines/net_device.rst pointing out
this assumed RANDSTRUCT is disabled, and the existing comment in
struct net_device is also updated.

        Andrew
Resending to the list—my previous reply was accidentally sent off-list.

Apologies for the delayed response, and thank you
all for the detailed feedback.

Regarding the concern about breaking functionality,
I did compile and boot the kernel successfully with
this change, and everything appeared to work as
expected during basic testing. However, I agree
that this is not a substitute for thorough
benchmarking.

You're absolutely right that applying
__randomize_layout to net_device will shuffle
structure fields and likely incur a performance
penalty. As mentioned, this is a trade-off that
targets hardening over performance. It's worth
noting that CONFIG_RANDSTRUCT has two options:
RANDSTRUCT_FULL and RANDSTRUCT_PERFORMANCE, with
the latter aiming to minimize the impact by only
shuffling less performance-critical members.

I’d appreciate guidance on which specific
benchmarking tests would be most appropriate to
quantify the performance impact. Based on your
suggestions, I plan to run benchmarks on a small
SoC (ARM/MIPS/RISC-V) with 1G NICs. However, I
currently don’t have access to high-end server
hardware with 10G/40G+ NICs, so I’ll start with the
systems I have and clearly note the limitations in
the revised commit message.

I’ll also update the commit message to reflect the
security vs performance trade-offs, mention
RANDSTRUCT_PERFORMANCE, and add a reference to
net_cachelines/net_device.rst to document the
assumption of structure layout.

Thanks again for the thoughtful review—I’ll revise
the patch accordingly.
Do you have evidence of added security on this particular structure ?

What particular bug could have been avoided with __randomize_layout ?

Most distros use CONFIG_RANDSTRUCT_NONE=y, I do not think
__randomize_layout has a future.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help