Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches

[PATCH v3 0/3] Fix bugs and performance of kstack offset randomisation · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-02
[PATCH v3 1/3] randomize_kstack: Maintain kstack_offset per task · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-02
Re: [PATCH v3 1/3] randomize_kstack: Maintain kstack_offset per task · David Laight <hidden> · 2026-01-03
Re: [PATCH v3 1/3] randomize_kstack: Maintain kstack_offset per task · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-05
Re: [PATCH v3 1/3] randomize_kstack: Maintain kstack_offset per task · Mark Rutland <mark.rutland@arm.com> · 2026-01-19
[PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-02
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2026-01-02
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-02
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> · 2026-01-03
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-05
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · David Laight <hidden> · 2026-01-03
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-05
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · David Laight <hidden> · 2026-01-02
Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline · Mark Rutland <mark.rutland@arm.com> · 2026-01-19
[PATCH v3 3/3] randomize_kstack: Unify random source across arches · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-02
Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches · David Laight <hidden> · 2026-01-04
Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-05
Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches · David Laight <hidden> · 2026-01-05
Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches · David Laight <hidden> · 2026-01-07
Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-12
Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches · David Laight <hidden> · 2026-01-12
Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches · Mark Rutland <mark.rutland@arm.com> · 2026-01-19
Re: [PATCH v3 0/3] Fix bugs and performance of kstack offset randomisation · Mark Rutland <mark.rutland@arm.com> · 2026-01-19
Re: [PATCH v3 0/3] Fix bugs and performance of kstack offset randomisation · David Laight <hidden> · 2026-01-19
Re: [PATCH v3 0/3] Fix bugs and performance of kstack offset randomisation · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-19
Re: [PATCH v3 0/3] Fix bugs and performance of kstack offset randomisation · Ryan Roberts <ryan.roberts@arm.com> · 2026-01-19

From: David Laight <hidden>
Date: 2026-01-04 23:01:41
Also in: linux-arm-kernel, linux-hardening, linux-riscv, linux-s390, lkml, loongarch

On Fri,  2 Jan 2026 13:11:54 +0000
Ryan Roberts [off-list ref] wrote:

Previously different architectures were using random sources of
differing strength and cost to decide the random kstack offset. A number
of architectures (loongarch, powerpc, s390, x86) were using their
timestamp counter, at whatever the frequency happened to be. Other
arches (arm64, riscv) were using entropy from the crng via
get_random_u16().

There have been concerns that in some cases the timestamp counters may
be too weak, because they can be easily guessed or influenced by user
space. And get_random_u16() has been shown to be too costly for the
level of protection kstack offset randomization provides.

So let's use a common, architecture-agnostic source of entropy; a
per-cpu prng, seeded at boot-time from the crng. This has a few
benefits:

  - We can remove choose_random_kstack_offset(); That was only there to
    try to make the timestamp counter value a bit harder to influence
    from user space.

  - The architecture code is simplified. All it has to do now is call
    add_random_kstack_offset() in the syscall path.

  - The strength of the randomness can be reasoned about independently
    of the architecture.

  - Arches previously using get_random_u16() now have much faster
    syscall paths, see below results.

There have been some claims that a prng may be less strong than the
timestamp counter if not regularly reseeded. But the prng has a period
of about 2^113. So as long as the prng state remains secret, it should
not be possible to guess. If the prng state can be accessed, we have
bigger problems.

If you have 128 bits of consecutive output I think you can trivially
determine the full state using (almost) 'schoolboy' maths that could be
done with pencil and paper.
(Most of the work only has to be done once.)

The underlying problem is that the TAUSWORTHE() transformation is 'linear',
so that TAUSWORTHE(x ^ y) == TAUSWORTHE(x) ^ TAUSWORTHE(y).
(This is true of an LFSR/CRC and TAUSWORTHE() is doing some subset of CRCs.)
This means that each output bit is the 'xor' of some of the input bits.
The four new 'state' values are just xors of the bits of the old ones.
The final xor of the four states gives a 32bit value with each bit just
an xor of some of the 128 state bits.
Get four consecutive 32 bit values and you can solve the 128 simultaneous
equations (by trivial substitution) and get the initial state.
The solution gives you the 128 128bit constants for:
	u128 state = 0;
	u128 val = 'value returned from 4 calls';
	for (int i = 0; i < 128; i++)
		state |= (u128)parity(const128[i] & val) << i;
You don't need all 32 bits, just accumulate 128 bits.
So if you can get the 5-bit stack offset from 26 system calls you know the
value that will be used for all the subsequent calls.
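
For anyone who wants to try the argument above, here is a rough userspace sketch. It is not the code promised below and not kernel code. The generator is copied from memory of lib/random32.c, so treat the constants as an assumption. NOUT, struct eqn, coef(), recover_state() and demo_predicts() are made-up names. It observes 8 outputs rather than 4 as a margin: with these constants the low 1/3/4/7 bits of s1..s4 never reach the output, so at most 113 of the 128 state bits can be solved for and the rest are left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Generator as recalled from lib/random32.c; the constants are an assumption. */
struct rnd_state { uint32_t s1, s2, s3, s4; };

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

static uint32_t prng_next(struct rnd_state *st)
{
	st->s1 = TAUSWORTHE(st->s1,  6U, 13U, 4294967294U, 18U);
	st->s2 = TAUSWORTHE(st->s2,  2U, 27U, 4294967288U,  2U);
	st->s3 = TAUSWORTHE(st->s3, 13U, 21U, 4294967280U,  7U);
	st->s4 = TAUSWORTHE(st->s4,  3U, 12U, 4294967168U, 13U);
	return st->s1 ^ st->s2 ^ st->s3 ^ st->s4;
}

#define NOUT 8			/* observed outputs (hypothetical choice) */
#define NEQ  (32 * NOUT)	/* one GF(2) equation per observed bit */

struct eqn { uint32_t c[4]; uint32_t rhs; };	/* 128 coefficients + result */

static int coef(const struct eqn *e, int j)
{
	return (e->c[j / 32] >> (j % 32)) & 1;
}

/*
 * Find an initial state that reproduces out[0..NOUT-1].
 * Returns the rank of the linear system (at most 113 here).
 */
static int recover_state(const uint32_t out[NOUT], struct rnd_state *res)
{
	static struct eqn m[NEQ];
	int pivot[128], rank = 0;
	uint32_t x[4] = { 0, 0, 0, 0 };

	for (int e = 0; e < NEQ; e++)
		m[e] = (struct eqn){ { 0, 0, 0, 0 }, (out[e / 32] >> (e % 32)) & 1 };

	/* By linearity, column j is the output of the state with only bit j set. */
	for (int j = 0; j < 128; j++) {
		uint32_t w[4] = { 0, 0, 0, 0 };

		w[j / 32] = 1U << (j % 32);
		struct rnd_state basis = { w[0], w[1], w[2], w[3] };
		for (int t = 0; t < NOUT; t++) {
			uint32_t v = prng_next(&basis);
			for (int i = 0; i < 32; i++)
				if ((v >> i) & 1)
					m[32 * t + i].c[j / 32] |= 1U << (j % 32);
		}
	}

	/* Gauss-Jordan elimination over GF(2): adding rows is xor. */
	for (int j = 0; j < 128; j++) {
		int p = rank;
		struct eqn tmp;

		pivot[j] = -1;
		while (p < NEQ && !coef(&m[p], j))
			p++;
		if (p == NEQ)
			continue;	/* free variable, left at 0 */
		tmp = m[p];
		m[p] = m[rank];
		m[rank] = tmp;
		for (int e = 0; e < NEQ; e++) {
			if (e == rank || !coef(&m[e], j))
				continue;
			for (int k = 0; k < 4; k++)
				m[e].c[k] ^= m[rank].c[k];
			m[e].rhs ^= m[rank].rhs;
		}
		pivot[j] = rank++;
	}
	for (int j = 0; j < 128; j++)
		if (pivot[j] >= 0 && m[pivot[j]].rhs)
			x[j / 32] |= 1U << (j % 32);
	*res = (struct rnd_state){ x[0], x[1], x[2], x[3] };
	return rank;
}

/* Usage sketch: observe NOUT outputs, recover a state, try to predict the next. */
static int demo_predicts(void)
{
	struct rnd_state victim = { 0x12345678, 0x9abcdef0, 0x0fedcba9, 0x87654321 };
	struct rnd_state guess;
	uint32_t seen[NOUT];

	for (int t = 0; t < NOUT; t++)
		seen[t] = prng_next(&victim);
	recover_state(seen, &guess);
	for (int t = 0; t < NOUT; t++)
		assert(prng_next(&guess) == seen[t]);	/* solution fits the data */
	return prng_next(&guess) == prng_next(&victim);
}
```

The recovered state always reproduces the observed outputs. It predicts later ones only once the system has reached full rank over the bits that matter, so check the rank returned by recover_state() before trusting a prediction. The same elimination should work when only a few bits of each output are visible (the 5-bit offset case); you would just keep the rows for the bits you can see and collect more outputs.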

Simply changing the final line to use + not ^ makes the output non-linear
and solving the equations a lot harder.
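
A small sketch of what that change does to linearity, on one reading of "the final line" (the xor that combines s1..s4 into the return value). As before, the generator constants are from memory of lib/random32.c and should be treated as an assumption. step(), next_xor(), next_add(), state_xor() and is_linear_for() are made-up names.

```c
#include <assert.h>
#include <stdint.h>

struct rnd_state { uint32_t s1, s2, s3, s4; };

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Advance the four component generators (constants assumed, see above). */
static void step(struct rnd_state *st)
{
	st->s1 = TAUSWORTHE(st->s1,  6U, 13U, 4294967294U, 18U);
	st->s2 = TAUSWORTHE(st->s2,  2U, 27U, 4294967288U,  2U);
	st->s3 = TAUSWORTHE(st->s3, 13U, 21U, 4294967280U,  7U);
	st->s4 = TAUSWORTHE(st->s4,  3U, 12U, 4294967168U, 13U);
}

/* Existing combiner: xor of the four words. */
static uint32_t next_xor(struct rnd_state st)
{
	step(&st);
	return st.s1 ^ st.s2 ^ st.s3 ^ st.s4;
}

/* Suggested combiner: addition modulo 2^32, so carries mix the bits. */
static uint32_t next_add(struct rnd_state st)
{
	step(&st);
	return st.s1 + st.s2 + st.s3 + st.s4;
}

static struct rnd_state state_xor(struct rnd_state a, struct rnd_state b)
{
	struct rnd_state r = { a.s1 ^ b.s1, a.s2 ^ b.s2, a.s3 ^ b.s3, a.s4 ^ b.s4 };
	return r;
}

/* Returns 1 if f(x ^ y) == f(x) ^ f(y) holds for this particular pair. */
static int is_linear_for(uint32_t (*f)(struct rnd_state),
			 struct rnd_state x, struct rnd_state y)
{
	return f(state_xor(x, y)) == (f(x) ^ f(y));
}

/* Usage sketch: a pair of states whose stepped words collide in bit 19. */
static void demo_nonlinear(void)
{
	struct rnd_state x = { 2, 0, 0, 0 };
	struct rnd_state y = { 0, 0x20000, 0, 0 };

	assert(is_linear_for(next_xor, x, y));	/* xor combiner: always linear */
	assert(!is_linear_for(next_add, x, y));	/* add combiner: carry breaks it */
}
```

Note that the state update itself stays linear in this variant; only the output stops being a plain xor of state bits, which is what makes the simultaneous equations no longer solvable by simple substitution.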

I might sit down tomorrow and see if I can actually code it...

	David
