Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: 2018-01-24 09:59:23
On Fri, Jan 19, 2018 at 03:45:46PM +0100, Tobias Hommel wrote:
I tried to strip down the system configuration and was able to reproduce the problem with a minimal configuration:
* ipsets are not used anymore
* no firewall markings are used any longer
* iptables are "completely empty", i.e. all policies set to ACCEPT and there is no rule in any table
* no additional routing policies (ip rule) except the default ones
* only main routing table is used
* using a "minimal" kernel config:
  * run `make defconfig`
  * add basic things (ESP, IGB driver, some crypto algorithms)
  * add options required to boot up the system (TPM crypt, some device mapper options, overlayfs)

I attached the minimal config (minimal.config) and the defconfig for reference (minimal.defconfig).

The setup is really simple now, the gateway is forwarding HTTP connections between eth1 (IPSec tunnels) and eth0 without any firewall, NAT, whatsoever.
Thanks a lot for your debugging effort!
The only thing I can think of are the rather aggressive roadwarrior clients. There are 750 roadwarriors that are controlled by a script which starts and stops the IPSec connection.
I still can't reproduce it with my tests. This is probably some race triggered by your aggressive roadwarrior setup, which I don't have.
I tried 4.15-rc8 and have the same problem here (see attached kernel-4.15-rc8.log). SMP affinity for IRQs has changed in 4.15 and something's
There is one patch that could influence this which is not in v4.15-rc8:
commit 76a4201191814a0061cb5c861fafb9ecaa764846
("xfrm: Fix a race in the xdst pcpu cache.")
It is included in v4.15-rc9.
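To check whether a given tree already contains that commit, something like the following should work. This is only a sketch: it assumes a local git clone of the kernel tree that has the relevant tags, and `commit_in_ref` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: succeeds (exit 0) if commit $2 is reachable
# from ref $3 in the git repository at path $1.
commit_in_ref() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Example call (the path to the kernel clone is an assumption):
#   commit_in_ref ~/src/linux 76a4201191814a0061cb5c861fafb9ecaa764846 v4.15-rc8 \
#       && echo "patch present" || echo "patch missing"
```

`git tag --contains 76a4201191814a0061cb5c861fafb9ecaa764846` gives the same information as a list of tags.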
If this patch does not fix your problem, I'm out of ideas. In that case
I have to ask you to do a bisection to find the offending commit.
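For reference, a bisection could be started roughly as sketched below. This assumes a git clone of the kernel tree and that you know one version where the oops does not happen; the good and bad refs are placeholders you have to fill in, and `start_bisect` is a hypothetical helper name.

```shell
# Hypothetical helper: begin a bisection in the repository at path $1
# between a known-good ref $2 and a known-bad ref $3.
# Git then checks out a commit roughly halfway between the two.
start_bisect() {
    git -C "$1" bisect start "$3" "$2"
}

# After each build/boot/test cycle, report the result from inside the
# tree so git can pick the next commit to test:
#   git bisect good     # the oops did not occur
#   git bisect bad      # the oops occurred
# Once git reports the first bad commit, restore the original checkout:
#   git bisect reset
```

Since the problem looks like a race, it is probably worth running the roadwarrior test for a while before marking a commit as good.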