Thread (31 messages, 6 authors, 2018-09-06)

Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

From: Tobias Hommel <hidden>
Date: 2018-01-29 08:38:19

On Wed, Jan 24, 2018 at 10:59:21AM +0100, Steffen Klassert wrote:
> On Fri, Jan 19, 2018 at 03:45:46PM +0100, Tobias Hommel wrote:
> > I tried to strip down the system configuration and was able to reproduce the
> > problem with a minimal configuration:
> > * ipsets are not used anymore
> > * no firewall markings are used any longer
> > * iptables are "completely empty", i.e. all policies set to ACCEPT and there is
> >   no rule in any table
> > * no additional routing policies (ip rule) except the default ones
> > * only main routing table is used
> > * using a "minimal" kernel config:
> >  * run `make defconfig`
> >  * add basic things (ESP, IGB driver, some crypto algorithms)
> >  * add options required to boot up the system (TPM crypt, some device mapper
> >    options, overlayfs)
> >
> > I attached the minimal config (minimal.config) and the defconfig for reference
> > (minimal.defconfig).
> >
> > The setup is really simple now, the gateway is forwarding HTTP connections
> > between eth1 (IPSec tunnels) and eth0 without any firewall, NAT, whatsoever.
>
> Thanks a lot for your debugging effort!
>
> > The only thing I can think of are the rather aggressive roadwarrior clients.
> > There are 750 roadwarriors that are controlled by a script which starts and
> > stops the IPSec connection.
>
> I still can't reproduce it with my tests. This is probably some race
> triggered due to your aggressive roadwarrior setup which I don't have.
>
> > I tried 4.15-rc8 and have the same problem here (see attached
> > kernel-4.15-rc8.log). SMP affinity for IRQs has changed in 4.15 and something's
>
> There is one patch that could influence this which is not in v4.15-rc8:
>
> commit 76a4201191814a0061cb5c861fafb9ecaa764846
> ("xfrm: Fix a race in the xdst pcpu cache.")
>
> It is included in v4.15-rc9.

I already tested that one some weeks ago, when it appeared on the mailing list,
with 4.14. Without any luck.

> If this does not fix your problem, I'm out of ideas. In this case
> I have to ask to do a bisection to find the offending commit.

I'll do a bisect session then. It'll take some time though as the hardware is
currently occupied with other tests. I'll keep you up-to-date about the
results.
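A bisection is a binary search over the commit history between a kernel known to be good and one known to be bad. Each step builds and tests the midpoint, and that halves the remaining range. The Python sketch below only illustrates that search. It assumes a linear history with a single good-to-bad transition, which is the same assumption that makes `git bisect` meaningful. The function name `first_bad_commit` and the `is_bad` callback (which would stand for "build, boot, run the roadwarrior load and check for the oops") are hypothetical. In practice, `git bisect start <bad> <good>` followed by `git bisect good` / `git bisect bad` drives this loop, and git picks midpoints differently on histories with merges.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Hypothetical illustration: assumes commits[0] is known good,
    commits[-1] is known bad, and the history is linear with exactly
    one good -> bad transition.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # In a real session this is one build-and-test cycle of the kernel.
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Toy history: commits c0..c99, the regression appears at c37.
    history = [f"c{i}" for i in range(100)]
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 37))
```

Because each test cycle halves the range, about log2(N) build-and-test rounds are needed. Roughly 7 rounds cover 100 commits, which is why a bisection is feasible even when every round means rebooting a test gateway.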