Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
From: Tobias Hommel <hidden>
Date: 2018-01-10 07:42:20
On Tue, Jan 09, 2018 at 03:49:21PM +0100, Tobias Hommel wrote:
On Tue, Jan 09, 2018 at 10:26:24AM +0100, Steffen Klassert wrote:
On Tue, Jan 09, 2018 at 10:06:51AM +0100, Tobias Hommel wrote:
You have CONFIG_INET_ESP_OFFLOAD enabled. This is new, so maybe it still has some problems. However, you should not hit an offload codepath, because all your SAs are configured with UDP encapsulation, which is still not supported with offload.
Just to be sure, I ran some new tests with 4.14.12. This time I removed encap=yes from the strongSwan config, so I have plain ESP tunnels without UDP encapsulation. It still crashes; the attached panic.noencap.log is essentially the same as the earlier logs.
Please try to disable GRO on both interfaces and see what happens:
    ethtool -K eth0 gro off
    ethtool -K eth1 gro off
I had actually already tried that with only eth1 off; to verify, I have now turned offloading off on both interfaces. Same problem: see attached panic.gro_off.log.
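When ruling out GRO, it can help to confirm that the feature really is off on each interface. `ethtool -K` (capital K) changes features, while `ethtool -k <iface>` (lowercase k) lists them, one `feature: on|off` line each, with GRO shown as `generic-receive-offload`. The following is a minimal sketch, assuming that output format; the helper name `gro_is_off` is hypothetical and it only parses text you have already captured, it does not call ethtool itself.

```python
import re


def gro_is_off(ethtool_k_output: str) -> bool:
    """Return True if 'generic-receive-offload' is reported as off.

    Assumes the text printed by `ethtool -k <iface>`, where each feature
    line looks like 'generic-receive-offload: off', optionally followed
    by '[fixed]'. Raises ValueError if no GRO line is present.
    """
    for line in ethtool_k_output.splitlines():
        match = re.match(r"\s*generic-receive-offload:\s*(on|off)\b", line)
        if match:
            return match.group(1) == "off"
    raise ValueError("no generic-receive-offload line found")


if __name__ == "__main__":
    # Hypothetical captured output, for illustration only.
    sample = (
        "Features for eth1:\n"
        "rx-checksumming: on\n"
        "generic-receive-offload: off\n"
        "large-receive-offload: off [fixed]\n"
    )
    print(gro_is_off(sample))
```

Feed it the output of `ethtool -k eth0` and `ethtool -k eth1` after running the `-K ... gro off` commands; a ValueError means the driver output did not contain a GRO line at all.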
Then disable CONFIG_INET_ESP_OFFLOAD and try again.
Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled, same problem: see attached panic.esp_offload_disabled.log.
So ESP offload is not the problem. The next thing that comes to my mind is the flowcache removal, which was introduced with v4.14.
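For rebuilds like the CONFIG_INET_ESP_OFFLOAD test, it is worth checking the resulting `.config` before booting. In a kernel `.config`, enabled options appear as `CONFIG_X=y` (or `=m`, or a value), and disabled ones as `# CONFIG_X is not set`; options whose dependencies are not met are typically not written at all. A minimal sketch, assuming that format; the function names `parse_config` and `config_state` are hypothetical.

```python
import re

# "CONFIG_FOO=value", where value may be y, m, a number or a quoted string.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
# The marker kconfig writes for a disabled option.
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict:
    """Map each CONFIG_ symbol to its value, or to 'n' if marked not set."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if m := SET_RE.match(line):
            options[m.group(1)] = m.group(2)
        elif m := UNSET_RE.match(line):
            options[m.group(1)] = "n"
    return options


def config_state(text: str, name: str) -> str:
    """Return the option's value, 'n' if explicitly unset, or 'absent'."""
    return parse_config(text).get(name, "absent")


if __name__ == "__main__":
    # Illustrative fragment, not the attached config.
    sample = (
        "CONFIG_INET_ESP=y\n"
        "# CONFIG_INET_ESP_OFFLOAD is not set\n"
        "CONFIG_DEBUG_INFO=y\n"
    )
    for opt in ("CONFIG_INET_ESP_OFFLOAD", "CONFIG_DEBUG_INFO"):
        print(opt, config_state(sample, opt))
```

Reading the real file is just `config_state(open(".config").read(), "CONFIG_INET_ESP_OFFLOAD")`; the walrus operator requires Python 3.8 or later.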
This should show us if this feature is responsible for the bug.
I will try to narrow down the problem by testing some older kernels for now.
Thanks! Let me know about the results.
I copied the config from my 4.14.12 sources to a fresh 4.13.16 source tree, ran `make olddefconfig` and built a new kernel. The kernel config is attached as kernel-4.13.16.config. The panic*.log files are kernel logs from different crashes of this 4.13.16 kernel, all from the same scenario as before. I also enabled CONFIG_DEBUG_INFO, so if any disassemblies are required, I'd be happy to provide them. The system still crashes, but the traces are completely different from those with 4.14.12. This time there are sometimes also WARNINGs and "refcnt: -1" messages before the actual panic, so there may be some other problem as well. Still, all the crashes seem to be related to IP routing somehow.
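Since the 4.13.16 logs differ in whether WARNINGs or "refcnt: -1" messages precede the panic, a quick per-log summary can make the crashes easier to compare. A minimal sketch; the marker patterns are assumptions based on the messages named in this thread plus the standard "Kernel panic" line, the exact spacing of the refcnt message should be checked against the real logs, and the function name `events_before_panic` is hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed patterns; adjust to what the captured logs actually contain.
EARLY_RE = re.compile(r"WARNING:|refcnt:\s*-1")
PANIC_RE = re.compile(r"BUG: unable to handle kernel|Kernel panic")


def events_before_panic(lines):
    """Return (early_lines, panic_line) for one kernel log.

    early_lines holds every WARNING or "refcnt: -1" line seen before the
    first panic line; panic_line is that first panic line, or None if
    the log contains no panic marker.
    """
    early = []
    for line in lines:
        if PANIC_RE.search(line):
            return early, line
        if EARLY_RE.search(line):
            early.append(line)
    return early, None


if __name__ == "__main__":
    # Usage: python3 scan_panics.py panic*.log
    for name in sys.argv[1:]:
        lines = Path(name).read_text(errors="replace").splitlines()
        early, panic = events_before_panic(lines)
        print(f"{name}: {len(early)} warning/refcnt line(s) before panic; "
              f"first panic line: {panic!r}")
```

Lines after the first panic marker are deliberately ignored, so follow-on warnings printed while the system dies do not inflate the count.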
Attachments
- panic.noencap.log [text/plain] 2703 bytes