Thread: 31 messages, 6 authors, 2018-09-06

Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

From: Tobias Hommel <hidden>
Date: 2018-01-10 07:42:20

On Tue, Jan 09, 2018 at 03:49:21PM +0100, Tobias Hommel wrote:
On Tue, Jan 09, 2018 at 10:26:24AM +0100, Steffen Klassert wrote:
On Tue, Jan 09, 2018 at 10:06:51AM +0100, Tobias Hommel wrote:
You have CONFIG_INET_ESP_OFFLOAD enabled; this is new, so maybe it
still has some problems. You should not hit an offload codepath
because all your SAs are configured with UDP encapsulation which
is still not supported with offload.
I ran some new tests with 4.14.12. This time I removed encap=yes from the
strongswan config so I have plain ESP tunnels, without UDP encapsulation. Just
to be sure. It still crashes, the attached panic.noencap.log is pretty much
the same as the logs before.
Please try to disable GRO on both interfaces and see what happens:

ethtool -K eth0 gro off
ethtool -K eth1 gro off
I had actually already tried that with only eth1 off; to verify, I now turned
offloading off for both interfaces. Same problem: see attached panic.gro_off.log
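Before concluding that GRO is not involved, it can help to confirm that the setting actually took effect on each interface. The following is a minimal sketch, assuming the usual `ethtool -k` output format with a `generic-receive-offload: on|off` line; the helper name `gro_state` is made up for illustration.

```shell
# Print the GRO state ("on" or "off") found in `ethtool -k <iface>` output
# read from stdin. Assumes a line of the form
# "generic-receive-offload: off", optionally followed by "[fixed]".
gro_state() {
    awk '/^[[:space:]]*generic-receive-offload:/ { print $2; exit }'
}

# Example (not run here, needs ethtool and the real interfaces):
#   for i in eth0 eth1; do
#       printf '%s: %s\n' "$i" "$(ethtool -k "$i" | gro_state)"
#   done
```

If both interfaces report `off` and the crash still occurs, GRO is unlikely to be the trigger.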
Then disable CONFIG_INET_ESP_OFFLOAD and try again.
Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled, same problem: see attached
panic.esp_offload_disabled.log
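For anyone repeating this test, one way to switch the option off and confirm the result is sketched below. It assumes the commands are run from the top of the kernel source tree and that the in-tree `scripts/config` helper is used; the function name `config_enabled` is hypothetical and is not how the original rebuild was necessarily done.

```shell
# Succeed if OPTION (given without the CONFIG_ prefix) is built in (=y) or
# modular (=m) in the given .config file; fail if it is unset or absent.
config_enabled() {
    grep -Eq "^CONFIG_$2=(y|m)\$" "$1"
}

# Example (not run here), from the top of a kernel source tree:
#   scripts/config --disable INET_ESP_OFFLOAD
#   make olddefconfig
#   config_enabled .config INET_ESP_OFFLOAD || echo "ESP offload is disabled"
```

Checking the resulting .config guards against a dependency silently re-enabling the option during `make olddefconfig`.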
So ESP offload is not the problem. The next thing that comes to my mind
is the flowcache removal, which was introduced with v4.14.
This should show us if this feature is responsible for the bug.
I will try narrowing down the problem by trying out some older kernels for now.
Thanks!

Let me know about the results.
I copied the config from my 4.14.12 sources to a fresh 4.13.16 source tree, ran
`make olddefconfig` and built a new kernel.
The kernel config is attached as kernel-4.13.16.config.
The panic*.log files are kernel logs from different crashes of this 4.13.16
kernel, but all from the same scenario as before.
I also enabled CONFIG_DEBUG_INFO, so if any disassemblies are required, I'd be
happy to provide them.
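The config carry-over described above can be sketched roughly as follows. The directory names in the example are hypothetical, and the `MAKE` override exists only so that the helper can be exercised without a real kernel tree.

```shell
# Copy a known .config into another kernel tree, then let Kconfig resolve
# symbols that are new or missing in that version by taking their defaults.
# Arguments: source tree, destination tree (both paths are assumptions).
carry_config() {
    src_tree=$1
    dst_tree=$2
    cp "$src_tree/.config" "$dst_tree/.config" || return 1
    "${MAKE:-make}" -C "$dst_tree" olddefconfig
}

# Example (not run here):
#   carry_config ~/src/linux-4.14.12 ~/src/linux-4.13.16
```

Because `olddefconfig` picks defaults without prompting, options that exist only in the newer kernel are dropped without notice, so the two configs are close but not necessarily identical.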

So the system still crashes, but the traces are completely different from
those with 4.14.12. This time there are also WARNINGs and "refcnt: -1" messages
that sometimes appear before the actual panic, so I am not sure whether there
is some other problem as well. Still, the crashes all seem to be related to IP
routing somehow.
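To see quickly which of the attached logs contain these early symptoms, a small helper along the following lines may be useful. It is only a sketch: it matches the two strings mentioned above literally, and the function name `count_warnings` is made up.

```shell
# Count the lines in one kernel log that contain "WARNING" or "refcnt: -1".
# A line containing both strings is counted once.
count_warnings() {
    grep -c -e 'WARNING' -e 'refcnt: -1' "$1"
}

# Example (not run here), assuming the logs are in the current directory:
#   for f in panic*.log; do
#       printf '%s: %s\n' "$f" "$(count_warnings "$f")"
#   done
```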

Attachments
