Re: [BUG] net: stmmac: Panic observed in stmmac_napi_poll_rx()
From: Mikko Perttunen <hidden>
Date: 2021-05-24 12:49:04
Also in: linux-tegra
On 5/17/21 1:39 PM, Jon Hunter wrote:
> On 14/05/2021 22:49, Michał Mirosław wrote:
>> On Fri, May 14, 2021 at 03:24:58PM +0100, Jon Hunter wrote:
>>> Hello! I have been looking into some random crashes that appear to stem from the stmmac_napi_poll_rx() function. There are two different panics I have observed which are ...
>>> [...]
>>> The bug being triggered in skbuff.h is the following ...
>>>
>>>     void *skb_pull(struct sk_buff *skb, unsigned int len);
>>>     static inline void *__skb_pull(struct sk_buff *skb, unsigned int len)
>>>     {
>>>             skb->len -= len;
>>>             BUG_ON(skb->len < skb->data_len);
>>>             return skb->data += len;
>>>     }
>>>
>>> Looking into the above panic triggered in skbuff.h, when this occurs I have noticed that the value of skb->data_len is unusually large ...
>>>
>>>     __skb_pull: len 1500 (14), data_len 4294967274
>>> [...]
>>
>> The big value looks suspiciously similar to (unsigned)-EINVAL.
>
> Yes, it does, and at first I thought it was being set to -EINVAL. However, from tracing the length variables I can see that this is not the case.
>>> I then added some traces to stmmac_napi_poll_rx() and stmmac_rx_buf2_len() to trace the values of various variables, and when the problem occurs I see ...
>>>
>>>     stmmac_napi_poll_rx: stmmac_rx: count 0, len 1518, buf1 66, buf2 1452
>>>     stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1518
>>>     stmmac_napi_poll_rx: stmmac_rx: count 1, len 1518, buf1 66, buf2 1452
>>>     stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1536
>>>     stmmac_napi_poll_rx: stmmac_rx: count 2, len 1602, buf1 66, buf2 1536
>>>     stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 1602, plen 1518
>>>     stmmac_napi_poll_rx: stmmac_rx: count 2, len 1518, buf1 0, buf2 4294967212
>>>     stmmac_napi_poll_rx: stmmac_rx: dma_buf_sz 1536, buf1 0, buf2 4294967212
>>
>> And this one to (unsigned)-EILSEQ.
>
> Yes, but this simply comes from 1518 - 1602 = -84, so it is purely a coincidence.
>
> Jon
I dug around this a little bit. It looks like the issue occurs when we get (pardon my terminology, I haven't dealt with networking stuff much) a split packet.

What happens is that we first process the first frame, growing 'len'. buf1_len, I think, hits the "First descriptor, get split header length" case, and the length is 66. buf2_len hits the rx_not_ls case, and the length is 1536, for a total of 1602. Then the condition 'likely(status & rx_not_ls)' passes, so we jump back to 'read_again' and read the next frame.

Here we eventually get to buf2_len again. stmmac_get_rx_frame_len returns 1518 for this frame, which sounds reasonable; that's what we normally get for non-split frames. So buf2_len ends up as 1518 - 1602, which wraps around because the lengths are unsigned.

I can dig around a bit more, but it would be nice if someone with more knowledge of the hardware could comment on the above.

Thanks,
Mikko