Thread: 4 messages, 3 authors, 2021-05-24

Re: [BUG] net: stmmac: Panic observed in stmmac_napi_poll_rx()

From: Mikko Perttunen <hidden>
Date: 2021-05-24 12:49:04
Also in: linux-tegra

On 5/17/21 1:39 PM, Jon Hunter wrote:
On 14/05/2021 22:49, Michał Mirosław wrote:
On Fri, May 14, 2021 at 03:24:58PM +0100, Jon Hunter wrote:
Hello!

I have been looking into some random crashes that appear to stem from
the stmmac_napi_poll_rx() function. There are two different panics I
have observed which are ...
[...]
The bug being triggered in skbuff.h is the following ...

  void *skb_pull(struct sk_buff *skb, unsigned int len);
  static inline void *__skb_pull(struct sk_buff *skb, unsigned int len)
  {
          skb->len -= len;
          BUG_ON(skb->len < skb->data_len);
          return skb->data += len;
  }

Looking into the above panic triggered in skbuff.h, when this occurs
I have noticed that the value of skb->data_len is unusually large ...

  __skb_pull: len 1500 (14), data_len 4294967274
[...]

The big value looks suspiciously similar to (unsigned)-EINVAL.
Yes it does and at first, I thought it was being set to -EINVAL.
However, from tracing the length variables I can see that this is not
the case.
I then added some traces to stmmac_napi_poll_rx() and
stmmac_rx_buf2_len() to trace the values of various variables
and when the problem occurs I see ...

  stmmac_napi_poll_rx: stmmac_rx: count 0, len 1518, buf1 66, buf2 1452
  stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1518
  stmmac_napi_poll_rx: stmmac_rx: count 1, len 1518, buf1 66, buf2 1452
  stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1536
  stmmac_napi_poll_rx: stmmac_rx: count 2, len 1602, buf1 66, buf2 1536
  stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 1602, plen 1518
  stmmac_napi_poll_rx: stmmac_rx: count 2, len 1518, buf1 0, buf2 4294967212
  stmmac_napi_poll_rx: stmmac_rx: dma_buf_sz 1536, buf1 0, buf2 4294967212
And this one to (unsigned)-EILSEQ.
Yes but this simply comes from 1518-1602 = -84. So it is purely
coincidence.
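
The two look-alike values can be checked with a few lines of C. This is only an illustrative sketch: the helper names are made up, it assumes a 32-bit unsigned int, and it relies on the Linux errno numbering (EINVAL is 22, EILSEQ is 84), which other systems need not share.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: the value an unsigned int takes when a negative
 * int is converted to it (modulo 2^32 where unsigned int is 32 bits).
 */
static unsigned int wrap_to_unsigned(int value)
{
	return (unsigned int)value;
}

/*
 * Print the comparisons made in the thread. The errno values come from
 * <errno.h>; on Linux EINVAL is 22 and EILSEQ is 84.
 */
static void print_comparisons(void)
{
	printf("(unsigned)-EINVAL       = %u\n", wrap_to_unsigned(-EINVAL));
	printf("(unsigned)-EILSEQ       = %u\n", wrap_to_unsigned(-EILSEQ));
	printf("(unsigned)(1518 - 1602) = %u\n", wrap_to_unsigned(1518 - 1602));
}
```

Under those assumptions the second and third lines print the same number, 4294967212, so the match with -EILSEQ falls out of the plain subtraction without any error code being involved.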

Jon

I dug into this a little. It looks like the issue occurs when we get
(pardon my terminology, I haven't dealt with networking stuff much)
a split packet.

What happens is that we first process the first frame, growing 'len'.
buf1_len, I think, hits the "First descriptor, get split header length"
case and the length is 66. buf2_len hits the rx_not_ls case and the
length is 1536 (dma_buf_sz). In total, 'len' is now 1602.

Then the condition 'likely(status & rx_not_ls)' passes, so we jump back
to 'read_again' and read the next frame. Here we eventually get to
buf2_len again. stmmac_get_rx_frame_len returns 1518 for this frame,
which sounds reasonable; that's what we normally get for non-split
frames. So what we get is 1518 - 1602 = -84, which wraps around to
4294967212 as an unsigned value.
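
The sequence above can be replayed with a small model. This is a rough sketch, not the driver's code: model_buf2_len() and replay_split_frame() are hypothetical names, the logic is reduced to the two cases mentioned in this thread, and the numbers are taken from Jon's trace.

```c
/*
 * Simplified, hypothetical model of the buffer 2 length accounting
 * described above. It is not the stmmac driver's actual code.
 */
static unsigned int model_buf2_len(int not_last, unsigned int dma_buf_sz,
				   unsigned int frame_len, unsigned int len)
{
	/* Not the last descriptor: the whole buffer is counted. */
	if (not_last)
		return dma_buf_sz;

	/* Last descriptor: remainder of the frame. Wraps if len > frame_len. */
	return frame_len - len;
}

/* Replay the failing sequence from the trace; returns the final buf2 length. */
static unsigned int replay_split_frame(void)
{
	const unsigned int dma_buf_sz = 1536;
	unsigned int len = 0;

	/* First descriptor: 66-byte split header, then a full second buffer. */
	len += 66;
	len += model_buf2_len(1, dma_buf_sz, 0, len);	/* len is now 1602 */

	/* Second (last) descriptor: the reported frame length is 1518. */
	return model_buf2_len(0, dma_buf_sz, 1518, len);
}
```

With a 32-bit unsigned int, replay_split_frame() returns 4294967212, the buf2 value seen in the trace. In the non-split case (len 66, frame length 1518) the same model returns 1452, which matches the first trace line.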

I can dig around a bit more but it would be nice if someone with a bit 
more knowledge of the hardware could comment on the above.

Thanks,
Mikko