Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
From: Niklas Cassel <hidden>
Date: 2017-11-30 03:51:57
Subsystem:
networking drivers, stmmac ethernet driver, the rest · Maintainers:
Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds
On Mon, Nov 27, 2017 at 02:41:00PM +0000, Jose Abreu wrote:
Hi Niklas,
Hello Jose,
I think your commit 05cf0d1bf4 ("net: stmmac: free an skb first
when there are no longer any descriptors using it") is breaking
stmmac driver in multi-queue configuration (this stacktrace may
contain some extra characters as I was using serial port):
------------->8-------------
general protection fault: 0000 [#1] SMP
Modules linked in: stmmac_pci stmmac libphy igb ptp pps_core
x86_pkg_temp_thermal
CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 4.14.0-rc5 #2
Hardware name: Default string Default string/SKYBAY, BIOS 5.0.1.1
10/06/2016
task: ffffa2fe14d8b100 task.stack: ffffb8c6000b8000
RIP: 0010:skb_release_data+0x66/0x110
RSP: 0018:ffffa2fe2dd43d98 EFLAGS: 00010206
RAX: 0000000000000030 RBX: ffffa2fe13fab100 RCX: 00000000000005aa
RDX: ffffa2fe12a50000 RSI: 0000000000000000 RDI: fffcfffdfffbfffc
RBP: ffffa2fe2dd43db0 R08: ffffa2fe2dfcd000 R09: 0000000000000001
R10: ffffffffa06245d0 R11: ffffa2fe14c03700 R12: 0000000000000000
R13: ffffa2fe11e686c0 R14: ffffa2fe13fab100 R15: ffffa2fe129b8940
FS: 0000000000000000(0000) GS:ffffa2fe2dd40000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe26c457000 CR3: 000000002b609003 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
skb_release_all+0x1f/0x30
consume_skb+0x1d/0x40
__dev_kfree_skb_any+0x2a/0x30
stmmac_tx_clean+0x230/0x4d0 [stmmac]
stmmac_poll+0x4b/0x980 [stmmac]
net_rx_action+0x1ad/0x290
__do_softirq+0xdd/0x1d6
irq_exit+0x77/0x80
do_IRQ+0x4a/0xc0
common_interrupt+0x93/0x93
</IRQ>
RIP: 0010:cpuidle_enter_state+0x16a/0x210
RSP: 0018:ffffb8c6000bbe90 EFLAGS: 00000286 ORIG_RAX:
ffffffffffffffae
RAX: ffffa2fe2dd575c0 RBX: ffffa2fe14560200 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 00000122dbd7ae06 RDI: 0000000000000000
RBP: ffffb8c6000bbec0 R08: 0000000000000020 R09: 0000000000000002
R10: ffffb8c6000bbe60 R11: 0000000000123400 R12: 0000004f3c11b47a
R13: 0000000000000003 R14: ffffffffa0e3aa58 R15: 0000000000000003
cpuidle_enter+0x12/0x20
call_cpuidle+0x1e/0x40
do_idle+0x16a/0x1c0
cpu_startup_entry+0x18/0x20
start_secondary+0x10d/0x110
secondary_startup_64+0xa5/0xa5
------------->8-------------
Using tree with your commit I get this stacktrace upon streaming
data at random time (stacktrace does not appear everytime),
without the commit I get no errors.
I understand the reason for your commit but we already have
dirty_tx in stmmac_tx_clean(), shouldn't it be enough to prevent
skb use-after-free? Can you please review your patch to make sure
nothing else is missing?
Dirty is not enough.
The problem that I thought I solved,
was if an single skb was spread out over
3 tx descs like this:
TX DESC #1: first descriptor bit set
TX DESC #2:
TX DESC #3: last descriptor bit set
before my patch the
tx_q->tx_skbuff[first_entry] would have the skb pointer.
let's say that curr_tx is pointing to TX DESC4,
since curr_tx is supposed to point to the place where
we are going to start filling next time.
The problem I thought I solved was if stmmac_tx_clean
came and started cleaning, it could clean TX DESC #1,
and thus freeing the skb (which TX DESC #2 and TX DESC #3 uses.)
However, by reading the databook more carefully:
If an Ethernet packet is stored over data buffers in multiple
descriptors, the DMA will fetch all descriptors, and
will only update the own bit in all descriptors,
after the complete packet has been sent.
Because of this, I think that the case where
TX DESC #1 gets its own bit cleared, while the hardware
hasn't yet fetched the TX DESC #2 and TX DESC #3 cannot
happen, so my patch 05cf0d1bf4 ("net: stmmac: free an skb first
when there are no longer any descriptors using it")
has no real purpose, and could be reverted.
However, I don't see why is should cause your
kernel to crash.
It looks like skb_release_data is trying to access a member in a pointer
that is null, i.e. a double free of an skb.
I assume this is because the same skb address exists in several queues.
My guess is that the problem is that tx_q->tx_skbuff[] does not get set to
NULL properly when having multiple queues. (It obviously works for a single
tx queue).
stmmac_tx_clean() will do:
if (skb)
tx_q->tx_skbuff[entry] = NULL;
so skbs should always be set to NULL.
However, one difference seems to be that stmmac_xmit and stmmac_tso_xmit
inside the nfrags for-loop, seems to set the skb pointer to NULL.
for (i = 0; i < nfrags; i++) {
tx_q->tx_skbuff[entry] = NULL;
Considering this is already done in stmmac_tx_clean, this seems redundant.
But one difference is that before my patch,
all frags, except first TX DESC, would get NULL:ed
(first desc would be assigned the skb).
After my patch, all descs, except first TX DESC,
would get NULL:ed, last desc would get assigned
the skb, but first desc is not explicitly cleared.
Could you try to see if the following patch
solves your problem:
(still don't see why it should be needed,
considering stmmac_tx_clean() should set all non-NULL
entries to NULL)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1763e48c84e2..1d30b20b3740 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c@@ -2830,6 +2830,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) tx_q->tx_skbuff_dma[first_entry].buf = des; tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb); + tx_q->tx_skbuff[first_entry] = NULL; first->des0 = cpu_to_le32(des);
@@ -3003,6 +3004,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) first = desc; + tx_q->tx_skbuff[first_entry] = NULL; + enh_desc = priv->plat->enh_desc; /* To program the descriptors according to the size of the frame */ if (enh_desc)