Thread (25 messages), 2 authors, 2014-06-02

Re: Help debugging iwldvm / ath10k stalls

From: Andy Lutomirski <luto@amacapital.net>
Date: 2014-06-02 18:46:05

On Mon, Jun 2, 2014 at 11:40 AM, Emmanuel Grumbach [off-list ref] wrote:

On 06/02/2014 09:11 PM, Andy Lutomirski wrote:
On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski [off-list ref] wrote:
On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach [off-list ref] wrote:
I doubt I can bisect -- the trigger was a new AP, not a new kernel.  I
can't exactly cut the AP in half :)
I see.. This is really weird though. Anyway.

Pre-suspend, i.e., working:

[   20.949900] enabled = 1, wowlan = 0
[   20.950177] enabled = 1, wowlan = 0
[   21.614016] enabled = 1, wowlan = 0
[   21.614658] enabled = 1, wowlan = 0
[   42.667586] enabled = 0, wowlan = 0
[   42.672514] enabled = 1, wowlan = 0
[   53.088165] fuse init (API version 7.23)
[   53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
[   53.130945] SELinux: initialized (dev fusectl, type fusectl), uses genfs_contexts
[   85.627558] enabled = 0, wowlan = 0
[   85.631686] enabled = 1, wowlan = 0
[  134.649346] e1000e: em1 NIC Link is Down
[  137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local choice (Reason: 3=DEAUTH_LEAVING)
[  137.682780] enabled = 0, wowlan = 0
[  137.693889] enabled = 0, wowlan = 0

Post-suspend, i.e., not working:

[  144.406303] enabled = 1, wowlan = 0
[  144.406496] enabled = 1, wowlan = 0
[  145.026827] enabled = 1, wowlan = 0
[  145.028211] enabled = 1, wowlan = 0
[  165.688632] enabled = 0, wowlan = 0
[  165.689960] enabled = 0, wowlan = 0
[  165.693988] enabled = 1, wowlan = 0
[  165.694245] enabled = 1, wowlan = 0
[  208.641426] enabled = 0, wowlan = 0
[  208.641786] enabled = 0, wowlan = 0
[  208.647499] enabled = 1, wowlan = 0
[  208.647639] enabled = 1, wowlan = 0
[  271.435558] enabled = 0, wowlan = 0
[  271.435767] enabled = 0, wowlan = 0
[  271.440125] enabled = 1, wowlan = 0
[  271.440405] enabled = 1, wowlan = 0
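
One way to compare the two logs above is to group the "enabled = N" lines into runs of identical updates that arrive close together, then look at how many updates each transition produced. The following is only a rough sketch: the regular expression assumes the exact format of the ad-hoc debug print shown in these logs, and the helper names and the 10 ms grouping window are arbitrary choices made for illustration.

```python
import re

# Matches the ad-hoc debug lines seen in the logs, for example:
#   [   42.667586] enabled = 0, wowlan = 0
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+enabled = (\d), wowlan = (\d)")


def parse_updates(log_text):
    """Return (timestamp, enabled) tuples for every matching log line."""
    updates = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            updates.append((float(m.group(1)), int(m.group(2))))
    return updates


def group_runs(updates, window=0.01):
    """Merge consecutive updates that carry the same value and arrive
    within `window` seconds of the previous one into a single run."""
    runs = []
    for ts, enabled in updates:
        last = runs[-1] if runs else None
        if last and last["enabled"] == enabled and ts - last["end"] <= window:
            last["count"] += 1
            last["end"] = ts
        else:
            runs.append({"start": ts, "end": ts, "enabled": enabled, "count": 1})
    return runs


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python3 group_power_updates.py
    for run in group_runs(parse_updates(sys.stdin.read())):
        print(f"{run['start']:12.6f} enabled={run['enabled']} x{run['count']}")
```

Fed the excerpts above, this kind of grouping makes it easier to see whether a given transition was reported once or twice, without eyeballing timestamps.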

With even more instrumentation added, I did get a glitch before
suspend/resume, but it came with more than two power setting updates.
Logs and patch attached, complete with call stacks.
I don't see any callstacks?
Doesn't matter though.
I think the callstacks were in the attachment.  I could have messed up, though.

Anyway, I don't buy the theory that this is caused by the firmware
going out to lunch.  The queues files in debugfs show the rx queue
chugging along and all of the tx queues have read_ptr == write_ptr.
Wireshark shows incoming broadcast traffic, too.  I'd guess that the
problem is more likely to be that the card is failing to wake up and
notice pending data in the TIM.
Well... I might have been unclear here (I never know how much detail I should share with the recipient :)).
From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen.
No power save - no need for wakeup interrupt.
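
The sequence described above can be illustrated with a toy model. This is not iwlwifi code: the class, field, and method names are all hypothetical, and the model assumes only what the paragraph says, namely that a write-pointer update is deferred while the NIC is asleep and is flushed from the wakeup interrupt.

```python
class ToyNic:
    """Toy model of a NIC with one Tx ring and optional power save."""

    def __init__(self, power_save):
        self.asleep = power_save      # with power save on, start asleep
        self.write_ptr = 0            # driver's view of the Tx write pointer
        self.hw_write_ptr = 0         # value the device has actually been told
        self.need_update = False      # a deferred write-pointer update is pending
        self.wakeup_requested = False

    def enqueue_packet(self):
        """Driver adds one packet to the Tx ring."""
        self.write_ptr += 1
        if self.asleep:
            # Device cannot be told yet: remember the update and ask for a wakeup.
            self.need_update = True
            self.wakeup_requested = True
            return
        self.hw_write_ptr = self.write_ptr

    def wakeup_interrupt(self):
        """Device woke up: flush any deferred write-pointer update."""
        self.asleep = False
        self.wakeup_requested = False
        if self.need_update:
            self.hw_write_ptr = self.write_ptr
            self.need_update = False


if __name__ == "__main__":
    nic = ToyNic(power_save=True)
    nic.enqueue_packet()
    # Until wakeup_interrupt() runs, the device never learns about the packet.
    print("pending:", nic.need_update, "hw_write_ptr:", nic.hw_write_ptr)
    nic.wakeup_interrupt()
    print("pending:", nic.need_update, "hw_write_ptr:", nic.hw_write_ptr)
```

In this model, if wakeup_interrupt() is never called, the packet stays stuck behind the deferred update; with power_save=False the deferred path is never taken at all.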
I'm still unconvinced.  One of the tx queues actually has both
read_ptr and write_ptr incrementing once or twice per second even when
I can't ping the gateway.  Can you point me at the right code or log
stuff to look at?
OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing
pretty well.  I'll test a stock kernel configured like that for the
next few days.
That's interesting...
--Andy