Re: Help debugging iwldvm / ath10k stalls
From: Andy Lutomirski <luto@amacapital.net>
Date: 2014-06-02 18:46:05
On Mon, Jun 2, 2014 at 11:40 AM, Emmanuel Grumbach [off-list ref] wrote:
On 06/02/2014 09:11 PM, Andy Lutomirski wrote:
On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski [off-list ref] wrote:
On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach [off-list ref] wrote:
I doubt I can bisect -- the trigger was a new AP, not a new kernel. I can't exactly cut the AP in half :)
I see.. This is really weird though. Anyway.
Pre-suspend, i.e., working:
[ 20.949900] enabled = 1, wowlan = 0
[ 20.950177] enabled = 1, wowlan = 0
[ 21.614016] enabled = 1, wowlan = 0
[ 21.614658] enabled = 1, wowlan = 0
[ 42.667586] enabled = 0, wowlan = 0
[ 42.672514] enabled = 1, wowlan = 0
[ 53.088165] fuse init (API version 7.23)
[ 53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
[ 53.130945] SELinux: initialized (dev fusectl, type fusectl), uses genfs_contexts
[ 85.627558] enabled = 0, wowlan = 0
[ 85.631686] enabled = 1, wowlan = 0
[ 134.649346] e1000e: em1 NIC Link is Down
[ 137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 137.682780] enabled = 0, wowlan = 0
[ 137.693889] enabled = 0, wowlan = 0
Post-suspend, i.e., not working:
[ 144.406303] enabled = 1, wowlan = 0
[ 144.406496] enabled = 1, wowlan = 0
[ 145.026827] enabled = 1, wowlan = 0
[ 145.028211] enabled = 1, wowlan = 0
[ 165.688632] enabled = 0, wowlan = 0
[ 165.689960] enabled = 0, wowlan = 0
[ 165.693988] enabled = 1, wowlan = 0
[ 165.694245] enabled = 1, wowlan = 0
[ 208.641426] enabled = 0, wowlan = 0
[ 208.641786] enabled = 0, wowlan = 0
[ 208.647499] enabled = 1, wowlan = 0
[ 208.647639] enabled = 1, wowlan = 0
[ 271.435558] enabled = 0, wowlan = 0
[ 271.435767] enabled = 0, wowlan = 0
[ 271.440125] enabled = 1, wowlan = 0
[ 271.440405] enabled = 1, wowlan = 0
With even more instrumentation added, I did get a glitch before suspend/resume, but it came with more than two power setting updates. Logs and patch attached, complete with call stacks.
I don't see any callstacks? Doesn't matter though.
I think the callstacks were in the attachment. I could have messed up, though. Anyway, I don't buy the theory that this is caused by the firmware going out to lunch. The queues files in debugfs show the rx queue chugging along and all of the tx queues have read_ptr == write_ptr. Wireshark shows incoming broadcast traffic, too.
I'd guess that the problem is more likely to be that the card is failing to wake up and notice pending data in the TIM.
Well... I might have been unclear here (I never know how much detail I should share with the recipient :)). From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen. No power save - no need for wakeup interrupt.
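The deferred write-pointer update described above can be modeled with a small toy sketch. This is not the iwlwifi code: the class `TxQueue`, its fields, and its methods are hypothetical names invented for illustration, and the model assumes only what the paragraph states (a sleeping NIC cannot take a write-pointer update, so the driver records the need and waits for a wakeup interrupt).

```python
class TxQueue:
    """Toy model of a Tx ring (hypothetical, not the real driver structures)."""

    def __init__(self):
        self.sw_write_ptr = 0          # packets the driver has queued
        self.hw_write_ptr = 0          # write pointer the device has been told about
        self.need_update = False       # a deferred write-pointer update is pending
        self.wakeup_requested = False  # driver has asked the NIC to wake up

    def enqueue(self, nic_asleep):
        """Queue one packet; defer the device update if the NIC is in power save."""
        self.sw_write_ptr += 1
        if nic_asleep:
            # Cannot update the device now: remember it and request a wakeup.
            self.need_update = True
            self.wakeup_requested = True
        else:
            # NIC is awake: update the device write pointer immediately.
            self.hw_write_ptr = self.sw_write_ptr
            self.need_update = False

    def on_wakeup_interrupt(self):
        """Flush the deferred update once the NIC reports it is awake."""
        if self.need_update:
            self.hw_write_ptr = self.sw_write_ptr
            self.need_update = False
        self.wakeup_requested = False

    def stalled(self):
        """True while queued packets have not been announced to the device."""
        return self.hw_write_ptr != self.sw_write_ptr


if __name__ == "__main__":
    q = TxQueue()
    q.enqueue(nic_asleep=True)   # NIC in power save: update is deferred
    print("stalled before wakeup interrupt:", q.stalled())
    q.on_wakeup_interrupt()      # if this never fires, the queue stays stalled
    print("stalled after wakeup interrupt:", q.stalled())
```

In this model, a wakeup interrupt that never arrives leaves `stalled()` true indefinitely, which is the failure mode being proposed; with power save off, `enqueue` always takes the immediate path and no interrupt is needed.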
I'm still unconvinced. One of the tx queues actually has both read_ptr and write_ptr incrementing once or twice per second even when I can't ping the gateway. Can you point me at the right code or log stuff to look at?
OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing pretty well. I'll test a stock kernel configured like that for the next few days.
That's interesting...
--Andy