Thread (6 messages) 6 messages, 3 authors, 2016-12-05

Re: [PATCH] mac80211: fix Tx BA session stuck issue during sw scanning

From: Chris Chiu <hidden>
Date: 2016-12-01 11:32:31

On Tue, Nov 29, 2016 at 8:59 PM, Johannes Berg
[off-list ref] wrote:
On Wed, 2016-11-23 at 15:59 +0800, Chris Chiu wrote:
quoted
ieee80211_iface_work() will check if sw scanning is in progress
before handling block ack session. In our case, the RTL8821AE
operate in station mode, when tx session expired, DELBA packet
stuck during sw scanning and so do other data packets.

ieee80211_scan_state_decision() will take lots of time in
SCAN_SUSPEND/SCAN_RESUME state due to !tx_empty or bad_latency.
Then the sw scanning mostly take > 20 seconds to finish or even
worse in our case RTL8821AE have 37 channels for 2G+5G to scan
and tx stalls ~300 seconds. AP side still thinks the connection
is alive because it still receives the QoS NULL packet from STA.
So the link state will never change but actually no data tx/rx
during this long time.

This commit tries to send out packet in SCAN_SUSPEND state so the
sw scanning can complete more efficiently and less affect on Block
Ack session handling. Verified on RTL8821AE for > 30000 pings and
no Tx BA session stuck observed.
The premise seems fairly reasonable, although I'm a little worried that
if so much new traffic is coming in we never finish the scan suspend?
Actually, the queues are still stopped, so it's only management frames
that can come in, so that should be ok?
Actually it will finish scan eventually and back to SCAN_DECISION
state but almost 20~30 seconds elapsed. The local->scanning should be
cleared after all channels been scanned. However, from the debug
messages I added in ieee80211_iface_work(), it still returns when
check local->scanning and the DELBA still has no chance to be
transferred then stuck again at the next scan state machine. Supposed
to be another scan request issued but I don't know who's the killer.
Except to find who issue the next scan request, BA session have no
chance to reset in this long scan period (>20s) still need to be taken
care.
quoted
+     test_and_clear_bit(SCAN_SUSPEND_SCANNING, &local->scanning);
That makes no sense, you're not checking the return value, just clear
the bit without test.
quoted
@@ -844,6 +846,8 @@ static void ieee80211_scan_state_suspend(struct
ieee80211_local *local,
      /* disable PS */
      ieee80211_offchannel_return(local);

+     __set_bit(SCAN_SUSPEND_SCANNING, &local->scanning);
Why are you using the non-atomic version here, vs. the atomic one
above?
You're right. I just want to clear_bit and set_bit in this case, sorry
for that confusing. Or any better suggestion?
johannes
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help