Thread: 6 messages, 3 authors, 2020-12-11

Re: [RFT] ath9k: multi-rate-retry fails at HW level

From: Toke Høiland-Jørgensen <hidden>
Date: 2020-12-01 13:35:11

Zefir Kurtisi [off-list ref] writes:
CC += adrian

On 24.11.20 15:45, Toke Høiland-Jørgensen wrote:
Zefir Kurtisi [off-list ref] writes:
Hi,

I am running into a strange issue with ath9k operating a 9590
device. To me it looks like a HW issue, but since work on rate
controllers has been going on for decades, I can hardly imagine this
never showed up before.

The issue observed is this: the TX status descriptors never report
rateindex 1, it is always 0, 2, or 3, but never 1.

I noticed this after overriding the rate configuration provided by
minstrel with a static setup, e.g. (7,3)(5,3)(3,3)(1,3), all MCS. The
device operates as an iperf client to a connected AP and continuously
transmits data. Meanwhile, the attenuation between the endpoints is
gradually increased, and I expected to see a gradual shift in the
reported TX status rateindex from 0 to 3. But nada, the values
reported are 0, 2, and 3, never 1.

I double-checked that the TX descriptors are correctly set up with the
rates and retry counts; everything looks sane.

Even more obvious: after changing the rate configuration to
(7,3)(1,3)(5,3)(3,3), the expectation would be to have either 0 or 1
reported as rateidx, since the transmission ought to succeed at the
lowest rate or not at all. Again, every rateidx except 1 is reported.

Now the question for me is: what exactly is the HW doing with such a
configuration? Is it skipping the second rate, or is it just
misreporting it?
You should be able to see this by looking at the rates the frames are
being sent at, shouldn't you?
Yes, I did that, and the captures indicate that the second rate is simply skipped.

Here are some test cases and their sniffing results. The setup is an 11ng STA connected
to an AP, with the attenuation adjusted such that MCS 7 fails while MCS 5 and below
succeed. A monitor is sniffing while a single ping is sent from AP to STA.

With a rate configuration of (7/2)(3/2)(1/2) we get:
14:02:42.923880 9481489761us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV:  e Pad 20 KeyID 0
14:02:42.923909 9481490037us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV:  e Pad 20 KeyID 0
14:02:42.925244 9481491044us tsft 2412 MHz 11n -68dBm signal 13.0 Mb/s MCS 1 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV:  e Pad 20 KeyID 0


with (7/2)(1/2)(3/2):
13:59:37.073147 9295637087us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV:  c Pad 20 KeyID 0
13:59:37.073467 9295637438us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV:  c Pad 20 KeyID 0
13:59:37.074591 9295638498us tsft 2412 MHz 11n -68dBm signal 26.0 Mb/s MCS 3 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV:  c Pad 20 KeyID 0

and with (7/2)(3/2):
14:04:27.269806 9585836783us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
14:04:27.270342 9585837344us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
14:04:27.271368 9585838370us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
[..]

A total of 14 attempts at MCS 7, with the ping finally failing.
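The captures can be compared against a small idealized model of multi-rate retry. This is a hypothetical Python sketch, not driver or hardware code. It assumes that an attempt at MCS m succeeds exactly when m <= 5, matching the attenuation setup described above with no random loss. The `skip_second` flag models the behavior reported in this thread.

```python
from typing import List, Optional, Tuple

# One multi-rate-retry (MRR) entry: (MCS index, number of tries).
RateSeries = List[Tuple[int, int]]


def simulate_mrr(series: RateSeries, max_ok_mcs: int,
                 skip_second: bool = False) -> Tuple[Optional[int], List[int]]:
    """Walk an MRR series entry by entry, try by try.

    Assumptions (idealized model, not driver or HW code):
      * an attempt at MCS m succeeds iff m <= max_ok_mcs (no random loss);
      * skip_second=True models the observed behavior: entry 1 is never tried.
    Returns (index of the entry that succeeded, or None if all failed,
             list of MCS values actually put on air).
    """
    on_air: List[int] = []
    for idx, (mcs, tries) in enumerate(series):
        if skip_second and idx == 1:
            continue  # the second entry is silently skipped
        for _ in range(tries):
            on_air.append(mcs)
            if mcs <= max_ok_mcs:
                return idx, on_air
    return None, on_air


# MCS 7 fails, MCS 5 and below succeed, as in the attenuated setup above.
for series in ([(7, 2), (3, 2), (1, 2)], [(7, 2), (1, 2), (3, 2)], [(7, 2), (3, 2)]):
    print(series, "expected:", simulate_mrr(series, 5),
          "with skip:", simulate_mrr(series, 5, skip_second=True))
```

With `skip_second=True`, the on-air sequences line up with the captures. (7/2)(3/2)(1/2) gives MCS 7, 7, 1, and (7/2)(1/2)(3/2) gives MCS 7, 7, 3. (7/2)(3/2) produces no successful attempt at all. The model covers only a single pass through the series, so it does not explain the repeated MCS 7 attempts seen in the last capture.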
Both possibilities have great impact, since upper layers (like
airtime) use the returned rateidx to calculate and configure operating
parameters at runtime.
Have you actually observed any issues from this? If it's just skipping a
rate, minstrel should still be able to make decisions based on the
actual values returned, no?
The issues arise from the fact that each TX status descriptor reports a
(tx-rateindex/tx-attempt-index) pair, leaving the driver to calculate
what was put on air from these two values. If one had rates set to
(7/2)(3/7)(1/2) and the TX status reports (tx-rateindex=2/tx-attempt-index=0),
the driver assumes there were 10 attempts in total, while in fact there were 3 if the
second rate is skipped. What direct effect this has on RC I can't grasp, but it
definitely skews the statistics.
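The accounting mismatch can be reproduced with a few lines of arithmetic. This Python sketch is illustrative only, and its function names are hypothetical. It assumes tx-attempt-index is zero-based within the final rate entry, which is what the 10-attempt figure above implies.

```python
from typing import List, Tuple

# MRR series entry: (rate, tries). Rate values are only labels here.
RateSeries = List[Tuple[int, int]]


def attempts_as_inferred(series: RateSeries, rateidx: int, attempt_idx: int) -> int:
    """Attempts inferred from (tx-rateindex, tx-attempt-index).

    Every entry before rateidx is assumed to have used all of its tries;
    the final entry contributes attempt_idx + 1 (zero-based index assumed).
    """
    earlier = sum(tries for _, tries in series[:rateidx])
    return earlier + attempt_idx + 1


def attempts_if_second_skipped(series: RateSeries, rateidx: int, attempt_idx: int) -> int:
    """Same count, but entry 1 contributes nothing because it was never tried."""
    earlier = sum(tries for i, (_, tries) in enumerate(series[:rateidx]) if i != 1)
    return earlier + attempt_idx + 1


series = [(7, 2), (3, 7), (1, 2)]
print(attempts_as_inferred(series, rateidx=2, attempt_idx=0))        # 10
print(attempts_if_second_skipped(series, rateidx=2, attempt_idx=0))  # 3
```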

Same goes for airtime: check how this distorts its calculation in
ath_tx_count_airtime().
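A simplified, hypothetical airtime model shows how the same miscount carries over into airtime. It is not the code of ath_tx_count_airtime(). It only assumes that earlier series entries are charged for all of their configured tries and the final entry for tx-attempt-index + 1 tries. The per-attempt durations are made-up numbers chosen for illustration.

```python
from typing import Sequence


def estimate_airtime_us(durations_us: Sequence[int], counts: Sequence[int],
                        rateidx: int, attempt_idx: int,
                        skipped: Sequence[int] = ()) -> int:
    """Simplified airtime model (hypothetical, not ath_tx_count_airtime()).

    durations_us[i]: assumed airtime of one attempt at series entry i.
    counts[i]:       configured tries for entry i.
    Earlier entries are charged for all their tries, the final entry for
    attempt_idx + 1 tries. Entries listed in `skipped` are charged nothing.
    """
    total = 0
    for i in range(rateidx):
        if i not in skipped:
            total += durations_us[i] * counts[i]
    total += durations_us[rateidx] * (attempt_idx + 1)
    return total


# Hypothetical per-attempt durations for the (7/2)(3/7)(1/2) example.
durations = [100, 400, 800]
counts = [2, 7, 2]
print(estimate_airtime_us(durations, counts, 2, 0))               # 3800
print(estimate_airtime_us(durations, counts, 2, 0, skipped=[1]))  # 1000
```

Under these assumptions, the skipped entry's seven configured tries inflate the estimate by almost a factor of four. The exact ratio depends entirely on the made-up durations.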
Ah, right, I was assuming that rates[1].count would be reset to zero
somehow. Have you confirmed that the attempts actually go up in the
Minstrel stats for the skipped rate?
Also, the above leads to an immediately visible issue: if RC
provides two rates, e.g. (7/3)(5/3), of which the first is too high and
the second is never even attempted, frames don't make it through.
Yeah, rate control would likely take longer to converge to the right
rate. I suppose that if this is a hardware-model-specific issue, a quirks
bit could be added to instruct Minstrel to disregard the second index.
But it does sound a bit odd; have you verified that it's consistent across
different units of the same model (and not just a busted device)?
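Below is one possible reading of the quirks-bit idea, sketched in Python under clearly stated assumptions. The flag name and the placeholder approach are hypothetical and were not proposed in the thread. It is also untested whether the hardware accepts a zero-try entry. The idea is to keep the rates chosen by Minstrel out of entry 1, so that a setup like (7/3)(5/3) still gets its fallback rate attempted.

```python
from typing import List, Tuple

RateSeries = List[Tuple[int, int]]  # (MCS index, tries)
MAX_RATES = 4  # mac80211 passes at most four rate entries (IEEE80211_TX_MAX_RATES)


def apply_skip_second_quirk(series: RateSeries, quirk_skip_second: bool) -> RateSeries:
    """Hypothetical quirk handling: keep real rates out of entry 1.

    If the quirk is set, a zero-try placeholder is inserted at index 1 so
    that the rates chosen by the rate controller land in entries 0, 2 and 3,
    which the hardware was observed to use. The list is truncated to
    MAX_RATES, so the last requested rate is dropped if all four were used.
    """
    if not quirk_skip_second or len(series) < 2:
        return list(series)
    placeholder = (series[0][0], 0)  # same rate as entry 0, but zero tries
    return ([series[0], placeholder] + list(series[1:]))[:MAX_RATES]


print(apply_skip_second_quirk([(7, 3), (5, 3)], True))
print(apply_skip_second_quirk([(7, 3), (5, 3), (3, 3), (1, 3)], True))
```

A simpler alternative would be to zero rates[1].count after rate selection. That would fix the accounting but still lose the second rate. Shifting the rates avoids that loss at the cost of one series slot.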

-Toke