On 10 February 2015 at 14:14, Eric Dumazet [off-list ref] wrote:
On Tue, 2015-02-10 at 11:33 +0100, Michal Kazior wrote:
ath10k_core_napi_dummy_poll, 64);
+ ewma_init(&ar->tx_delay_us, 16384, 8);
1) 16384 factor might be too big.
2) a weight of 8 seems too low given aggregation values used in wifi.
On 32bit arches, the max range for ewma value would be 262144 usec,
a quarter of a second...
You could use a factor of 64 instead, and a weight of 16.
64/16 seems to work fine as well.
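
To make the range argument above concrete, here is a minimal user-space sketch of a fixed-point EWMA with a power-of-two factor and weight. It is only an illustration: the names (ewma_sketch, ewma_sketch_max_value, and so on) are hypothetical, the update rule is a simplified stand-in rather than the kernel's implementation, and the 32-bit internal field is an assumption meant to model an unsigned long on a 32-bit arch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a fixed-point EWMA; not the kernel's code. */
struct ewma_sketch {
	uint32_t internal;     /* average scaled by factor (32-bit on purpose) */
	unsigned factor_log2;  /* log2 of the scaling factor */
	unsigned weight_log2;  /* log2 of the weight */
};

/* log2 of a power of two; asserts that v really is one. */
static inline unsigned log2_pow2(uint32_t v)
{
	unsigned n = 0;

	assert(v != 0 && (v & (v - 1)) == 0);
	while (v >>= 1)
		n++;
	return n;
}

static inline void ewma_sketch_init(struct ewma_sketch *e,
				    uint32_t factor, uint32_t weight)
{
	e->internal = 0;
	e->factor_log2 = log2_pow2(factor);
	e->weight_log2 = log2_pow2(weight);
}

/* new = old - old/weight + (val * factor)/weight; first sample seeds it. */
static inline void ewma_sketch_add(struct ewma_sketch *e, uint32_t val)
{
	uint32_t scaled = val << e->factor_log2;

	if (e->internal == 0) {
		e->internal = scaled;
		return;
	}
	e->internal = e->internal - (e->internal >> e->weight_log2)
		    + (scaled >> e->weight_log2);
}

static inline uint32_t ewma_sketch_read(const struct ewma_sketch *e)
{
	return e->internal >> e->factor_log2;
}

/* Largest average that fits once it is scaled by factor into 32 bits. */
static inline uint64_t ewma_sketch_max_value(uint32_t factor)
{
	return (UINT64_C(1) << 32) / factor;
}
```

Under these assumptions, ewma_sketch_max_value(16384) gives 262144 (the quarter of a second mentioned above when the unit is usec), while a factor of 64 raises the ceiling to 67108864 usec, roughly 67 seconds.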
On a related note: I still wonder how to get a single TCP flow to reach
line rate with ath10k (it still doesn't; I reach line rate with
multiple flows only). Isn't tcp_limit_output_bytes just too small
for devices like Wi-Fi, where you can send aggregates as long as
64*3*1500 bytes in a single shot and can't expect even a single
tx-completion for one to come in before it's transmitted entirely? You
effectively operate with bursts of traffic.
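
The size mismatch can be sketched with a bit of arithmetic. The helper names below are hypothetical, reading 64*3*1500 as 64 A-MPDU subframes of 3 A-MSDU subframes of 1500 bytes each is an assumption, and the 131072-byte limit used in the note after the code is also an assumption (believed to be the default for kernels of this era; it is not a figure stated in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one maximal burst: subframe counts times the frame size. */
static inline size_t aggregate_bytes(size_t ampdu_subframes,
				     size_t amsdu_subframes,
				     size_t frame_bytes)
{
	return ampdu_subframes * amsdu_subframes * frame_bytes;
}

/* How many complete aggregates a per-socket byte limit can cover. */
static inline size_t aggregates_within_limit(size_t limit_bytes,
					     size_t agg_bytes)
{
	assert(agg_bytes > 0);
	return limit_bytes / agg_bytes;
}
```

With these inputs, aggregate_bytes(64, 3, 1500) is 288000 bytes. An assumed limit of 131072 bytes covers zero full aggregates of that size, whereas 700000 bytes covers two.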
Some numbers:
ath10k w/o cushion w/o aggregation 1 flow:  UDP  65 Mbps, TCP  30 Mbps
ath10k w/  cushion w/o aggregation 1 flow:  UDP  65 Mbps, TCP  59 Mbps
ath10k w/o cushion w/  aggregation 1 flow:  UDP 650 Mbps, TCP 250 Mbps
ath10k w/  cushion w/  aggregation 1 flow:  UDP 650 Mbps, TCP 250 Mbps
ath10k w/o cushion w/  aggregation 5 flows: UDP 650 Mbps, TCP 250 Mbps
ath10k w/  cushion w/  aggregation 5 flows: UDP 650 Mbps, TCP 600 Mbps
"w/o aggregation" means forcing ath10k to use 1 A-MSDU and 1 A-MPDU
per aggregate so latencies due to aggregation itself should be pretty
much nil.
If I set tcp_limit_output_bytes to 700K+, I can get ath10k w/ cushion
w/ aggregation to reach 600 Mbps on a single flow.
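
To put the 700K figure in terms of time, a byte budget can be converted into the airtime it represents at a given rate. This is a rough sketch only: the helper name is hypothetical, 700K is taken as 700000 bytes, and the rate is treated as a steady goodput with no protocol overhead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds needed to drain `bytes` at `rate_mbps`.
 * bits / (Mbit/s) yields microseconds directly; integer division truncates.
 */
static inline uint64_t drain_time_us(uint64_t bytes, uint64_t rate_mbps)
{
	assert(rate_mbps > 0);
	return bytes * 8 / rate_mbps;
}
```

Under those assumptions, drain_time_us(700000, 600) comes to 9333 usec, so the larger limit corresponds to roughly 9 ms of queued data at 600 Mbps, and a single 288000-byte aggregate to about 3.8 ms.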
Michał