Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
From: Helmut Schaa <hidden>
Date: 2009-02-23 10:39:16
On Thursday, 5 February 2009, Helmut Schaa wrote:
> On Tuesday, 27 January 2009, Helmut Schaa wrote:
>> On Friday, 23 January 2009, Helmut Schaa wrote:
>>> On Friday, 23 January 2009, Zhu, Yi wrote:
>>> [...]
>>>> I see. This should be a firmware bug. I think your idea to queue
>>>> packets between ASSOCIATING and ASSOCIATED and replay them later
>>>> (once the state becomes ASSOCIATED) should work.
>>>
>>> Agreed, I'll try that (maybe today, maybe next week).
>>
>> Ok, I've done a first try and the frame buffering/replaying works quite
>> well, but I've run into another issue now: The supplicant successfully
>> receives the EAP frame which was buffered by the driver and sends the
>> appropriate response. However, the response is not sent over the air.
>> If I just add a sleep(1) before sending the frame in the supplicant,
>> all works well. I have no clue yet why the frame is not sent.
>
> JFYI, got a bit further now. The driver never got the frame from the
> supplicant. It's the netdev which does not accept the frame that shortly
> after the queues are woken up.

Found some time to investigate this issue again. The current state
is as follows:
After the firmware notifies the driver about the association, the driver
starts buffering all received frames. Once the delayed work is executed and
moves the driver state to ASSOCIATED, the following happens:
1) netif_carrier_on
2) netif_wake_queue
3) wireless_send_event
4) replay buffered frames
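
The buffering and replay described above can be pictured with a small
user-space model. This is only a sketch and not the ipw2100 code: the
struct, the function names, the integer frame ids and the fixed buffer size
are all hypothetical and exist just to show the ordering (buffer while not
yet ASSOCIATED, replay in arrival order once the state changes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not driver code: frames are plain integer ids. */
enum assoc_state { ASSOCIATING, ASSOCIATED };

#define MAX_PENDING 8

struct rx_model {
    enum assoc_state state;
    int pending[MAX_PENDING];        /* frames held back by the driver */
    size_t n_pending;
    int delivered[2 * MAX_PENDING];  /* frames handed up the stack */
    size_t n_delivered;
};

/* Hand one frame to the (modelled) network stack. */
static void deliver(struct rx_model *m, int frame)
{
    assert(m != NULL);
    if (m->n_delivered < sizeof(m->delivered) / sizeof(m->delivered[0]))
        m->delivered[m->n_delivered++] = frame;
}

/* RX path: buffer while not yet ASSOCIATED, otherwise deliver directly. */
static void rx_frame(struct rx_model *m, int frame)
{
    assert(m != NULL);
    if (m->state == ASSOCIATED) {
        deliver(m, frame);
        return;
    }
    if (m->n_pending < MAX_PENDING)
        m->pending[m->n_pending++] = frame;
    /* A full buffer silently drops the frame in this sketch. */
}

/* Delayed work: switch state, then replay the buffer in arrival order. */
static void set_associated(struct rx_model *m)
{
    assert(m != NULL);
    m->state = ASSOCIATED;
    /* Steps 1) to 3) from the list above would run here in the driver. */
    for (size_t i = 0; i < m->n_pending; i++)
        deliver(m, m->pending[i]);
    m->n_pending = 0;
}
```

In this model a frame received before set_associated() only shows up in
delivered[] after the state change, which is the behaviour the replay is
meant to provide.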
wpa_supplicant then receives the buffered EAP frame, builds the
corresponding reply and tries to send it. The sendto call does _not_
indicate an error. Nevertheless, the frame is not passed to the ipw2100
driver. I was able to track that down to the following situation:
This happens when the driver moves to the associated state:
----------------------------
netif_carrier_on
  linkwatch_fire_event
    linkwatch_schedule_work
netif_wake_queue
----------------------------
At that point in time the device's tx queue has a noop_qdisc assigned.
Now wpa_supplicant sends the EAP reply:
---------------------------
packet_sendmsg
  dev_queue_xmit
    qdisc_enqueue_root
      qdisc_enqueue
        return NET_XMIT_CN
  return 0
---------------------------
Since the qdisc is still noop_qdisc, qdisc_enqueue returns NET_XMIT_CN for
every frame while packet_sendmsg translates that to 0, see netdevice.h:
#define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)
Hence, wpa_supplicant thinks the frame was sent out successfully.
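
To make the masking effect concrete, here is a stand-alone user-space
sketch. The macro is copied from above; the numeric NET_XMIT_* values are
what I believe the kernel headers of that time use, so treat them as an
assumption. noop_enqueue_model() and sender_result_model() are hypothetical
stand-ins for the noop_qdisc enqueue hook and for the way a sender such as
packet_sendmsg maps the return code, not the real kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values, mirroring the kernel headers of that era. */
enum { NET_XMIT_SUCCESS = 0, NET_XMIT_DROP = 1, NET_XMIT_CN = 2 };

/* Same definition as quoted from netdevice.h. */
#define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)

/* Hypothetical stand-in for noop_qdisc: the frame is discarded, CN returned. */
static int noop_enqueue_model(void)
{
    return NET_XMIT_CN;
}

/*
 * Hypothetical, simplified model of the sender side: positive qdisc
 * verdicts go through net_xmit_errno(), everything else is passed on.
 */
static int sender_result_model(int xmit_rc)
{
    if (xmit_rc > 0)
        return net_xmit_errno(xmit_rc);
    return xmit_rc;
}

/* Encodes the claims from the text as assertions. */
static void check_mapping(void)
{
    /* Frame was discarded by the noop qdisc, yet the sender sees success. */
    assert(sender_result_model(noop_enqueue_model()) == 0);
    /* A NET_XMIT_DROP verdict would surface as -ENOBUFS instead. */
    assert(sender_result_model(NET_XMIT_DROP) == -ENOBUFS);
    /* A real success stays a success. */
    assert(sender_result_model(NET_XMIT_SUCCESS) == 0);
}
```

With these assumptions, a discarded frame and a transmitted frame are
indistinguishable to the caller as long as the qdisc answers NET_XMIT_CN.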
Somewhat later, when the queued linkwatch work is executed, the qdisc gets
swapped to the default_qdisc, which would allow frames to be sent.
---------------------------
linkwatch_event
  __linkwatch_run_queue
    activate_dev
      attach_default_qdisc
---------------------------
So, how should I proceed here?
Some possibilities that come to mind:
1) Let the noop_qdisc return NET_XMIT_DROP instead of NET_XMIT_CN and extend
wpa_supplicant to retry after a short timeout. Already tried this approach
and it works fine for me. wpa_supplicant typically needs one retry (200ms
delay) until the frame is successfully sent out.
2) Run activate_dev somehow without a delay. I guess this could be achieved by
changing linkwatch_urgent_event. I haven't tested this yet. But I guess we
would still have a small race here.
3) Wait until activate_dev was called in ipw2100 before replaying the cached
frames.
Maybe someone from the netdev people can give me a hand here?
Jouni, would you accept a patch for wpa_supplicant that adds some retries
to l2_packet_send when the network stack returns an error?
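
To illustrate the kind of retry meant here, a minimal sketch follows. It is
not the wpa_supplicant code: send_fn, send_with_retry() and flaky_send() are
hypothetical names, and the assumption is that a dropped frame shows up as
-ENOBUFS (as it would with option 1). It also uses a blocking nanosleep()
only to keep the example short; an actual patch would more likely use a
timer in the supplicant's event loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical send callback: 0 on success, negative errno on failure. */
typedef int (*send_fn)(void *ctx, const void *buf, size_t len);

/*
 * Retry only on -ENOBUFS, the error assumed to be reported while the
 * device still has the noop qdisc attached. Any other result is final.
 */
static int send_with_retry(send_fn send, void *ctx, const void *buf,
                           size_t len, int max_tries, long delay_ms)
{
    int rc = -EINVAL;

    assert(send != NULL && max_tries > 0 && delay_ms >= 0);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        rc = send(ctx, buf, len);
        if (rc != -ENOBUFS)
            return rc;
        if (attempt + 1 < max_tries) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return rc; /* still -ENOBUFS after the last attempt */
}

/* Hypothetical test sender: fails *ctx times with -ENOBUFS, then succeeds. */
static int flaky_send(void *ctx, const void *buf, size_t len)
{
    int *failures_left = ctx;

    (void)buf;
    (void)len;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -ENOBUFS;
    }
    return 0;
}
```

Called as send_with_retry(flaky_send, &failures, frame, len, 3, 200), this
would match the observed behaviour of one failed attempt followed by a
successful one after 200ms.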
Thanks,
Helmut