I am testing the roaming of a wireless client between two access points.
I use the madwifi-0.9.3 driver, wpa_supplicant-0.5.7 and Linux kernel 2.6.16.
I enabled automatic roaming in madwifi as suggested in the madwifi-devel thread "automatic roaming" (2007-03-20) and in #697.
The roaming from AP-1 to AP-2 or vice versa always takes several seconds. This is caused by wpa_supplicant needing to retry authentication due to transmit data being dropped for almost exactly 1 second after association with the new AP.
I simplified the test scenario by reverting to WEP encryption, and then to open mode without WEP, and used ping -i 0.2 <ipaddr> to track down the loss of transmitted data. I observed 5 pings getting lost when roaming from AP-1 to AP-2 or vice versa, which at 0.2-second intervals matches the 1-second gap.
On the madwifi-devel list, other people have also noticed the 1-second drop of transmit data:
- 2007-03-20 15:57 Re: automatic roaming
- 2007-04-05 09:58 1000ms delay after roaming
Finally I found that the problem is caused by the interaction of the madwifi driver and the Linux kernel (2.6.16):
- Madwifi does netif_carrier_off() in ieee80211_notify_node_leave() when it leaves AP-1 before it roams to AP-2.
- After association with AP-2, madwifi does netif_carrier_on() in ieee80211_notify_node_join(), usually only a few milliseconds after netif_carrier_off().
- netif_carrier_off() and netif_carrier_on() both call linkwatch_fire_event() in linux-source-2.6.16/net/core/link_watch.c.
- In link_watch.c the rate of linkwatch events is limited to one per second to prevent a storm of messages on the netlink socket.
- Due to this limitation the reactivation of the wireless interface intended by netif_carrier_on() is delayed by 1 second and messages sent during that time are discarded, although messages received by madwifi are passed to wpa_supplicant.
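
To make the timing concrete, here is a minimal user-space sketch of the rate limit described above. It is not the kernel code: struct lw_model, the lw_* functions and LW_MIN_INTERVAL_MS are hypothetical names of mine, and the model only assumes that a carrier change raised less than one second after the previous one is processed when that second has elapsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model, NOT kernel code. All times are in milliseconds. */
#define LW_MIN_INTERVAL_MS 1000 /* models the one-event-per-second limit */

struct lw_model {
    long next_allowed_ms; /* earliest time the next event may be processed */
    bool oper_up;         /* state seen by the stack: may it transmit? */
};

/* Returns the time at which a carrier change raised at now_ms is processed. */
static long lw_process_time(const struct lw_model *m, long now_ms)
{
    return now_ms >= m->next_allowed_ms ? now_ms : m->next_allowed_ms;
}

/* Applies a carrier change and returns when it actually took effect. */
static long lw_carrier_change(struct lw_model *m, long now_ms, bool up)
{
    assert(m != NULL);
    long done_ms = lw_process_time(m, now_ms);
    m->oper_up = up;
    m->next_allowed_ms = done_ms + LW_MIN_INTERVAL_MS;
    return done_ms;
}

/* Transmit outage for one roam: carrier off at off_ms, carrier on at on_ms. */
static long lw_roam_outage_ms(struct lw_model *m, long off_ms, long on_ms)
{
    long down_at = lw_carrier_change(m, off_ms, false);
    long up_at = lw_carrier_change(m, on_ms, true);
    return up_at - down_at;
}
```

In this model, a carrier-off at t=5000 ms followed by a carrier-on at t=5010 ms gives an outage of 1000 ms, not 10 ms, which corresponds to the 5 pings lost at 0.2-second intervals.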
Furthermore the wireless event about lost association with AP-1 causes needless trouble in wpa_supplicant.
IMHO, when madwifi roams to a new AP, ieee80211_notify_node_leave() should not report the lost association with the old AP if association with the new AP succeeds within milliseconds.
The attached patch achieves that:
ieee80211_notify_node_leave() delays the leave notification for the old AP, in the hope that ieee80211_notify_node_join() cancels the delay timer on association with the new AP. If association does not succeed within 100 milliseconds, the leave notification is still sent to inform the kernel and applications.
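
The idea can be sketched in plain user-space C as follows. This is not the attached patch: struct roam_notifier, notify_leave(), notify_join() and timer_tick() are hypothetical stand-ins, the timer is simulated by passing the current time in milliseconds, and I assume here that the join is still reported when a pending leave is cancelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LEAVE_DELAY_MS 100 /* grace period from the description above */

struct roam_notifier {
    bool leave_pending;     /* a leave notification is waiting on the timer */
    long leave_deadline_ms; /* when the pending leave would be sent */
    int leaves_sent;        /* leave notifications actually delivered */
    int joins_sent;         /* join notifications actually delivered */
};

/* Station leaves its AP: arm the timer instead of notifying right away. */
static void notify_leave(struct roam_notifier *n, long now_ms)
{
    assert(n != NULL);
    n->leave_pending = true;
    n->leave_deadline_ms = now_ms + LEAVE_DELAY_MS;
}

/* Timer tick: deliver the leave only if the deadline passed without a join. */
static void timer_tick(struct roam_notifier *n, long now_ms)
{
    assert(n != NULL);
    if (n->leave_pending && now_ms >= n->leave_deadline_ms) {
        n->leave_pending = false;
        n->leaves_sent++;
    }
}

/* Association with a new AP: cancel a pending leave, then report the join. */
static void notify_join(struct roam_notifier *n, long now_ms)
{
    timer_tick(n, now_ms);    /* an already expired leave goes out first */
    n->leave_pending = false; /* cancel the delay timer */
    n->joins_sent++;
}
```

With this logic, a leave at t=0 followed by a join at t=10 ms produces no leave notification at all, while a leave that is not followed by a join within 100 ms is still delivered once.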