Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #1998 (new defect)

Opened 13 years ago

Last modified 13 years ago

Kernel panic from madwifi after repeated disassociations

Reported by: keaneyw@gmail.com
Assigned to: scottr
Priority: major
Milestone:
Component: madwifi: rate module 'minstrel'
Version: trunk
Keywords:
Cc:
Patch is attached: 0
Pending:

Description (Last modified by mrenzmann)

I have an access point that uses an Atheros AR5006X and a laptop with an Atheros AR5212 card. Both run madwifi from SVN: r3717 on the AP and r3721 on the laptop. The network uses WPA2-Enterprise, with hostapd on the AP and wpa_supplicant on the laptop.

After being associated with the AP for a while, my laptop gets disassociated for no apparent reason. From then on, every association is immediately followed by a disassociation. Unloading the modules and restarting the interface on the laptop has no effect; only rebooting seems to help, and even that is only temporary.

If I allow the association/disassociation loop to run long enough, the kernel panics. It usually only takes a minute or two for this to occur. My wife's laptop is using an iwl3945 card with wpa_supplicant, and does not exhibit this problem at all.

This situation is highly reproducible - I have experienced as many as 6 hard kernel panics within the space of an hour.

I am attaching the most complete kernel dump I have obtained so far. It is missing some information, so please let me know if what is present is insufficient.

svn info from laptop:

Path: .
URL: http://svn.madwifi.org/madwifi/trunk
Repository Root: http://svn.madwifi.org
Repository UUID: 0192ed92-7a03-0410-a25b-9323aeb14dbd
Revision: 3722
Node Kind: directory
Schedule: normal
Last Changed Author: benoit
Last Changed Rev: 3721
Last Changed Date: 2008-06-12 10:50:36 -0400 (Thu, 12 Jun 2008)

svn info from AP:

Path: .
URL: http://svn.madwifi.org/madwifi/trunk
Repository Root: http://svn.madwifi.org
Repository UUID: 0192ed92-7a03-0410-a25b-9323aeb14dbd
Revision: 3718
Node Kind: directory
Schedule: normal
Last Changed Author: mentor
Last Changed Rev: 3717
Last Changed Date: 2008-06-10 11:32:45 -0400 (Tue, 10 Jun 2008)

Attachments

kerneldump (3.5 kB) - added by keaneyw@gmail.com on 06/15/08 18:21:47.
Kernel Crash Dump
kerneldump.2 (3.9 kB) - added by keaneyw@gmail.com on 06/16/08 14:31:56.
Complete kernel dump
minicom.cap (340.6 kB) - added by keaneyw@gmail.com on 06/18/08 15:26:54.
Minicom capture of syslog during association attempts and panics
kernel.log (174.1 kB) - added by keaneyw@gmail.com on 06/19/08 16:17:48.
Better kernel log for disassociations
minstrel_mrr_no_rates.patch (0.6 kB) - added by scottr on 06/30/08 01:22:41.
Don't try to set up mrr when no rates exist for a node

Change History

06/15/08 18:21:47 changed by keaneyw@gmail.com

  • attachment kerneldump added.

Kernel Crash Dump

06/16/08 14:31:56 changed by keaneyw@gmail.com

  • attachment kerneldump.2 added.

Complete kernel dump

06/16/08 14:34:38 changed by keaneyw@gmail.com

I've attached a better kernel dump - this one is actually complete.
I have begun to think that there may be two distinct bugs here: the disassociation issue, and then what madwifi (or, apparently, minstrel) does when it experiences rapid repeated disassociations.
mentor advised me on IRC to enable kernel lock debugging, which I have done. However, I am not sure how to use it to get more or better information. It was enabled during the latest attached kernel panic, for what it's worth.

06/18/08 15:26:54 changed by keaneyw@gmail.com

  • attachment minicom.cap added.

Minicom capture of syslog during association attempts and panics

06/19/08 04:42:03 changed by mentor

  • priority changed from minor to major.

06/19/08 14:40:59 changed by keaneyw@gmail.com

Further testing results:

Setting iwpriv ath0 bmiss 100 helped reduce the disassociations for a while, but after a couple of hours they returned.
I tried all of the available ratectl options, and only minstrel and sample are actually able to associate with my network.
Setting the ratectl algo on the AP to sample instead of minstrel seems to have helped things.

After running echo 0 > /proc/sys/dev/wifi0/diversity I was able to stay associated for several consecutive hours. Eventually the disassociations returned, and the only remedy was to reboot the computer. This seems to work best when combined with iwpriv ath0 bmiss 100.

The sample ratectl algo does not experience a kernel panic even after an extended period of constant disassociations, so it appears that the panic aspect of this bug is isolated to minstrel.

06/19/08 16:17:48 changed by keaneyw@gmail.com

  • attachment kernel.log added.

Better kernel log for disassociations

06/29/08 18:32:09 changed by mrenzmann

  • description changed.

06/29/08 18:51:49 changed by mrenzmann

  • owner set to scottr.
  • component changed from madwifi: driver to madwifi: rate module 'minstrel'.

Scott, any ideas for this?

06/30/08 01:22:41 changed by scottr

  • attachment minstrel_mrr_no_rates.patch added.

Don't try to set up mrr when no rates exist for a node
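The patch is described here only by its title. The following C sketch illustrates the kind of guard that title implies, under loudly stated assumptions: the structures `node_rates` and `mrr_series`, the function `setup_mrr()`, the 12-entry rate table, the four retry stages and the fixed retry count are all hypothetical simplifications, not madwifi's actual minstrel code and not the contents of the attached patch. The idea is that a node caught mid-disassociation may have an empty rate table, so building the multi-rate-retry (MRR) series from it would index into an empty array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RATES  12  /* hypothetical size of the per-node rate table */
#define MRR_STAGES 4   /* hypothetical number of multi-rate-retry stages */

/* Hypothetical, simplified per-node rate state (not madwifi's struct). */
struct node_rates {
    int num_rates;          /* rates currently known for this node; may be 0 */
    int rates[MAX_RATES];   /* rate table entries */
};

/* Hypothetical MRR series: one rate and retry count per stage. */
struct mrr_series {
    int rate[MRR_STAGES];
    int tries[MRR_STAGES];
};

/* Fill the MRR series from the node's rate table.
 * Returns 0 on success, or -1 when the node has no rates (e.g. during a
 * disassociation), in which case the series is left untouched. */
static int setup_mrr(const struct node_rates *nr, struct mrr_series *mrr)
{
    int i;

    /* Guard: with no rates there is nothing valid to index into. */
    if (nr == NULL || mrr == NULL || nr->num_rates <= 0)
        return -1;

    /* Sanity check that the count fits the hypothetical table size. */
    assert(nr->num_rates <= MAX_RATES);

    for (i = 0; i < MRR_STAGES; i++) {
        /* Reuse the last valid entry rather than reading past the table. */
        int idx = (i < nr->num_rates) ? i : nr->num_rates - 1;
        mrr->rate[i] = nr->rates[idx];
        mrr->tries[i] = 2;  /* hypothetical fixed retry count */
    }
    return 0;
}
```

With an empty rate table, `setup_mrr()` returns -1 and leaves the series unchanged, so the caller can simply skip MRR setup for that node instead of dereferencing missing rate entries.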

06/30/08 01:25:38 changed by scottr

Hi,

Please try the attached patch and let us know how it goes.

Cheers,

07/05/08 18:15:02 changed by keaneyw@gmail.com

Tried the patch with the hal-0.10.5.6 branch. Using minstrel, I was able to stay associated for nearly 48 hours before the disassociation problems returned. After a couple of hours of continuous re/disassociation, I have yet to experience a kernel panic. I think it should also be noted that the disassociation did /not/ coincide with an NMI on the WAP this time.

07/16/08 12:08:39 changed by scottr

I've committed the patch in r3775. I imagine that the disassociation problem is a separate one. Do disassociations happen even when not using minstrel?

Cheers,

Scott.

07/16/08 18:21:00 changed by keaneyw@gmail.com

Yes, the disassociation occurs with both Minstrel and Sample. I'm still trying to determine whether it is caused at the AP side or the client side, or some bizarre combination of the two. I can stay associated with other APs without issue, and other people using non-Atheros cards can stay connected to my AP as well.
I just remembered that I have a PCMCIA Atheros card at home. I'll test with that to rule out vendor-specific issues with my ThinkPad card.