Please note: This project is no longer active. The website is kept online for historic purposes only.
If you´re looking for a Linux driver for your Atheros WLAN device, you should continue here .

Ticket #191 (closed defect: fixed)

Opened 14 years ago

Last modified 13 years ago

Two AP ath devices, hangs, sometimes with trace that points to beacon code

Reported by: dimitris@gNoMaiSplAm.com Assigned to: dyqith
Priority: major Milestone:
Component: madwifi: other Version: trunk
Keywords: Cc: dimitris+madwifi@gmail.com
Patch is attached: 0 Pending:

Description

System:

  • Pentium 4, HT enabled, SMP enabled in the kernel.
  • Kernel 2.6.14.3 from kernel.org
  • 3com 5414 based PCI card:
    0000:02:03.0 Ethernet controller: Atheros Communications, Inc.: Unknown device 001b (rev 01)
            Subsystem: 3Com Corporation: Unknown device 6803
            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
            Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
            Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 0x10 (64 bytes)
            Interrupt: pin A routed to IRQ 19
            Region 0: Memory at fbe90000 (32-bit, non-prefetchable) [size=64K]
            Capabilities: <available only to root>
    
  • madwifi-ng r1346
  • Two AP devices configured: ath0 (WAP-PSK), ath1 (open).
  • hostapd 0.4.7 - config files to be attached here.
  • All madwifi modules are loaded with default parameters

The system will very often - sometimes minutes after bootup, other times 1-2 hours later - hang. In some instances there are entries in the log for the time of the crash like this:

Nov 29 22:32:50 greebo kernel: Badness in __kfree_skb at net/core/skbuff.c:330
Nov 29 22:32:50 greebo kernel:  [__kfree_skb+190/241] __kfree_skb+0xbe/0xf1
Nov 29 22:32:50 greebo kernel:  [pg0+945881622/1069843456] ath_tx_draintxq+0x84/0x193 [ath_pci]
Nov 29 22:32:50 greebo kernel:  [pg0+945858463/1069843456] ath_beacon_generate+0x125/0x320 [ath_pci]
Nov 29 22:32:50 greebo kernel:  [pg0+945859650/1069843456] ath_beacon_send+0x2a8/0x3be [ath_pci]
Nov 29 22:32:50 greebo kernel:  [pg0+947012897/1069843456] addrconf_dad_timer+0xfe/0x103 [ipv6]
Nov 29 22:32:50 greebo kernel:  [pg0+945844716/1069843456] ath_intr+0x272/0x356 [ath_pci]
Nov 29 22:32:50 greebo kernel:  [handle_IRQ_event+57/105] handle_IRQ_event+0x39/0x69
Nov 29 22:32:50 greebo kernel:  [__do_IRQ+140/248] __do_IRQ+0x8c/0xf8
Nov 29 22:32:50 greebo kernel:  [do_IRQ+25/36] do_IRQ+0x19/0x24
Nov 29 22:32:50 greebo kernel:  [common_interrupt+26/32] common_interrupt+0x1a/0x20
Nov 29 22:32:50 greebo kernel:  [pg0+945142394/1069843456] acpi_processor_idle+0x0/0x290 [processor]
Nov 29 22:32:50 greebo kernel:  [pg0+945142639/1069843456] acpi_processor_idle+0xf5/0x290 [processor]
Nov 29 22:32:50 greebo kernel:  [cpu_idle+77/98] cpu_idle+0x4d/0x62
Nov 29 22:32:50 greebo kernel:  [start_kernel+366/387] start_kernel+0x16e/0x183
Nov 29 22:32:50 greebo kernel:  [unknown_bootoption+0/447] unknown_bootoption+0x0/0x1bf

These come in "clusters" of 3 or so, one every about 4 seconds.

I've also seen this version of the stack, although it appears it's a separate code path into the same part of ath_pci:

Nov 29 23:50:45 greebo kernel: Badness in __kfree_skb at net/core/skbuff.c:330
Nov 29 23:50:45 greebo kernel:  [__kfree_skb+190/241] __kfree_skb+0xbe/0xf1
Nov 29 23:50:45 greebo kernel:  [pg0+945881622/1069843456] ath_tx_draintxq+0x84/0x193 [ath_pci]
Nov 29 23:50:45 greebo kernel:  [pg0+945858463/1069843456] ath_beacon_generate+0x125/0x320 [ath_pci]
Nov 29 23:50:45 greebo kernel:  [pg0+945859650/1069843456] ath_beacon_send+0x2a8/0x3be [ath_pci]
Nov 29 23:50:45 greebo kernel:  [pg0+945844716/1069843456] ath_intr+0x272/0x356 [ath_pci]
Nov 29 23:50:45 greebo kernel:  [handle_IRQ_event+57/105] handle_IRQ_event+0x39/0x69
Nov 29 23:50:45 greebo kernel:  [__do_IRQ+140/248] __do_IRQ+0x8c/0xf8
Nov 29 23:50:45 greebo kernel:  [do_IRQ+25/36] do_IRQ+0x19/0x24
Nov 29 23:50:45 greebo kernel:  [common_interrupt+26/32] common_interrupt+0x1a/0x20
Nov 29 23:50:45 greebo kernel:  [tcp_v4_checksum_init+222/339] tcp_v4_checksum_init+0xde/0x153
Nov 29 23:50:45 greebo kernel:  [nf_iterate+102/138] nf_iterate+0x66/0x8a
Nov 29 23:50:45 greebo kernel:  [tcp_v4_rcv+177/2527] tcp_v4_rcv+0xb1/0x9df
Nov 29 23:50:45 greebo kernel:  [ip_local_deliver_finish+0/592] ip_local_deliver_finish+0x0/0x250
Nov 29 23:50:45 greebo kernel:  [nf_hook_slow+89/253] nf_hook_slow+0x59/0xfd
Nov 29 23:50:45 greebo kernel:  [ip_local_deliver_finish+0/592] ip_local_deliver_finish+0x0/0x250
Nov 29 23:50:45 greebo kernel:  [ip_local_deliver+292/727] ip_local_deliver+0x124/0x2d7
Nov 29 23:50:45 greebo kernel:  [ip_local_deliver_finish+0/592] ip_local_deliver_finish+0x0/0x250
Nov 29 23:50:45 greebo kernel:  [ip_rcv+905/1456] ip_rcv+0x389/0x5b0
Nov 29 23:50:45 greebo kernel:  [ip_rcv_finish+0/747] ip_rcv_finish+0x0/0x2eb
Nov 29 23:50:45 greebo kernel:  [pg0+945235524/1069843456] speedo_rx+0x134/0x3b7 [eepro100]
Nov 29 23:50:45 greebo kernel:  [netif_receive_skb+360/479] netif_receive_skb+0x168/0x1df
Nov 29 23:50:45 greebo kernel:  [process_backlog+128/263] process_backlog+0x80/0x107
Nov 29 23:50:45 greebo kernel:  [net_rx_action+127/278] net_rx_action+0x7f/0x116
Nov 29 23:50:45 greebo kernel:  [__do_softirq+114/221] __do_softirq+0x72/0xdd
Nov 29 23:50:45 greebo kernel:  [do_softirq+51/53] do_softirq+0x33/0x35
Nov 29 23:50:45 greebo kernel:  [do_IRQ+30/36] do_IRQ+0x1e/0x24
Nov 29 23:50:45 greebo kernel:  [common_interrupt+26/32] common_interrupt+0x1a/0x20

And sometimes there's nothing in the log :(

This box has been on constantly for almost 2 years with no history of hardware lockups before the Atheros card was added.

This is different from #182 in that it usually takes some time, often with substantial traffic on the network(s), before the hang.

Attachments

hostapd-ath0.conf (10.0 kB) - added by dimitris@NoGSMaiPaml.com on 11/30/05 19:27:32.
WPA-PSK hostapd config
hostapd-ath1.conf (0.6 kB) - added by dimitris@NoGSMaiPaml.com on 11/30/05 19:28:27.
hostapd config for open AP
dev_skb_free_any.patch (397 bytes) - added by dimitris on 12/01/05 02:30:46.
replacing dev_skb_free with dev_skb_free_any in ath_tx_draintxq
kern.log.0 (55.3 kB) - added by dimitris on 12/01/05 22:08:03.
SysRq? dump on a hung system with the SMP kernel
config-smp (37.7 kB) - added by dimitris on 12/11/05 08:57:53.
.config for 2.6.14.3 with SMP turned on

Change History

11/30/05 19:27:32 changed by dimitris@NoGSMaiPaml.com

  • attachment hostapd-ath0.conf added.

WPA-PSK hostapd config

11/30/05 19:28:27 changed by dimitris@NoGSMaiPaml.com

  • attachment hostapd-ath1.conf added.

hostapd config for open AP

11/30/05 19:33:26 changed by anonymous

One more thing: here's the dmesg for the card, the driver/HAL seems to recognize it for a 5212:

ath_hal: module license 'Proprietary' taints kernel.
ath_hal: 0.9.16.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, DFS)
wlan: 0.8.4.2 (Atheros/multi-bss)
ath_rate_sample: 1.2
ath_pci: 0.9.4.5 (Atheros/multi-bss)
ACPI: PCI Interrupt 0000:02:03.0[A] -> GSI 19 (level, low) -> IRQ 19
wifi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
wifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: turboA rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: turboG rates: 6Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: H/W encryption support: WEP AES AES_CCM TKIP
wifi0: mac 10.5 phy 6.1 radio 6.3
wifi0: Use hw queue 1 for WME_AC_BE traffic
wifi0: Use hw queue 0 for WME_AC_BK traffic
wifi0: Use hw queue 2 for WME_AC_VI traffic
wifi0: Use hw queue 3 for WME_AC_VO traffic
wifi0: Use hw queue 8 for CAB traffic
wifi0: Use hw queue 9 for beacons
wifi0: Atheros 5212: mem=0xfbe90000, irq=19

even though it's described as a 5414.

12/01/05 02:30:46 changed by dimitris

  • attachment dev_skb_free_any.patch added.

replacing dev_skb_free with dev_skb_free_any in ath_tx_draintxq

12/01/05 02:34:38 changed by anonymous

I googled around and it seems that calling dev_skb_free from an interrupt context is not a good idea. I'm attaching a simple patch here that calls dev_skb_free_any instead.

I'm not sure whether that's enough (didn't change anything with the queue locking in ath_tx_draintxq).

Signed-off-by: Dimitris Kogias <dimitris@gmail.com>

12/01/05 02:49:30 changed by anonymous

Running with the change above, it sitll hangs after a while. Nothing on the log :(

I'll keep running the atheros card for now, but I'm not sure what to try next in order to diagnose this...

12/01/05 07:52:56 changed by mrenzmann

  • version set to trunk.

Can you try to find out if this problem also occurs when SMP is disabled?

12/01/05 07:55:05 changed by anonymous

Will do. BTW, I also enabled CONFIG_MAGIC_SYSRQ and tried Alt-SysRq?-<command> next time it hanged, but got no response.

12/01/05 20:40:13 changed by anonymous

  • cc set to dimitris+madwifi@gmail.com.

Without SMP it seems to work. It's been up for 12 hours now, with light traffic on both networks.

Any suggestions for places in the code to look/debug where SMP would be a factor?

12/01/05 22:08:03 changed by dimitris

  • attachment kern.log.0 added.

SysRq? dump on a hung system with the SMP kernel

12/01/05 22:09:04 changed by anonymous

I was wrong about SysRq? (dmesg'd it away from the console). Anyway, I attached the dump from when the system crashed (when it was still running with SMP). Hope this helps :)

12/02/05 08:31:34 changed by mrenzmann

Regarding debugging facility: see DevDocs/AthDebug. You could try to find a debug setting that gives you infos on what happens right before the crash occurs. I think it's a problem related to locking, but the question is where it is wrong.

12/11/05 08:55:25 changed by anonymous

Switched back to a SMP 2.6.14.3 kernel, and turned on the following debug flags:

  • athdebug
    • beacon
    • intr
    • beacon_proc
  • 802debug
    • output

Of course, thanks to Mr Murphy, it hasn't crashed yet. Any other flags I should turn on?

I'll attach the kernel .config here in case it's relevant.

12/11/05 08:57:53 changed by dimitris

  • attachment config-smp added.

.config for 2.6.14.3 with SMP turned on

12/11/05 08:59:33 changed by anonymous

Forgot to mention, I'm now running a pristine [1352], without the dev_skb_free_any change above.

12/11/05 09:37:26 changed by anonymous

OK, managed to get a crash. Here are the last few lines of /var/log/kern.log:

Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 3 [tsf 5261096976 tsftu 5137790 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 0 [tsf 5261122570 tsftu 5137815 intval 100] vap f688e380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f673c480 len 147
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 1 [tsf 5261148168 tsftu 5137840 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 2 [tsf 5261173770 tsftu 5137865 intval 100] vap f799c380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f6600e80 len 119
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 3 [tsf 5261199376 tsftu 5137890 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 0 [tsf 5261224980 tsftu 5137915 intval 100] vap f688e380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f673c480 len 147
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 1 [tsf 5261250570 tsftu 5137940 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 2 [tsf 5261276174 tsftu 5137965 intval 100] vap f799c380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f6600e80 len 119
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 3 [tsf 5261301768 tsftu 5137990 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 0 [tsf 5261327374 tsftu 5138015 intval 100] vap f688e380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f673c480 len 147
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 1 [tsf 5261352970 tsftu 5138040 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 2 [tsf 5261378572 tsftu 5138065 intval 100] vap f799c380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f6600e80 len 119
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 3 [tsf 5261404176 tsftu 5138090 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 0 [tsf 5261429774 tsftu 5138115 intval 100] vap f688e380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f673c480 len 147
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 1 [tsf 5261455370 tsftu 5138140 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo last message repeated 2 times
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 2 [tsf 5261480974 tsftu 5138165 intval 100] vap f799c380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f6600e80 len 119
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 3 [tsf 5261506574 tsftu 5138190 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 0 [tsf 5261532168 tsftu 5138215 intval 100] vap f688e380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f673c480 len 147
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x5009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 1 [tsf 5261557774 tsftu 5138240 intval 100] vap 00000000
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1009
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x1040
Dec 11 00:26:21 greebo kernel: ath_intr: status 0x11000
Dec 11 00:26:21 greebo kernel: ath_beacon_send: slot 2 [tsf 5261583372 tsftu 5138265 intval 100] vap f799c380
Dec 11 00:26:21 greebo kernel: ath_beacon_setup: m f6600e80 len 119

At that point, the next entries are from the reboot:

Dec 11 00:28:33 greebo kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
Dec 11 00:28:33 greebo kernel: Inspecting /boot/System.map-2.6.14.3
Dec 11 00:28:33 greebo kernel: Loaded 18129 symbols from /boot/System.map-2.6.14.3.
Dec 11 00:28:33 greebo kernel: Symbols match kernel version 2.6.14.
Dec 11 00:28:33 greebo kernel: No module symbols loaded - kernel modules not enabled.
Dec 11 00:28:33 greebo kernel: Linux version 2.6.14.3 (d@greebo) (gcc version 4.0.3 20051111 (prerelease) (Debian 4.0.2-4)) #7 Wed Nov 30 23:12:52 PST 2005

04/07/06 06:53:04 changed by dyqith

  • patch_attached changed.

Can you try the patch at: http://madwifi.org/ticket/472

and report back ?

04/07/06 06:53:10 changed by dyqith

  • status changed from new to assigned.
  • owner set to dyqith.

04/07/06 08:49:33 changed by anonymous

I've been running the spin_locks-20050331 patch from #472 for several days no with no problems.

04/21/06 19:11:34 changed by dyqith

  • status changed from assigned to closed.
  • resolution set to fixed.

closing ticket, patch applied.