Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #656 (new defect)

Opened 16 years ago

Last modified 15 years ago

mipsel kernel oops on "ifconfig up"

Reported by: georg@boerde.de
Assigned to:
Priority: major
Milestone:
Component: madwifi: HAL
Version:
Keywords: hal mips
Cc:
Patch is attached: 0
Pending:

Description

When bringing up an interface on a MIPSel machine (OpenWrt / Netgear WGT634U), the kernel Oopses in the following reproducible way:

root@OpenWrt:/# wlanconfig ath0 destroy
root@OpenWrt:/# wlanconfig ath0 create wlandev wifi0 wlanmode sta
ath0
root@OpenWrt:/# ifconfig ath0 up
Data bus error, epc == c005f2c0, ra == c007ac48
Oops[#1]:
Cpu 0
$ 0   : 00000000 10009c00 c00e9930 00000001
$ 4   : 81398000 00009930 81398308 813982c8
$ 8   : 000000c0 80220000 00000008 00000000
$12   : 00401043 00705aac 00000001 00000010
$16   : 8139a8a0 81398000 81398000 81398000
$20   : 00000000 81398308 80260000 00008914
$24   : 00000000 c00b4f08
$28   : 81cc0000 81cc1cb8 80331174 c007ac48
Hi    : 000003c5
Lo    : 1eb3f600
epc   : c005f2c0     Tainted: P
ra    : c007ac48 Status: 10009c03    KERNEL EXL IE
Cause : 0000001c
PrId  : 00029007
Modules linked in: ath_pci ath_rate_sample ath_hal wlan_scan_sta wlan_wep wlan
Process ifconfig (pid: 390, threadinfo=81cc0000, task=81e8c000)
Stack : 00401043 00705aac 00000001 00000010 c005f2b8 80330260 81398000 c00766f0
        00000000 80330f1c 80260000 00008914 00000000 c00b4f08 00000000 00000000
        00000000 00000001 00000000 02000000 10009c03 00000369 00000000 80330260
        81398000 80330000 00000000 80330f1c 80260000 00008914 7fcdac68 c00fe4c0
        00008000 00000001 0000003c 00000000 81cc1d50 10009c00 80001400 00002000
        ...
Call Trace: [<c005f2b8>]  [<c00766f0>]  [<c00b4f08>]  [<c00fe4c0>]  [<801505e0>]  [<80001fd4>]  [<c00b4fac>]  [<801505e0>]

Code: 00000000  8c820014  00451021 <8c420000> 03e00008  00000000  3c020041  34428937  70822002
Segmentation fault

On some architectures, data bus errors are caused by misaligned memory accesses. Maybe that is the cause here?
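One way to check the misalignment guess against the dump above is to decode the instruction marked in the Code: line and look at the address it was loading from. The sketch below is plain userspace C, not driver code. The helper names are made up for illustration, and the values fed to them are copied from the Oops. It assumes the standard MIPS32 I-type encoding, and that $2 still holds the load address because the faulting load never completed.

```c
#include <stdint.h>

/* Fields of a MIPS32 I-type instruction (opcode, rs, rt, 16-bit immediate). */
struct mips_itype {
    unsigned opcode;   /* bits 31..26 */
    unsigned rs;       /* bits 25..21, base register for loads */
    unsigned rt;       /* bits 20..16, destination register for loads */
    int16_t  imm;      /* bits 15..0, signed offset */
};

/* Split a raw instruction word into its I-type fields. */
static struct mips_itype decode_itype(uint32_t insn)
{
    struct mips_itype d;

    d.opcode = (insn >> 26) & 0x3f;
    d.rs     = (insn >> 21) & 0x1f;
    d.rt     = (insn >> 16) & 0x1f;
    d.imm    = (int16_t)(uint16_t)(insn & 0xffff);
    return d;
}

/* The exception code lives in bits 6..2 of the Cause register. */
static unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1f;
}

/* A 32-bit load needs an address that is a multiple of 4. */
static int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}
```

Fed with the marked word 8c420000, decode_itype gives opcode 0x23 (lw), rs 2, rt 2 and offset 0, i.e. a load from the address in $2. In the dump, $2 is c00e9930, which is word aligned, and $5 is 00009930, which looks like the register offset that was added to the base. Cause 0000001c yields exception code 7 (data bus error) rather than 4 (address error on load), so the dump suggests the bus rejected the read rather than the access being misaligned.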

From the stack trace, it seems the following symbols are called (sorry, I haven't figured out ksymoops on cross-compiled code yet):

<c005f2b8> ath_hal_reg_read+0
<c00766f0> zz0002dbd2+2e0
<c00b4f08> ieee80211_init+0
<c00fe4c0> ath_init+2bc

And this is the madwifi release I'm using:

wlan: 0.8.4.2 (svn r1611)                                                       
ath_hal: module license 'Proprietary' taints kernel.                            
ath_hal: 0.9.17.0 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, REGO)
ath_rate_sample: 1.2 (svn r1611)                                                
ath_pci: 0.9.4.5 (svn r1611)                                                    
PCI: Enabling device 0000:01:01.0 (0000 -> 0002)                                
PCI: Fixing up device 0000:01:01.0                                              
wifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps                                    
wifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 3s
wifi0: turboG rates: 6Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps            
wifi0: H/W encryption support: WEP AES AES_CCM TKIP                             
wifi0: mac 5.9 phy 4.3 radio 4.6                                                
wifi0: Use hw queue 1 for WME_AC_BE traffic                                     
wifi0: Use hw queue 0 for WME_AC_BK traffic                                     
wifi0: Use hw queue 2 for WME_AC_VI traffic                                     
wifi0: Use hw queue 3 for WME_AC_VO traffic                                     
wifi0: Use hw queue 8 for CAB traffic                                           
wifi0: Use hw queue 9 for beacons                                               

Change History

06/14/06 16:51:15 changed by georg@boerde.de

Retested with r1644 and activated kernel symbols. Still the same picture :(

wlan: 0.8.4.2 (svn r1644)
ath_hal: module license 'Proprietary' taints kernel.
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, REGOPS_FUNC)
ath_rate_sample: 1.2 (svn r1644)
ath_pci: 0.9.4.5 (svn r1644)

The problem can be reproduced by destroying the autocreated master device (which is the default on OpenWrt) and then creating a managed device twice:

root@OpenWrt:/# wlanconfig ath0 destroy
root@OpenWrt:/# wlanconfig ath0 create wlandev wifi0 wlanmode managed
ath0
root@OpenWrt:/# wlanconfig ath0 destroy
root@OpenWrt:/# wlanconfig ath0 create wlandev wifi0 wlanmode managed
ath0
root@OpenWrt:/# ifconfig ath0 up
Data bus error, epc == c01172c0, ra == c0132c48
Oops[#1]:
Cpu 0
$ 0   : 00000000 10009c00 c0089930 00000001
$ 4   : 81c18000 00009930 81c18308 81c182c8
$ 8   : 000000c0 81d2f000 81d2d000 81d8c000
$12   : 81d40000 00705aac 00000001 81d33000
$16   : 81c1a8a0 81c18000 81c18000 81c18000
$20   : 00000000 81c18308 80270000 00008914
$24   : 00000000 c00b4f18
$28   : 81d64000 81d65cb8 8034d174 c0132c48
Hi    : 000003cc
Lo    : ccc86800
epc   : c01172c0 ath_hal_reg_read+0x8/0x14 [ath_hal]     Tainted: P
ra    : c0132c48 zz002db51c+0x50/0x438 [ath_hal]
Status: 10009c03    KERNEL EXL IE 
Cause : 0000001c
PrId  : 00029007
Modules linked in: ath_pci ath_rate_sample ath_hal wlan_scan_sta wlan_wep wlan
Process ifconfig (pid: 394, threadinfo=81d64000, task=803a64f8)
Stack : 00000000 80270000 2a61da80 0000003f c01172b8 8034c260 81c18000 c012e6f0
        00000000 80073b40 00030000 80037d30 80222590 00000000 00000000 00000000
        00000000 00000002 00000000 02000000 00000000 8027d850 00000000 8034c260
        81c18000 8034c000 00000000 8034cf1c 80270000 00008914 7f8b2c68 c00df4c4
        8122f200 00000001 80033510 00000000 81d65d50 81d65d98 8122f200 0048bb20
        ...
Call Trace:
 [<c01172b8>] ath_hal_reg_read+0x0/0x14 [ath_hal]
 [<c012e6f0>] zz0002dbd2+0x2e0/0x1178 [ath_hal]
 [<80073b40>] cache_flusharray+0x74/0xac
 [<80037d30>] run_timer_softirq+0x2c/0x1ec
 [<c00df4c4>] ath_init+0x2bc/0x4c8 [ath_pci]
 [<80033510>] do_softirq+0x58/0x8c
 [<80154090>] dev_open+0xd8/0x1c0
 [<c00b4fbc>] ieee80211_init+0xa4/0x180 [wlan]
 [<80154090>] dev_open+0xd8/0x1c0
 [<8005394c>] filemap_nopage+0x1b0/0x550
 [<80155fe8>] dev_change_flags+0x74/0x14c
 [<8015620c>] dev_ifsioc+0x20/0x49c
 [<801af38c>] devinet_ioctl+0x2fc/0x9a0
 [<801af1fc>] devinet_ioctl+0x16c/0x9a0
 [<80156b80>] dev_ioctl+0x4f8/0x778
 [<801b0e1c>] inet_ioctl+0xc8/0xfc
 [<8014732c>] sock_ioctl+0x538/0x580
 [<80147348>] sock_ioctl+0x554/0x580
 [<800fb960>] sprintf+0x28/0x34
 [<800923e0>] do_ioctl+0x30/0x78
 [<80092738>] vfs_ioctl+0x310/0x338
 [<80147ac0>] sock_create+0x10/0x1c
 [<800927b0>] sys_ioctl+0x50/0x90
 [<80012a40>] stack_done+0x20/0x3c
 [<80012a40>] stack_done+0x20/0x3c


Code: 00000000  8c820014  00451021 <8c420000> 03e00008  00000000  3c020041  34428937  70822002
Segmentation fault
root@OpenWrt:/#

12/12/06 04:28:58 changed by drgz

Any update?

12/12/06 06:47:32 changed by mrenzmann

Did anyone test current trunk to see if the problem still exists there?

03/27/07 18:38:15 changed by jhansen@cardaccess-inc.com

This annoying problem still occurs from time to time, especially on the WGT634U router (bcm947xx processor). I think that either the hardware layout of the WGT634U or the layout of the mini-PCI card is to blame, since what happens appears to be a PCI data bus error. It doesn't happen at all on my WL-500G Premium, which has very similar hardware.

You can work around this problem by either a) adding a 10 us delay in ath_hal_reg_read, or b) using get_dbe to read registers instead of a plain readl. With get_dbe, data bus errors are caught, so you can just keep calling get_dbe until the stupid hardware gives back the register value correctly.
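Option b) can be sketched roughly as follows. This is a userspace mock, not the real driver: on MIPS the kernel's get_dbe(x, ptr) macro returns non-zero when the load raises a data bus error, and here a hypothetical stand-in, mock_get_dbe, plays that role so the retry logic compiles anywhere. The struct, the function names, the retry limit and the -1 error value are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical flaky register: the first fail_count reads raise a bus error. */
struct mock_reg {
    uint32_t value;
    int      fail_count;
};

/* Stand-in for get_dbe(): returns 0 and fills *out on success,
 * non-zero if the (simulated) load hit a data bus error. */
static int mock_get_dbe(uint32_t *out, struct mock_reg *reg)
{
    if (reg->fail_count > 0) {
        reg->fail_count--;
        return -1;
    }
    *out = reg->value;
    return 0;
}

/* Retry the read up to max_tries times.
 * Returns 0 on success, -1 if every attempt faulted. */
static int reg_read_retry(struct mock_reg *reg, int max_tries, uint32_t *out)
{
    int i;

    for (i = 0; i < max_tries; i++) {
        if (mock_get_dbe(out, reg) == 0)
            return 0;
    }
    return -1;
}
```

The bounded loop is a deliberate deviation from "keep going until it works": if the hardware never recovers, an unbounded retry inside the register read path would hang the caller.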

04/03/07 07:27:49 changed by jhansen@cardaccess-inc.com

I have finally fixed the problem on my WGT634Us for good. This solution may also apply to other similar architectures.

The Broadcom 947xx PCI host controller seems to be buggy/unstable after a warm reset of the CPU. Therefore, you need to ensure that the PCI host controller core gets reset after a reboot.

In the new BSP (brcm47xx-2.6), you can do this by removing the line:

if (!ssb_device_is_enabled(dev))

from drivers/ssb/driver_pci/pcicore.c in the kernel source. This forces the next line (ssb_device_enable) to always run, which disables the core and then re-enables it (thus giving the desired stability).
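The effect of removing that check can be illustrated with a small userspace mock. Nothing below is the real ssb code: mock_core, mock_is_enabled and mock_enable are hypothetical stand-ins for the device structure, ssb_device_is_enabled and ssb_device_enable, and mock_enable only models the behaviour described above (disable the core, then re-enable it).

```c
#include <stdbool.h>

/* Hypothetical model of the PCI host controller core. */
struct mock_core {
    bool enabled;   /* still true after a warm reset of the CPU */
    int  resets;    /* how many disable/enable cycles were performed */
};

static bool mock_is_enabled(const struct mock_core *core)
{
    return core->enabled;
}

/* Models the described behaviour: disable the core, then re-enable it. */
static void mock_enable(struct mock_core *core)
{
    core->enabled = false;
    core->resets++;
    core->enabled = true;
}

/* Original logic: the reset is skipped if the core already looks enabled. */
static void init_original(struct mock_core *core)
{
    if (!mock_is_enabled(core))
        mock_enable(core);
}

/* Patched logic: the check is gone, so the core is always reset. */
static void init_patched(struct mock_core *core)
{
    mock_enable(core);
}
```

With a core that is still marked enabled after a warm reboot, init_original leaves it untouched while init_patched cycles it once, which is the difference the change relies on.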

I'm not sure how to do this with the old BSP (w/2.4 kernel, etc.), but I'm sure it would be a similar procedure. You should be using the latest BSP anyway, since it works better and you aren't tethered to wl.o with Atheros. See openwrt.org bug 464 for more info.

04/03/07 07:28:40 changed by jhansen@cardaccess-inc.com

By the way, that previous line # was around 396 in pcicore.c (for "if (!ssb_device_is_enabled(dev))").