Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #1049 (closed defect: fixed)

Opened 13 years ago

Last modified 9 years ago

Kernel oops during scan timer (XScale IXP425 BE)

Reported by: roee Assigned to:
Priority: major Milestone: version 0.9.5
Component: madwifi: other Version: trunk
Keywords: Cc:
Patch is attached: 1 Pending:

Description (Last modified by mrenzmann)

I'm using madwifi 0.9.2 on an XScale IXP425 BE and frequently get the following oops.

The board is an ADI Pronghorn and the kernel version is 2.6.12. I have also tested r1860 and the oops happens there too. I figured you'd be more interested in the r1860 oops, so that is the one I'm posting here. To reproduce it, simply load and unload the driver repeatedly from a script. The oops happens during the loading phase.

The oops dump is:

ath_hal done
ath_hal: 0.9.18.0 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, REGOPS_FUNC)
wlan done
wlan: 0.8.4.2 (svn r1860)
ath_rate_sample done
ath_rate_sample: 1.2 (svn r1860)
wlan_scan_sta done
ath_pci done
ath_pci: 0.9.4.5 (svn r1860)
PCI: enabling device 0000:00:0f.0 (0340 -> 0342)
wifi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
wifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: H/W encryption support: WEP AES AES_CCM TKIP
wifi0: mac 5.9 phy 4.3 radio 3.6
wifi0: Use hw queue 1 for WME_AC_BE traffic
wifi0: Use hw queue 0 for WME_AC_BK traffic
wifi0: Use hw queue 2 for WME_AC_VI traffic
wifi0: Use hw queue 3 for WME_AC_VO traffic
wifi0: Use hw queue 8 for CAB traffic
wifi0: Use hw queue 9 for beacons
wifi0: Atheros 5212: mem=0x48000000, irq=28

 Setting IP params of the wireless interface

Applying [essid=celeno] to client
Applying [band=11a] to client
Applying [channel=52] to client
Applying [rate=54M] to client
Applying [txpower=6] to client
Applying [ack_rate=min] to client

Bad mode in data abort handler detected: mode IRQ_32
Internal error: Oops - bad mode: 0 [#1]
Modules linked in: ath_pci wlan_scan_sta ath_rate_sample wlan ath_hal ixp425_eth ixp400
CPU: 0
PC is at 0xffff021c
LR is at zz00067d32+0x2c/0x5c [ath_hal]
pc : [<ffff021c>]    lr : [<bf0ab6f8>]    Tainted: P
sp : c3b11c3c  ip : c3b11c94  fp : c3b11c90
r10: c38b8000  r9 : 00000001  r8 : c38b8000
r7 : 00004000  r6 : 00000003  r5 : 00000003  r4 : 00000000
r3 : c4860000  r2 : 00000003  r1 : 00004000  r0 : c38b8000
Flags: Nzcv  IRQs off  FIQs on  Mode IRQ_32  Segment user
Control: 39FF  Table: 02E74000  DAC: 00000015
Process grep (pid: 732, stack limit = 0xc3b10194)
Stack: (0xc3b11c3c to 0xc3b12000)
1c20:                                                                c38b8000
1c40: 00004000 00000003 c4860000 00000000 00000003 00000003 00004000 c38b8000
1c60: 00000001 c38b8000 c3b11c90 c3b11c94 c3b11c3c bf0ab6f8 ffff021c 80000092
1c80: ffffffff c3b11cb4 c3b11c94 bf0ab6f8 bf0ab1e4 00000003 c38b8000 c3b11d70
1ca0: 00000000 c3228448 c3b11cd4 c3b11cb8 bf0c0ab0 bf0ab6d8 00000003 c38b8000
1cc0: c3b11d70 c3b11d70 c3b11cfc c3b11cd8 bf0c047c bf0c0a24 02000000 c38b8000
1ce0: c3b11d70 00000000 c3228448 c38b8000 c3b11d54 c3b11d00 bf0bf330 bf0c046c
1d00: bf0bd570 bf0ab1e4 02000000 00000020 00000001 00000017 c38b89bc 00000000
1d20: 00018000 00000001 c3b11d54 c3b11d70 c3228220 c3228220 c38b8000 c3228448
1d40: c3228000 c3228220 c3b11da0 c3b11d58 bf1291bc bf0bf100 c3b11d6c c3b11dac
1d60: 00001000 c02c8520 00000000 c3b11d78 16c10140 c00299f0 c3228220 c3228220
1d80: c3b87000 c3228220 c2cda220 000000c8 c3228448 c3b11db8 c3b11da4 bf1297d4
1da0: bf12900c 00000000 7ffbd436 c3b11dc8 c3b11dbc bf0f55e4 bf1297bc c3b11e14
1dc0: c3b11dcc bf0f6144 bf0f55d8 c032a820 00000000 c3b11e94 00000000 c3b11ddc
1de0: c3b11ddc c032a820 c3b11e0c c3b10000 00000100 00000000 bf0f5fbc c3b11e18
1e00: c024553c c0244b34 c3b11e4c c3b11e18 c0043c00 bf0f5fc8 c3b11e18 c3b11e18
1e20: c00278fc 00000001 c02448d0 c02463a8 0000000a 4006a000 c3c43900 00000047
1e40: c3b11e6c c3b11e50 c003f36c c0043a74 c3b11e94 0000001f 00000020 40023000
1e60: c3b11e7c c3b11e70 c003f510 c003f318 c3b11e90 c3b11e80 c0023b48 c003f4dc
1e80: ffffffff c3b11f28 c3b11e94 c0022740 c0023af0 fffe9540 ffff0000 00100077
1ea0: 00047000 40023000 00100077 c3b11f7c 40023000 4006a000 c3c43900 00000047
1ec0: c3b11f28 00000000 c3b11edc c006ce84 c002b6d8 20000013 ffffffff c2e75000
1ee0: c0343750 40023000 00000017 c2e75000 c3c43900 0000005f 4006a000 c03433e0
1f00: c03433e0 00100077 c3b11f7c 40023000 4006a000 c3c43900 00000047 c3b11f74
1f20: c3b11f2c c006d0f4 c006cdf4 00100077 00000000 c3cd7a00 00000000 00000000
1f40: 0000005f 00000047 00000075 4006a000 40023000 fffffff4 00000000 4006a000
1f60: c3b10000 00000007 c3b11fa4 c3b11f78 c006d2d4 c006cf9c 00000077 c03433e0
1f80: 40006074 00000002 40006000 0000007d c0022c44 4000d090 00000000 c3b11fa8
1fa0: c0022ac0 c006d16c 40006074 c00299f0 40023000 000461e0 00000007 000461e0
1fc0: 40006074 00000002 40006000 bef7b694 bef7ae08 40071468 4000d090 00000fff
1fe0: 4000602c bef7ac60 fffff000 40002834 20000010 40023000 00000000 00000000
Backtrace:
[<bf0ab1d8>] (ath_hal_reg_read+0x0/0x48 [ath_hal]) from [<bf0ab6f8>] (zz00067d32+0x2c/0x5c [ath_hal])
[<bf0ab6cc>] (zz00067d32+0x0/0x5c [ath_hal]) from [<bf0c0ab0>] (zz05b781e0+0x3cc/0x438 [ath_hal])
 r8 = C3228448  r7 = 00000000  r6 = C3B11D70  r5 = C38B8000
 r4 = 00000003
[<bf0c0a18>] (zz05b781e0+0x334/0x438 [ath_hal]) from [<bf0c047c>] (zz002db292+0x1c/0x284 [ath_hal])
 r6 = C3B11D70  r5 = C3B11D70  r4 = C38B8000
[<bf0c0460>] (zz002db292+0x0/0x284 [ath_hal]) from [<bf0bf330>] (zz0002dbd2+0x23c/0xf90 [ath_hal])
[<bf0bf0f4>] (zz0002dbd2+0x0/0xf90 [ath_hal]) from [<bf1291bc>] (ath_chan_set+0x1bc/0x424 [ath_pci])
[<bf129000>] (ath_chan_set+0x0/0x424 [ath_pci]) from [<bf1297d4>] (ath_set_channel+0x24/0x64 [ath_pci])
[<bf1297b0>] (ath_set_channel+0x0/0x64 [ath_pci]) from [<bf0f55e4>] (change_channel+0x18/0x1c [wlan])
 r5 = 7FFBD436  r4 = 00000000
[<bf0f55cc>] (change_channel+0x0/0x1c [wlan]) from [<bf0f6144>] (scan_next+0x188/0x454 [wlan])
[<bf0f5fbc>] (scan_next+0x0/0x454 [wlan]) from [<c0043c00>] (run_timer_softirq+0x198/0x214)
[<c0043a68>] (run_timer_softirq+0x0/0x214) from [<c003f36c>] (__do_softirq+0x60/0xdc)
[<c003f30c>] (__do_softirq+0x0/0xdc) from [<c003f510>] (irq_exit+0x40/0x48)
 r7 = 40023000  r6 = 00000020  r5 = 0000001F  r4 = C3B11E94
[<c003f4d0>] (irq_exit+0x0/0x48) from [<c0023b48>] (asm_do_IRQ+0x64/0x74)
[<c0023ae4>] (asm_do_IRQ+0x0/0x74) from [<c0022740>] (__irq_svc+0x20/0x60)
 r4 = FFFFFFFF
[<c006cde8>] (change_protection+0x0/0x1a8) from [<c006d0f4>] (mprotect_fixup+0x164/0x1d0)
[<c006cf90>] (mprotect_fixup+0x0/0x1d0) from [<c006d2d4>] (sys_mprotect+0x174/0x1dc)
[<c006d160>] (sys_mprotect+0x0/0x1dc) from [<c0022ac0>] (ret_fast_syscall+0x0/0x2c)
Code: e14fe000 e58de004 e10fd000 e3cdd01f (e38dd013)
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

Note that some of the printouts just before the oops come from my boot scripts, which set various parameters using iwconfig/iwpriv.

I get this oops in various forms, i.e. the interrupted process is not always the same, but the final part of the trace, the scan timer path, is always the same.
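For readers comparing several of these dumps: in the kernel's backtrace format, `symbol+0xOFF/0xSIZE` gives the offset into the function and the function's total size, so subtracting the offset from the reported address yields the function's start. The sketch below only illustrates that arithmetic with values from the dump above; `symbol_base` and `offset_in_function` are hypothetical helper names.

```c
#include <stdint.h>

/* Start address of the function containing addr, given the "+0x.." offset. */
static inline uint32_t symbol_base(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}

/* Nonzero if offset lies inside a function that is size bytes long. */
static inline int offset_in_function(uint32_t offset, uint32_t size)
{
    return offset < size;
}
```

For example, the link register 0xbf0ab6f8 is reported as zz00067d32+0x2c/0x5c, and 0xbf0ab6f8 minus 0x2c is 0xbf0ab6cc, which matches the start address printed for zz00067d32 in the backtrace.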

I was wondering if this is also related to the problems that XScale has with recent HALs (as reported in ticket #914).

Attachments

madwifi-mb.diff (1.0 kB) - added by mentor on 04/29/07 21:47:21.
Memory Barriers for IO Access
madwifi-mb2.diff (1.0 kB) - added by anders.lundstrom on 05/03/07 16:02:57.
Memory Barriers for IO Access 2
madwifi-0.9.3-reg.diff (2.6 kB) - added by mtaylor on 05/22/07 01:11:04.
Signed-off-by: Mike Taylor (mike.taylor@apprion.com)
madwifi-0.9.3-reg.2.diff (1.3 kB) - added by mtaylor on 05/22/07 23:15:58.
Stable version of the patch
madwifi-0.9.3-reg.3.diff (1.5 kB) - added by mtaylor on 05/22/07 23:17:43.
Correct version. Ignore madwifi-0.9.3-reg.2.diff.

Change History

12/19/06 17:52:03 changed by mrenzmann

  • description changed.
  • summary changed from Kernel oops during scan timer to Kernel oops during scan timer (XScale IXP425 BE).

The preview button is there for a reason...

12/27/06 08:20:35 changed by rozteck@interia.pl

I'm having the same problem on an AirTegrity 3100 board. The kernel is 2.6.18-rt7. It mostly happens on cards running in 802.11a mode (in 802.11g mode the bug seems not to appear, or at least not as often).

12/27/06 20:44:25 changed by roee

Indeed, I'm running in 802.11a mode as well. I haven't tried 802.11g.

04/26/07 20:58:11 changed by mike.taylor@apprion.com

/begin voodoo

I defined AH_REGOPS_FUNC and then disabled interrupts during all register accesses (by wrapping the entire body of ath_hal_reg_read and ath_hal_reg_write with local_irq_save and local_irq_restore).

It is not clear to me WHY this should be necessary but it appeared to me that we were in an interrupt handler and getting pre-empted for another, whereupon we got this 'bad mode' business.

/end voodoo

If anyone can shed some light I'd love to learn more about what is going on. In the meantime, the workaround is good.
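A rough sketch of the shape of this workaround. The kernel's local_irq_save()/local_irq_restore() cannot be called from user space, so this uses POSIX signal masking as an analogue: SIGUSR1 stands in for an interrupt, and each register accessor saves the current mask, blocks everything, touches the register, and restores the mask. All names here (`fake_regs`, `irq_save`, `irq_restore`, `reg_read`, `reg_write`) are hypothetical, and this is not the code from any attached patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the device's register window. */
static volatile uint32_t fake_regs[16];

/* Counts deliveries of the stand-in "interrupt" (SIGUSR1). */
static volatile sig_atomic_t fake_irq_count;

static void fake_irq_handler(int signo)
{
    (void)signo;
    fake_irq_count++;
}

/* Install the SIGUSR1 handler; returns 0 on success. */
static inline int fake_irq_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fake_irq_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Analogue of local_irq_save(): block all signals, remember the old mask. */
static inline void irq_save(sigset_t *saved)
{
    sigset_t all;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, saved);
}

/* Analogue of local_irq_restore(): put the remembered mask back. */
static inline void irq_restore(const sigset_t *saved)
{
    sigprocmask(SIG_SETMASK, saved, NULL);
}

/* Register read with "interrupts" masked for the whole body. */
static inline uint32_t reg_read(unsigned int idx)
{
    sigset_t saved;
    uint32_t val;

    irq_save(&saved);
    val = fake_regs[idx];
    irq_restore(&saved);
    return val;
}

/* Register write with "interrupts" masked for the whole body. */
static inline void reg_write(unsigned int idx, uint32_t val)
{
    sigset_t saved;

    irq_save(&saved);
    fake_regs[idx] = val;
    irq_restore(&saved);
}
```

A signal raised while the mask is in place stays pending and is delivered only once the saved mask is restored, which is the property the workaround relies on for interrupts.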

04/27/07 19:54:33 changed by mentor

Hmmm, __raw_{read,write}{b,w,l} don't provide memory barriers. This may pose a problem, especially on architectures that use MMIO, and even more so if the MMIO is relaxed.
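To make the idea concrete, here is a minimal user-space sketch of register accessors with explicit barriers. It is only an analogue: in the kernel one would use primitives such as mb(), rmb() and wmb() (or the ordered readl()/writel() accessors), whereas this uses C11 atomic_thread_fence() on an ordinary array, which does not give real MMIO ordering guarantees. `mb_regs`, `reg_read_mb` and `reg_write_mb` are hypothetical names, and the contents of the attached patches are not reproduced here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the device's register window. */
static volatile uint32_t mb_regs[16];

/* Read a register, then fence (stand-in for a barrier after the load). */
static inline uint32_t reg_read_mb(unsigned int idx)
{
    uint32_t val = mb_regs[idx];

    /* Full fence, used here as a stand-in for the kernel's mb(). */
    atomic_thread_fence(memory_order_seq_cst);
    return val;
}

/* Fence, write a register, fence again. */
static inline void reg_write_mb(unsigned int idx, uint32_t val)
{
    /* Fence before the store so earlier accesses are ordered ahead of it. */
    atomic_thread_fence(memory_order_seq_cst);
    mb_regs[idx] = val;
    /* Fence after the store so later accesses are ordered behind it. */
    atomic_thread_fence(memory_order_seq_cst);
}
```

The design choice is to put the barriers inside the accessors, so every caller gets ordered access without having to remember it at each call site.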

04/29/07 21:47:21 changed by mentor

  • attachment madwifi-mb.diff added.

Memory Barriers for IO Access

04/29/07 21:48:04 changed by mentor

Have a go with this patch?

05/03/07 16:02:57 changed by anders.lundstrom

  • attachment madwifi-mb2.diff added.

Memory Barriers for IO Access 2

05/03/07 16:05:26 changed by anders.lundstrom

The patch supplied by mentor seems to have a typo.

In initial tests, the corrected patch seems to work on the XScale IXP420/IXP425 platforms.

05/22/07 01:08:04 changed by mtaylor

Even with the madwifi-mb2.diff patch, I still got crashes during channel changes in ath_hal_reset. I'm on ARM XScale BE, with preemption enabled.

I'm going to attach a patch that disables interrupts during register reads/writes and fixes the kernel panics on my board. The memory barrier seems to have fixed some of the crashes but not all. ath_init calls ath_hal_reset, which calls ath_hal_reg_read, kaboom!

05/22/07 01:11:04 changed by mtaylor

  • attachment madwifi-0.9.3-reg.diff added.

Signed-off-by: Mike Taylor (mike.taylor@apprion.com)

05/22/07 01:12:46 changed by mtaylor

Signed-off-by: Mike Taylor (mike.taylor@apprion.com)

05/22/07 01:24:13 changed by mtaylor

This patch (madwifi-0.9.3-reg.diff) doesn't crash, but with it interrupts seem to be dropped rather than disabled, so you get a bug warning from debug kernels but the kernel doesn't come to a screeching halt. I'm still trying to figure out how to get this to work cleanly.

05/22/07 01:31:44 changed by mentor

Maybe we should be grabbing ATH_LOCK? Maybe ATH_LOCK should be IRQ protected...?

05/22/07 23:15:13 changed by mtaylor

  • patch_attached set to 1.

Interestingly, I found that I had to disable preemption and use preempt_enable_no_resched() instead of preempt_enable() in order to avoid the issue. With this next version of the patch, I disable IRQs and preemption (the patch also has memory barriers, by the way).

This version has never crashed on me in the last few months. I tried removing the IRQ and preempt lines and running with just the memory barriers, but that still oops'd the kernel. When I re-added the disabling of IRQs without the preempt disable, I got the unhandled IRQ errors. When I added both but used preempt_enable() (which causes scheduling), it oops'd the kernel. When I used preempt_enable_no_resched(), it worked fine.

I'm not exactly sure whether this is the right place to put these calls, but if the bug is in the HAL it's the only way. If the bug is in the calling code, then we have to chase down each case individually which seems painful at best.

05/22/07 23:15:58 changed by mtaylor

  • attachment madwifi-0.9.3-reg.2.diff added.

Stable version of the patch

05/22/07 23:17:43 changed by mtaylor

  • attachment madwifi-0.9.3-reg.3.diff added.

Correct version. Ignore madwifi-0.9.3-reg.2.diff.

05/23/07 00:24:25 changed by mentor

Looking at this patch, it feels to me like we should be using a lock...

05/30/07 23:21:59 changed by mtaylor

  • status changed from new to closed.
  • resolution set to fixed.

The issue is resolved in r2405 after mentor's locking repairs.

05/31/07 00:46:56 changed by mtaylor

  • status changed from closed to reopened.
  • resolution deleted.

Spoke too soon. It's less frequent but it's still reproducible when I destroy and create VAPs quickly.

06/21/07 21:02:29 changed by mtaylor

  • status changed from reopened to closed.
  • resolution set to fixed.

This should be fixed in trunk with the HAL lock added.
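For illustration only, the general idea of a HAL lock can be sketched in user space as a single spinlock that every register access, and every multi-register sequence such as a reset, must hold. This is not the code that went into trunk: `struct fake_hal` and the `hal_*` functions are hypothetical, and a C11 atomic_flag stands in for a kernel spinlock (in kernel code reachable from interrupt context, one would presumably want an IRQ-safe variant such as spin_lock_irqsave()).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical HAL state: one lock guarding the register window. */
struct fake_hal {
    atomic_flag lock;            /* stand-in for a kernel spinlock */
    volatile uint32_t regs[16];  /* stand-in for device registers */
};

/* Spin until the flag was previously clear, i.e. we now own the lock. */
static inline void hal_lock(struct fake_hal *hal)
{
    while (atomic_flag_test_and_set_explicit(&hal->lock,
                                             memory_order_acquire))
        ; /* busy-wait */
}

/* Release the lock so another context can take it. */
static inline void hal_unlock(struct fake_hal *hal)
{
    atomic_flag_clear_explicit(&hal->lock, memory_order_release);
}

/* Single register read under the lock. */
static inline uint32_t hal_reg_read(struct fake_hal *hal, unsigned int idx)
{
    uint32_t val;

    hal_lock(hal);
    val = hal->regs[idx];
    hal_unlock(hal);
    return val;
}

/* Single register write under the lock. */
static inline void hal_reg_write(struct fake_hal *hal, unsigned int idx,
                                 uint32_t val)
{
    hal_lock(hal);
    hal->regs[idx] = val;
    hal_unlock(hal);
}

/* A multi-register sequence holds the lock for its whole duration. */
static inline void hal_reset(struct fake_hal *hal)
{
    unsigned int i;

    hal_lock(hal);
    for (i = 0; i < 16; i++)
        hal->regs[i] = 0;
    hal_unlock(hal);
}
```

A caller would initialise the structure with `struct fake_hal hal = { .lock = ATOMIC_FLAG_INIT };` before the first access. Holding the lock across the whole reset, rather than per register, is what keeps another context from interleaving its own accesses with a half-finished sequence.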

06/27/07 12:12:50 changed by mrenzmann

  • milestone set to version 0.9.4.

02/11/08 06:13:38 changed by mrenzmann

  • milestone changed from version 0.9.4 to version 0.9.5.

01/04/11 16:18:41 changed by anonymous

Hi, when is the expected release date of version 0.9.5?

I am looking for a fix for #1049, and it appears that this got fixed in trunk. I am using Linux 2.6.17 on the ixp425 platform; can I assume the latest trunk source is stable for this platform and kernel version?

Thanks & Regards, Surendra