Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #593 (new defect)

Opened 16 years ago

Last modified 10 years ago

Kernel Oops after "ifconfig ath0 up"

Reported by: himself@raphael-susewind.de
Assigned to:
Priority: major
Milestone:
Component: madwifi: driver
Version: trunk
Keywords: oops crash irq
Cc:
Patch is attached: 0
Pending:

Description

I constantly get a kernel oops with my TP-Link D510 wireless card (oops attached). Otherwise my system (HP Omnibook 800CT, kernel 2.6.16.1, kernel config and startup dmesg attached) is perfectly stable. My Xircom network card works perfectly as well (it is 16-bit PCMCIA, which may be relevant).

When I just insert the wireless card, it says in the kernel log:

  ath_hal: module license 'Proprietary' taints kernel.
  ath_hal: 0.9.16.16 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
  wlan: 0.8.4.2 (svn 1486)
  ath_rate_sample: 1.2 (svn 1486)
  ath_pci: 0.9.4.5 (svn 1486)
  PCI: Enabling device 0000:02:00.0 (0000 -> 0002)
  wifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
  wifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
  24Mbps s
  wifi0: H/W encryption support: WEP AES AES_CCM TKIP
  wifi0: mac 7.8 phy 4.5 radio 5.6
  wifi0: Use hw queue 1 for WME_AC_BE traffic
  wifi0: Use hw queue 0 for WME_AC_BK traffic
  wifi0: Use hw queue 2 for WME_AC_VI traffic
  wifi0: Use hw queue 3 for WME_AC_VO traffic
  wifi0: Use hw queue 8 for CAB traffic
  wifi0: Use hw queue 9 for beacons
  wifi0: Atheros 5212: mem=0x12000000, irq=9

The system remains stable at this point. But after invoking "ifconfig ath0 up" it panics either immediately or after a few seconds (once after about 2 minutes). Sometimes there is enough time to run "iwlist ath0 scan", which shows all APs in range. Once I also managed to associate for a few seconds before the freeze. You can find the oops attached. I think it's something about IRQs (there are no IRQ conflicts; I checked this, of course). It makes no difference whether I include the "spinlock patch" (ticket #472) or not. Here is the oops:

Unable to handle kernel paging request at virtual address 2b400014
 printing eip:
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: serial_cs xirc2ps_cs pcmcia crc32 wlan_scan_sta ath_pci
athe
CPU:    0
EIP:    0060:[<c592447c>]    Tainted: P      VLI
EFLAGS: 00010202   (2.6.16.1 #1)
EIP is at zz005b88fd+0x20/0x130 [ath_hal]
eax: 2b400000   ebx: c4ee10f0   ecx: 0000000f   edx: c4ee10f0
esi: c14ab720   edi: c4e08000   ebp: c4492260   esp: c02d9eec
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c02d8000 task=c029cb00)
Stack: <0>c47888f0 c14ab720 c4ee10f0 c59609e1 c4e08000 c4ee10f0 04ee10f0 2b400
       08fa4a04 00000000 00000282 08fa4a04 00000000 00000282 c44937ec c4e08000
       c4e08000 00000000 c4492260 00000000 c4e08000 c5961008 c4492260 c02d9f64
Call Trace:
 [<c59609e1>] ath_uapsd_processtriggers+0xb1/0x4c0 [ath_pci]
 [<c5961008>] ath_intr+0x218/0x300 [ath_pci]
 [<c012aef8>] handle_IRQ_event+0x28/0x70
 [<c012af97>] __do_IRQ+0x57/0xa0
 [<c010415a>] do_IRQ+0x1a/0x30
 [<c0102aca>] common_interrupt+0x1a/0x20
 [<c0100ba3>] default_idle+0x33/0x60
 [<c010987b>] apm_cpu_idle+0x8b/0x150
 [<c0100c3c>] cpu_idle+0x4c/0x60
 [<c02da6b6>] start_kernel+0x146/0x160
Code: 10 00 00 00 00 b8 01 00 00 00 c3 57 56 53 8b 7c 24 10 8b 5c 24 14 89 da
 <0>Kernel panic - not syncing: Fatal exception in interrupt

I have been struggling with this error for weeks and have tried several revisions of madwifi and madwifi-ng and every conceivable combination of kernel parameters for the PCMCIA socket, PCI, PNP and APM...
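
As a rough reading aid for the oops above, the following sketch checks two things: the faulting address 2b400014 is eax (2b400000) plus 0x14, and 2b400000 lies below the usual start of kernel virtual addresses on 32-bit x86. The helper names and the 0xC0000000 boundary (the default 3G/1G split) are assumptions for illustration; they are not taken from the kernel config attached to this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default 3G/1G split, kernel addresses start at 0xC0000000. */
#define ASSUMED_PAGE_OFFSET 0xC0000000u

/* Hypothetical helper: does addr fall in the kernel part of the address space? */
static bool in_kernel_range(uint32_t addr)
{
    return addr >= ASSUMED_PAGE_OFFSET;
}

/* Hypothetical helper: distance of the faulting address from a register value. */
static uint32_t fault_offset(uint32_t fault_addr, uint32_t reg)
{
    return fault_addr - reg;
}

/* Values copied from the oops above. */
static void check_oops_values(void)
{
    const uint32_t fault = 0x2b400014u;  /* "virtual address 2b400014" */
    const uint32_t eax   = 0x2b400000u;
    const uint32_t ebx   = 0xc4ee10f0u;

    assert(fault_offset(fault, eax) == 0x14u);  /* fault is 0x14 past eax */
    assert(!in_kernel_range(eax));              /* not a plausible kernel pointer */
    assert(in_kernel_range(ebx));               /* ordinary kernel pointer */
}
```

Calling check_oops_values() from a small main() passes silently. That would be consistent with the HAL reading through a bogus pointer, but it does not by itself show where that pointer came from.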

Attachments

oops-1538 (1.3 kB) - added by himself@raphael-susewind.de on 05/05/06 18:27:43.
Oops from r1538
dmesg-1538 (7.1 kB) - added by himself@raphael-susewind.de on 05/05/06 18:28:13.
Dmesg from r1538
dmesg-r1605-hal-0.9.17.0 (0.8 kB) - added by himself@raphael-susewind.de on 05/27/06 11:03:14.
oops-r1605-hal-0.9.17.0 (1.7 kB) - added by himself@raphael-susewind.de on 05/27/06 11:03:27.

Change History

05/04/06 18:26:36 changed by dyqith

You're using a very old version, please get the latest from svn.

http://madwifi.org/wiki/UserDocs/GettingMadwifi

Report back when you get it fixed and I'll close this ticket then. Thanks.

05/04/06 19:09:42 changed by himself@raphael-susewind.de

Sorry, I accidentally attached an older dmesg. I recently tried with revision 1531, with the same oops in ath_uapsd_processtriggers. I also tried to start a discussion on this topic on madwifi-devel some days ago, but I think a ticket is better suited... Is there a significant change from 1531 to 1538? If so, I can try this newest version if you suggest it...

05/04/06 20:13:34 changed by dyqith

Yup, there's a change in 1534 that should solve more kernel panics...

05/05/06 11:56:15 changed by himself@raphael-susewind.de

Tried 1538 today - still the same oops...

05/05/06 16:37:39 changed by dyqith

Okay, can you provide the new dmesg from loading the module, and also the kernel oops? Thanks.

05/05/06 18:27:43 changed by himself@raphael-susewind.de

  • attachment oops-1538 added.

Oops from r1538

05/05/06 18:28:13 changed by himself@raphael-susewind.de

  • attachment dmesg-1538 added.

Dmesg from r1538

05/09/06 19:19:17 changed by himself@raphael-susewind.de

Still present with 1543 as well...

05/16/06 18:50:24 changed by himself@raphael-susewind.de

...still present in r1552... (not to tease, just to keep the ticket up-to-date)

05/22/06 23:10:30 changed by himself@raphael-susewind.de

...still present in r1589...

05/26/06 16:40:18 changed by mentor

Right, I've had a look at the oops and the code in question. Given that the oops occurs in the HAL, the callee references in the trace, and the type of the oops, I think it is quite likely that this is being caused by address translation in the call to ath_hal_rxprocdesc() at ath/if_ath.c:1372.

Sadly this is way beyond my kernel-fu at the moment, so I haven't got a chance of fixing it. This should probably be bumped to Sam (Leffler).

05/26/06 17:37:07 changed by Mister_X

Maybe latest HAL will solve this (0.9.17.0): http://people.freebsd.org/~sam/ath_hal-20060506.tgz

05/27/06 11:02:39 changed by himself@raphael-susewind.de

Thanks to Sam Leffler for providing a new HAL. I tried r1605 with HAL 0.9.17.0. It worked for 2 minutes, allowing me to successfully scan the wireless environment and configure ath0, but it seems that it dies after a certain amount of IRQ stress. Shortly after inserting the card, there is the message "wifi0: ath_chan_set: unable to reset channel 6 (2437Mhz) flags 0xc0 'Hardware didn't respond as expected' (HAL status 3)". Then, as I said, it works for two minutes, then oopses. I attached dmesg and oops as files...

05/27/06 11:03:14 changed by himself@raphael-susewind.de

  • attachment dmesg-r1605-hal-0.9.17.0 added.

05/27/06 11:03:27 changed by himself@raphael-susewind.de

  • attachment oops-r1605-hal-0.9.17.0 added.

06/08/06 13:33:08 changed by emildi@gmail.com

Hello,

I hit the exact same Oops with my HP Omnibook 800CT. The wifi device is 3Com 3CRPAG175B, the kernel is the default 2.4.31. The driver I used was madwifi-ng-r1629-20060607.tar.gz. I'll try with the new HAL (http://people.freebsd.org/~sam/ath_hal-20060531.tgz) and I'll report my results.

Regards, Emil

06/08/06 13:36:51 changed by mrenzmann

The new HAL (v0.9.17.2) has been committed in r1631.

06/08/06 13:45:51 changed by anonymous

Thanks for the advice - I'll try with the latest r1634. Just one clarification - the "default 2.4.31" kernel in my previous update means the default 2.4.31 kernel as included in Slackware 10.2.

06/09/06 16:07:20 changed by emildi@gmail.com

Hello again,

I tried r1634 (HAL v0.9.17.2) with both 2.4.31 and 2.6.13 (as included in Slackware 10.2), but always got the kernel oops. I'll have to configure the machine to record those dumps and take a more careful look at it, but from what I had on the screen it really looks like a problem in the HAL. Is there a different, HAL-specific Bugzilla system where I could raise the problem?

Thanks, Emil

06/13/06 07:59:35 changed by himself@raphael-susewind.de

Hello Emil,

Can you e-mail me your dmesg, lspci -vv and kernel config for comparison? Now that we have the same notebook we can better check whether there is a configuration problem... Have you ever tried to run another CardBus card in your Omnibook? Unfortunately I have only tried PCMCIA (I have no other CardBus card)...

Regards, Raphael

06/13/06 18:51:47 changed by emildi@gmail.com

Hello Raphael,

I'll send you the requested information, most probably tomorrow. I have 2 other PC Cards that I use without any problems, and I've tried them in both slots - one is a no-name 4-port USB extension card and the other is a 10/100 Ethernet card (I don't remember any details right now).

Regards, Emil

06/13/06 18:56:29 changed by emildi@gmail.com

(just for the record)

I've tried the wifi card under Windows XP (with the drivers from the 3Com CD) and under Knoppix 4.0.2 (kernel 2.6.12.4) with the r1634 drivers and everything looks fine. The laptop I used was Athlon64 based (TARGA Traveller 811 w730-k8). So definitely it's not a wifi card hardware problem.

Emil

07/09/06 08:38:19 changed by michael.white@charter.net

I noticed a new patch file on http://people.freebsd.org/~sam/tx80211/. Seems to have a lot of changes. Has anyone tried this patch? I wouldn't mind trying the patch if I knew the version he was patching...

Thanks, Michael

07/09/06 17:40:21 changed by himself@raphael-susewind.de

I did try r1680 today - same error...

08/12/06 15:27:32 changed by himself@raphael-susewind.de

Still there with r1704

09/07/06 18:04:00 changed by himself@raphael-susewind.de

...and with r1708...

09/07/06 19:04:24 changed by emildi@gmail.com

I'm thinking of trying "ndiswrapper" as a temporary workaround for this. Of course the right thing to do is to find what's causing the oops.

Emil

05/12/07 17:21:11 changed by anonymous

Hello,

I have the same error (on an Omnibook 800CT too, with an AR5006X CardBus adapter) with the latest svn r2325. This bug seems to be pretty old now; did anybody figure out how to fix it? I'd love to, but I'm not into kernel programming... ndiswrapper seems to crash sometimes too, so it would be really nice to get it working with madwifi.

regards,

Christian

05/13/07 02:48:03 changed by chris-ware@web.de

Hello again,

I debugged it a bit over the last few hours (trial and error, a lot of reboots with all the oopsing), and found out (as mentor mentioned earlier) that the problem seems to be in

if_ath.c:1402 retval = ath_hal_rxprocdesc(ah, ds, bf->bf_daddr, PA2DESC(sc, ds->ds_link), tsf);

or, to be more specific, that somehow a totally wrong ds->ds_link value is passed to the macro, resulting in a wrong memory location being passed to ath_hal_rxprocdesc().

Unfortunately, I could not figure out when and why this happens; it just happens, normally within seconds after ifconfig ath0 up... As I don't fully understand the driver yet, I could not figure out how to stop those wrong values from being generated. Thus, I made a little workaround. Based on the fact that all valid ds->ds_link values were relatively near a common base address (see the macro PA2DESC), I created a little piece of code which ignores every ds->ds_link whose offset from that base is too large (I chose 0xfff for convenience), resulting in the following diff (use at your own risk):

1402,1403c1402,1419
<               retval = ath_hal_rxprocdesc(ah, ds, bf->bf_daddr, PA2DESC(sc, ds->ds_link), tsf);
<               if (HAL_EINPROGRESS == retval)
---
> 
> /*
> DPRINTF(sc, ATH_DEBUG_UAPSD, "SDB a:%x + ( b:%x - c:%x ) res:%x\n",
> (unsigned int)(sc)->sc_rxdma.dd_desc ,
> (unsigned int)ds->ds_link ,
> (unsigned int)(sc)->sc_rxdma.dd_desc_paddr ,
> (unsigned int)(PA2DESC(sc, ds->ds_link)));
> */
>               if((unsigned int)(ds->ds_link - (sc)->sc_rxdma.dd_desc_paddr) < 0xfff) {
>                       retval = ath_hal_rxprocdesc(ah, ds, bf->bf_daddr, PA2DESC(sc, ds->ds_link), tsf);
>                       if (HAL_EINPROGRESS == retval)
>                               break;
>               } else {
>                       DPRINTF(sc, ATH_DEBUG_UAPSD, "POSSIBLE ANOMALY SDB a:%x + ( b:%x - c:%x ) res:%x\n",
> (unsigned int)(sc)->sc_rxdma.dd_desc ,
> (unsigned int)ds->ds_link ,
> (unsigned int)(sc)->sc_rxdma.dd_desc_paddr ,
> (unsigned int)(PA2DESC(sc, ds->ds_link)));
1404a1421
>               }

As one can easily see, the snippet will warn about anomalous ds->ds_link values at debug level uapsd.
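
The range check in the diff above can be modelled in user space. The sketch below is not the driver code: struct desc_model and struct rxdma_model are simplified stand-ins, pa2desc_checked is a hypothetical helper, and the bound is the ring length (a dd_desc_len field, assumed here) rather than the 0xfff chosen in the patch. The translation follows the "a + ( b - c )" form printed by the DPRINTF above, that is dd_desc + (ds_link - dd_desc_paddr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a descriptor; only the link field is modelled. */
struct desc_model {
    uint32_t ds_link;            /* bus address of the next descriptor */
};

/* Simplified stand-in for the RX DMA bookkeeping. */
struct rxdma_model {
    struct desc_model *dd_desc;  /* CPU address of the descriptor ring */
    uint32_t dd_desc_paddr;      /* bus address of the same ring */
    size_t dd_desc_len;          /* ring size in bytes (assumed field) */
};

/*
 * Hypothetical helper: translate a bus address into a pointer into the
 * ring, or return NULL when it lies outside the ring. The subtraction is
 * unsigned, so an address below the ring base wraps to a huge offset and
 * is rejected by the same comparison.
 */
static struct desc_model *pa2desc_checked(const struct rxdma_model *dma,
                                          uint32_t pa)
{
    uint32_t off = pa - dma->dd_desc_paddr;

    if (off >= dma->dd_desc_len)
        return NULL;
    return (struct desc_model *)((char *)dma->dd_desc + off);
}

/* Small self-check with made-up addresses. */
static void pa2desc_demo(void)
{
    struct desc_model ring[4] = { {0}, {0}, {0}, {0} };
    struct rxdma_model dma = { ring, 0x01000000u, sizeof(ring) };
    uint32_t third = (uint32_t)(0x01000000u + 2 * sizeof(ring[0]));

    /* A link to the third descriptor translates to &ring[2]. */
    assert(pa2desc_checked(&dma, third) == &ring[2]);
    /* Links far above or just below the ring are rejected. */
    assert(pa2desc_checked(&dma, 0xdeadbeefu) == NULL);
    assert(pa2desc_checked(&dma, 0x00fffffcu) == NULL);
}
```

A caller would skip the descriptor, as the patch does, when the helper returns NULL. Like the patch, this only masks the symptom; it does not explain where a bad ds_link value comes from.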

Note, I don't claim this is a fix; I've no idea what's really wrong, I just found out that the oopsing can be prevented this way. The card works relatively normally with this code, apart from the HAL errors mentioned earlier and somewhat degraded performance. The card reset once; I hope it's not somehow destructive to the hardware ;)

I hope that with this information a madwifi developer might look for the real problem. I'll try too, but as I only started analyzing the madwifi source today, this might take some time ;)

I hope this bug can be resolved now...

regards, Christian

12/06/07 11:00:57 changed by Bjoern

This still occurs as of SVN r3007, Debian etch with 2.6.18-5, x86. I'll adapt your patch and see if it helps any.

I don't have any concrete examples yet, but I suspect this error occurs on some - but not all - countrycode/regdomain combinations.

12/07/07 11:16:14 changed by Bjoern

Unfortunately, the above patch did not help prevent the crash. Any ideas?

12/07/07 18:52:58 changed by mentor

  • summary changed from Kernel Oops after "ifconfig ath0 up" (madwifi-ng r1531 and previous) to Kernel Oops after "ifconfig ath0 up".