
Ticket #1516 (new defect)

Opened 14 years ago

Last modified 14 years ago

Changing rate control on STATION crashes AP, possible DoS on APs

Reported by: spam@softmach.com
Assigned to:
Priority: major
Milestone:
Component: madwifi: other
Version: v0.9.3.2
Keywords: station crashes AP DoS
Cc:
Patch is attached: 0
Pending:

Description

Setup: madwifi release 0.9.3.2 on Linux 2.6.20.4, with one node in STATION mode and the other node in AP mode. When the STATION node is torn down and then immediately brought back up with a different rate control module, the AP node panics.

Using this flaw, someone could construct a DoS attack against nodes using madwifi.

The two nodes (in WDS mode) are bridging ethernet pings when the crash happens. It can also crash without the pings, but it crashes more often (about 25%) when traffic is present.

Switching the STATION from amrr or onoe to sample seems the best way to see a crash.

Here is the AP OOPS when the STATION goes from amrr to sample:

ath_rate_sample: no rates for 00:15:6d:53:08:68?
ath_rate_sample: no rates for 00:15:6d:53:08:68?
BUG: unable to handle kernel NULL pointer dereference at virtual address 000000bf
 printing eip:
c802a6aa
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: wlan_wep wlan_scan_ap ath_rate_sample ath_pci wlan ath_hal(P)
CPU:    0
EIP:    0060:[<c802a6aa>]    Tainted: P      VLI
EFLAGS: 00010086   (2.6.20.4 #14)
EIP is at _ieee80211_free_node+0x1a/0xf0 [wlan]
eax: c12e3000   ebx: c12e3000   ecx: 00000000   edx: 00000001
esi: c76aeece   edi: 00000000   ebp: 00000000   esp: c03d9e28
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=c03d8000 task=c03b1380 task.ti=c03d8000)
Stack: c11ba260 00000002 00000000 c11ba260 c80a0793 c7200000 c11c2000 00000060 
       c76aeec0 c76aeece c12e203e c12e2038 c802ae68 00000206 c12e2038 c12e3000 
       c11babe8 c7264260 c8025818 00000000 00000003 c8080ce0 00000001 01240f00 
Call Trace:
 [<c80a0793>] ath_beacon_setup+0x213/0x2d0 [ath_pci]
 [<c802ae68>] ieee80211_remove_wds_addr+0x58/0x80 [wlan]
 [<c8025818>] ieee80211_input+0x18d8/0x1920 [wlan]
 [<c80540f9>] zz067d0c47+0x15/0x5c [ath_hal]
 [<c80a7088>] ath_rx_tasklet+0x658/0x800 [ath_pci]
 [<c8063780>] zz005b88fd+0x0/0x13c [ath_hal]
 [<c80a6a78>] ath_rx_tasklet+0x48/0x800 [ath_pci]
 [<c0116073>] tasklet_action+0x33/0x70
 [<c0115fc2>] __do_softirq+0x42/0x90
 [<c0116037>] do_softirq+0x27/0x30
 [<c0104a32>] do_IRQ+0x42/0x70
 [<c0104a32>] do_IRQ+0x42/0x70
 [<c0100340>] init+0x0/0x280
 [<c0102e9b>] common_interrupt+0x23/0x28
 [<c0101a60>] default_idle+0x0/0x40
 [<c0101a8a>] default_idle+0x2a/0x40
 [<c010114c>] cpu_idle+0x1c/0x50
 [<c03da6f1>] start_kernel+0x271/0x2f0
 [<c03da230>] unknown_bootoption+0x0/0x250
 =======================
Code: 5f 5d c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 83 ec 30 89 5c 24 20 89 c3 89 74 24 24 89 7c 24 28 89 6c 24 2c 8b 38 8b 68 08 <f6> 87 bf 00 00 00 01 74 50 85 ed 8b 70 1c c7 44 24 1c 85 37 04 
EIP: [<c802a6aa>] _ieee80211_free_node+0x1a/0xf0 [wlan] SS:ESP 0068:c03d9e28
 <0>Kernel panic - not syncing: Fatal exception in interrupt
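
The key facts in an oops like the one above can be pulled out mechanically. The following is a minimal sketch, assuming the oops text is piped in on stdin; `oops_summary` is a name made up for this example and is not part of madwifi.

```shell
#!/bin/sh
# oops_summary: hypothetical helper, not part of madwifi.
# Reads a 2.6-era i386 oops on stdin and prints three facts:
#   - the faulting virtual address,
#   - the symbol (and module) that EIP points into,
#   - the code bytes starting at the faulting instruction (the <xx> marker).
oops_summary() {
    sed -n \
        -e 's/^BUG: .* at virtual address \([0-9a-f]*\).*/fault_addr=0x\1/p' \
        -e 's/^EIP is at \([^ ]*\) \[\([^]]*\)\].*/fault_symbol=\1 module=\2/p' \
        -e 's/^Code: .*<\([0-9a-f][0-9a-f]\)> \(.*[0-9a-f]\) *$/fault_bytes=\1 \2/p'
}
```

Usage would be something like `oops_summary < ap-oops.txt` (the file name is hypothetical). For the oops above, the fault address is 0x000000bf, the symbol is _ieee80211_free_node+0x1a/0xf0 in wlan, and the faulting bytes start with f6 87 bf 00 00 00 01. On i386 those bytes decode as a byte test at offset 0xbf from edi, and the register dump shows edi: 00000000, so the fault address is consistent with reading a field at offset 0xbf through a NULL pointer. The preceding bytes 8b 38 load edi from the address held in eax. Which structure and field this corresponds to in the madwifi source is not established here.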

Here is the script used to tear down the STATION:

#!/bin/sh
ifconfig br0 down
brctl delbr br0
ifconfig ath0 down
ifconfig eth2 down
wlanconfig ath0 destroy
/usr/local/bin/madwifi-unload
/sbin/ifconfig eth2 inet 0.0.0.0 down

Here is the script used to bring the STATION back up using a different rate control (change the ratectl= value on the modprobe line):

#!/bin/sh
echo 1 > /proc/sys/net/ipv4/ip_forward
/sbin/modprobe ath_pci rfkill=0 ratectl=sample autocreate=none
/usr/local/bin/wlanconfig ath0 create wlandev wifi0 wlanmode ap
athctrl -i wifi0 -d 20000
sysctl -w dev.wifi0.diversity=0
sysctl -w dev.wifi0.rxantenna=1
sysctl -w dev.wifi0.txantenna=1
iwconfig ath0 essid "meshtest"
iwconfig ath0 channel 6
iwconfig ath0 rate auto
iwpriv ath0 mode 3
iwconfig ath0 key aaaaaaaaaaaaaaaaaaaa
iwpriv ath0 authmode 1
iwpriv ath0 maccmd 3
iwconfig ath0 frag 2346
iwpriv ath0 xr 0
iwpriv ath0 turbo 0
iwpriv ath0 bintval 100
iwpriv ath0 dtim_period 2
iwpriv ath0 bgscan 0
iwpriv ath0 ff 0
iwpriv ath0 turbo 0
iwpriv ath0 protmode 0     
iwpriv ath0 wmm 0
iwpriv ath0 wds 1
ifconfig eth2 0.0.0.0
brctl addbr br0
brctl addif br0 eth2
brctl addif br0 ath0
brctl setfd br0 1
brctl stp br0 off
ifconfig br0 192.168.241.21 netmask 255.255.255.0 up
ifconfig ath0 up
ifconfig eth2 up
iwconfig ath0 txpower auto
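
The tear-down and bring-up steps above could be driven from one loop to make the reproduction repeatable. This is only a sketch under assumptions: `STATION_UP` is a hypothetical path to the bring-up script above, edited so that it takes the rate control name from a `RATECTL` environment variable instead of hard-coding it, and the function names are made up for this example. With `DRY_RUN=1` (the default here) the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: cycles the STATION through several rate control modules.
# DRY_RUN=1 (default) prints each command instead of running it.
# STATION_UP is a hypothetical path to the bring-up script from this ticket,
# modified to read the rate control name from $RATECTL.
DRY_RUN=${DRY_RUN:-1}
STATION_UP=${STATION_UP:-./station-up.sh}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

station_down() {
    # Same commands, in the same order, as the tear-down script in this ticket.
    run ifconfig br0 down
    run brctl delbr br0
    run ifconfig ath0 down
    run ifconfig eth2 down
    run wlanconfig ath0 destroy
    run /usr/local/bin/madwifi-unload
    run /sbin/ifconfig eth2 inet 0.0.0.0 down
}

station_up() {
    # $1 = rate control name (e.g. amrr, onoe, sample).
    run env RATECTL="$1" "$STATION_UP"
}

cycle_ratectl() {
    # $1 = seconds between tear-down and bring-up
    # $2 = seconds to stay up before the next tear-down
    # remaining arguments = rate control names, used in order
    delay=$1
    hold=$2
    shift 2
    for rc in "$@"; do
        station_down
        run sleep "$delay"
        station_up "$rc"
        run sleep "$hold"
    done
}
```

For example, `cycle_ratectl 0 60 amrr sample` would print the command sequence for an amrr session followed by an immediate switch to sample. The 60-second hold time is an arbitrary choice for illustration, not a value from the report.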

Before the unit crashes, these are the loaded modules:

# lsmod
Module                  Size  Used by    Tainted: P  
wlan_wep                5600  1 
wlan_scan_ap            4384  0 
ath_rate_amrr           6084  1 
ath_pci                86496  0 
wlan                  189700  5 wlan_wep,wlan_scan_ap,ath_rate_amrr,ath_pci
ath_hal               189328  2 ath_pci
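
To confirm which rate control module a node has loaded before each test run, the listing can be filtered. A minimal sketch; `loaded_ratectl` is a made-up helper name that reads lsmod-style text on stdin.

```shell
#!/bin/sh
# loaded_ratectl: hypothetical helper, not part of madwifi.
# Prints the name of every ath_rate_* module found in lsmod-style input.
loaded_ratectl() {
    awk '$1 ~ /^ath_rate_/ { print $1 }'
}
```

Run as `lsmod | loaded_ratectl`. For the listing above it would print ath_rate_amrr.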

The crash happens with open WEP and with unencrypted connections.

The hardware platform is a Geode GX-2 GX466 (basically a low power 686), similar to some Soekris boards.

Change History

08/21/07 23:36:07 changed by spam@softmach.com

I added a 30-second sleep to my STATION tear-down script. The AP still crashes, so the crash doesn't depend on the switch being very fast.

08/22/07 01:01:10 changed by spam@softmach.com

The AP crashes even if the STATION waits 300 seconds (5 minutes) between the first session and the 2nd session.

This time the crash is a bit different:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c803129c
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: wlan_scan_ap ath_rate_amrr ath_pci wlan ath_hal(P)
CPU:    0
EIP:    0060:[<c803129c>]    Tainted: P      VLI
EFLAGS: 00010013   (2.6.20.4 #14)
EIP is at ieee80211_node_saveq_drain+0x1c/0x80 [wlan]
eax: c12d3c00   ebx: 00000000   ecx: c11b2260   edx: 00000092
esi: c12d3c00   edi: c12d3dcc   ebp: 0d003300   esp: c03d9dd0
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=c03d8000 task=c03b1380 task.ti=c03d8000)
Stack: c12d3c00 c727e260 c11b2260 00000000 c802babc 00000000 000510e9 00000086 
       00000001 00000046 c12d3c00 c80a63a9 c12d3c00 c12d3c00 c727e260 00000000 
       c802ae9c c11b2260 c809d58e c12d3c00 c70ae5e0 c802a75f c11b2260 00000002 
Call Trace:
 [<c802babc>] node_cleanup+0x4c/0x110 [wlan]
 [<c80a63a9>] ath_node_cleanup+0x139/0x140 [ath_pci]
 [<c802ae9c>] node_free+0xc/0x50 [wlan]
 [<c809d58e>] ath_node_free+0x2e/0x40 [ath_pci]
 [<c802a75f>] _ieee80211_free_node+0xcf/0xf0 [wlan]
 [<c80a0793>] ath_beacon_setup+0x213/0x2d0 [ath_pci]
 [<c802ae68>] ieee80211_remove_wds_addr+0x58/0x80 [wlan]
 [<c8025818>] ieee80211_input+0x18d8/0x1920 [wlan]
 [<c80540f9>] zz067d0c47+0x15/0x5c [ath_hal]
 [<c80a9960>] ath_intr+0x4f0/0xc00 [ath_pci]
 [<c80a6f07>] ath_rx_tasklet+0x4d7/0x800 [ath_pci]
 [<c8063780>] zz005b88fd+0x0/0x13c [ath_hal]
 [<c80a6a78>] ath_rx_tasklet+0x48/0x800 [ath_pci]
 [<c0116073>] tasklet_action+0x33/0x70
 [<c0115fc2>] __do_softirq+0x42/0x90
 [<c0116037>] do_softirq+0x27/0x30
 [<c0104a32>] do_IRQ+0x42/0x70
 [<c0104a32>] do_IRQ+0x42/0x70
 [<c0100340>] init+0x0/0x280
 [<c0102e9b>] common_interrupt+0x23/0x28
 [<c0101a60>] default_idle+0x0/0x40
 [<c0101a8a>] default_idle+0x2a/0x40
 [<c010114c>] cpu_idle+0x1c/0x50
 [<c03da6f1>] start_kernel+0x271/0x2f0
 [<c03da230>] unknown_bootoption+0x0/0x250
 =======================
Code: ff ff e9 1d ff ff ff 90 8d b4 26 00 00 00 00 55 57 56 89 c6 53 8b 98 cc 01 00 00 8d b8 cc 01 00 00 8b a8 d4 01 00 00 39 fb 74 53 <8b> 13 8d 45 ff 89 86 d4 01 00 00 89 96 cc 01 00 00 89 7a 04 c7 
EIP: [<c803129c>] ieee80211_node_saveq_drain+0x1c/0x80 [wlan] SS:ESP 0068:c03d9dd0
 <0>Kernel panic - not syncing: Fatal exception in interrupt
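
To see what the two crashes have in common, the call traces can be compared frame by frame. A minimal sketch, assuming each oops has been saved to its own file; the function names are made up for this example.

```shell
#!/bin/sh
# trace_symbols: hypothetical helper. Prints the unique symbol names from the
# " [<address>] symbol+0xoff/0xlen" lines of the oops saved in file $1.
trace_symbols() {
    sed -n 's/^ *\[<[0-9a-f]*>\] \([^+ ]*\)+0x.*/\1/p' "$1" | sort -u
}

# common_frames: prints the symbols that appear in the traces of both files.
common_frames() {
    a=$(mktemp) && b=$(mktemp) || return 1
    trace_symbols "$1" > "$a"
    trace_symbols "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

Usage would be `common_frames oops1.txt oops2.txt` (file names hypothetical). Both traces in this ticket contain ieee80211_remove_wds_addr and ieee80211_input. The first oops faults inside _ieee80211_free_node, and the second faults in ieee80211_node_saveq_drain with _ieee80211_free_node further up the trace. That suggests, without proving it, that both crashes come from freeing a node on the WDS address removal path. Some frames in a dump like this can be stale stack contents rather than real callers, so the overlap is a hint and not a confirmed call chain.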
 

09/06/07 23:47:01 changed by thunder.m

Can you try it again without the patch from ticket #1388?

(follow-up: ↓ 5 ) 09/08/07 14:05:59 changed by thunder.m

After more investigation: it happens even without the patch from ticket #1388, and even with the previous stable version 0.9.3. It can be reproduced with almost 100% probability. So everyone who runs an AP in WDS mode with madwifi stable can be attacked. This is a really serious bug!

I can't reproduce it with the latest trunk r2702, because the client can't associate with the AP and the connection doesn't work. With the previous trunk r2695 it works without problems, and there is no crash anymore.

(in reply to: ↑ 4 ) 09/12/07 18:16:58 changed by spam@softmach.com

Replying to thunder.m:

After more investigation: it happens even without the patch from ticket #1388, and even with the previous stable version 0.9.3. It can be reproduced with almost 100% probability. So everyone who runs an AP in WDS mode with madwifi stable can be attacked. This is a really serious bug! I can't reproduce it with the latest trunk r2702, because the client can't associate with the AP and the connection doesn't work. With the previous trunk r2695 it works without problems, and there is no crash anymore.

Yes, I also found that recent versions of trunk do not have the problem, but the "stable" 0.9.3.2 release DOES have it. (A script to cause APs to oops could be crafted very easily.)

My suspicion is that the 80211 code on the AP leaves some information about the station in a table somewhere when the station disappears or disassociates. When the station re-associates, the table is not updated properly to reflect the fact that the station has changed (e.g. its rate control changed). Later on, the AP crashes because that saved state is stale.

I tried hacking at the code in 80211 to fix this (preventing ath_node_free from being called when the pointer is null), but the bug just morphed, and it turned into whack-a-mole. Either I wasn't fixing the right stuff, wasn't fixing it right, or wasn't fixing all of it. :-( I bailed out of trying to fix it myself. But maybe these bread crumbs will be helpful to someone who knows the terrain of this project better?

02/07/08 04:34:16 changed by mtaylor

Please see if this was fixed in trunk. A number of leaks and double-free calls have been fixed since r3303.