Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #843 (new defect)

Opened 15 years ago

Last modified 15 years ago

four extra bytes between 802.11 and LLC headers in frames recv'd by AR5006EXS

Reported by: jrengdahl@hotmail.com Assigned to:
Priority: major Milestone:
Component: madwifi: 802.11 stack Version: v0.9.2
Keywords: AR5006EXS PCI Express Cc:
Patch is attached: 1 Pending:

Description

This may be a manifestation of the problem reported in ticket #784.

I have a Dell Latitude D620 laptop with a mini-PCI Express card: PCDGlobal's implementation of the XB62 reference design for the AR5006EXS aka AR5424.

The node comes up and associates. When I try to ping another node, using Ethereal on a separate monitor machine I can see the XB62 going through the proper steps to get a MAC address via ARP. I can see the ARP replies in Ethereal, but the XB62 never transmits the ping request; instead it tries ARPing two more times, then the ping command times out. When I run Ethereal on the XB62 node in parallel with the STA, or use "athdebug +recv" to dump the incoming packets, I see that there are an extra four bytes between the 802.11 header and the LLC header. Ethereal cannot decode these bytes.

Here is a hex dump of an ARP request and the reply. I edited and interleaved the lines of the two hex dumps for easy comparison. The top line of each pair is the incoming frame as reported by "athdebug +recv" in dmesg. The second line is what Ethereal reports on the other machine. The other machine uses an Atheros-based Senao NMP-8602+ and works fine. FWIW, the access point is a Linksys WAP54GP.

This is an echo of my ARP request, relayed by the AP to everyone, so I get to see it too

FRDS 00:16:e3:36:17:eb->ff:ff:ff:ff:ff:ff(00:14:bf:7e:3e:3c) data 1M +68

08 02 3a 01 ff ff ff ff  ff ff 00 14 bf 7e 3e 3c  <-- "athdebug +recv" in dmesg
08 02 3a 01 ff ff ff ff  ff ff 00 14 bf 7e 3e 3c  <-- Ethereal on other machine

00 16 e3 36 17 eb 40 37  00 40 70 00 aa aa 03 00  <-- dmesg reports extra 4 bytes
00 16 e3 36 17 eb 40 37              aa aa 03 00  <-- not seen by other machine

00 00 08 06 00 01 08 00  06 04 00 01 00 16 e3 36
00 00 08 06 00 01 08 00  06 04 00 01 00 16 e3 36

17 eb c0 a8 01 06 00 00  00 00 00 00 c0 a8 01 03
17 eb c0 a8 01 06 00 00  00 00 00 00 c0 a8 01 03


This is the ARP response forwarded by the AP to me

FRDS 00:02:6f:40:77:1c->00:16:e3:36:17:eb(00:14:bf:7e:3e:3c) data 54M +65

08 02 2c 00 00 16 e3 36  17 eb 00 14 bf 7e 3e 3c
08 02 2c 00 00 16 e3 36  17 eb 00 14 bf 7e 3e 3c

00 02 6f 40 77 1c 50 37  84 40 70 00 aa aa 03 00
00 02 6f 40 77 1c 50 37              aa aa 03 00

00 00 08 06 00 01 08 00  06 04 00 02 00 02 6f 40
00 00 08 06 00 01 08 00  06 04 00 02 00 02 6f 40

77 1c c0 a8 01 03 00 16  e3 36 17 eb c0 a8 01 06
77 1c c0 a8 01 03 00 16  e3 36 17 eb c0 a8 01 06

As Codestrom reported, Windows XP also fails to communicate on this hardware platform until Super A/G mode is disabled via the ACU utility, after which it works as well as Windows ever works.

I tried "iwpriv ath0 turbo 0" but that did not help.

I rather bludgeonly added a "+4" to ieee80211_input.c to skip over the four bytes:

static struct sk_buff *
ieee80211_decap(struct ieee80211vap *vap, struct sk_buff *skb, int hdrlen)
{
	struct ieee80211_qosframe_addr4 wh;	/* Max size address frames */
	struct ether_header *eh;
	struct llc *llc;
	u_short ether_type = 0;
	
	memcpy(&wh, skb->data, hdrlen);	/* Only copy hdrlen over */
	printk("ieee80211-decap: skipping extra four bytes\n"); // complain
	llc = (struct llc *) skb_pull(skb, hdrlen+4);	// skip 802.11 header plus the extra four bytes
	if (skb->len >= LLC_SNAPFRAMELEN &&
	    llc->llc_dsap == LLC_SNAP_LSAP && llc->llc_ssap == LLC_SNAP_LSAP &&
	    llc->llc_control == LLC_UI && llc->llc_snap.org_code[0] == 0 &&
	    llc->llc_snap.org_code[1] == 0 && llc->llc_snap.org_code[2] == 0) {
		ether_type = llc->llc_un.type_snap.ether_type;
		skb_pull(skb, LLC_SNAPFRAMELEN);
		llc = NULL;
	}

Compiled and installed it, and ran the following test script:

dmesg -c >/dev/null
athdebug +recv
ping -c1 192.168.1.245
athdebug -recv
dmesg

The ARP and ping now work. Here are the results:

#sh -v ./test
dmesg -c >/dev/null
athdebug +recv
dev.wifi0.debug: 0x00000000 => 0x00000004<recv>
ping -c1 192.168.1.245
PING 192.168.1.245 (192.168.1.245) 56(84) bytes of data.
64 bytes from 192.168.1.245: icmp_seq=1 ttl=64 time=4.60 ms

--- 192.168.1.245 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.605/4.605/4.605/0.000 ms
athdebug -recv
dev.wifi0.debug: 0x00000004 => 0x00000000
dmesg
FRDS 00:16:e3:36:17:eb->ff:ff:ff:ff:ff:ff(00:14:bf:7e:3e:3c) data 1M +67

08 02 3a 01 ff ff ff ff  ff ff 00 14 bf 7e 3e 3c <-- downlink echo of my ARP request
00 16 e3 36 17 eb 20 3b  00 40 70 00 aa aa 03 00
00 00 08 06 00 01 08 00  06 04 00 01 00 16 e3 36
17 eb c0 a8 01 06 00 00  00 00 00 00 c0 a8 01 f5

FRDS 00:14:bf:7e:3e:3d->00:16:e3:36:17:eb(00:14:bf:7e:3e:3c) data 54M +63

08 02 2c 00 00 16 e3 36  17 eb 00 14 bf 7e 3e 3c <-- ARP reply
00 14 bf 7e 3e 3d 30 3b  84 40 70 00 aa aa 03 00
00 00 08 06 00 01 08 00  06 04 00 02 00 14 bf 7e
3e 3d c0 a8 01 f5 00 16  e3 36 17 eb c0 a8 01 06

ieee80211-decap: skipping extra four bytes
FRDS 00:14:bf:7e:3e:3d->00:16:e3:36:17:eb(00:14:bf:7e:3e:3c) data 54M +62

08 02 2c 00 00 16 e3 36  17 eb 00 14 bf 7e 3e 3c <-- ping reply
00 14 bf 7e 3e 3d 40 3b  84 78 70 00 aa aa 03 00
00 00 08 00 45 00 00 54  33 41 00 00 40 01 c3 1c
c0 a8 01 f5 c0 a8 01 06  00 00 46 69 8b 0b 00 01
89 93 ef 44 be ae 0c 00  08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37

ieee80211-decap: skipping extra four bytes

My klutzy "fix" seems to sort of work -- I can now bring up the access point's web page.

My conclusion is that on inbound data frames the AR5006EXS inserts some extra header between the IEEE 802.11 header and the LLC header. It evidently has something to do with Super A/G.

I stuck in a "printk" to dump out the extra bytes every time I skipped over them.

0: always 0x84
1: takes various values: 00 c4 58 60 ac 16 8e ...
2: takes values between 0x70 and 0x76
3: always 0x00

BTW -- I freely admit that I do not know what I am doing. I'm just coming up to speed on 802.11 and Linux. My recent comms experience is implementing the ControlNet MAC layer.

Attachments

AR5006EXS.patch (1.8 kB) - added by jrengdahl@hotmail.com on 09/28/06 07:07:06.
patch that makes AR5006EXS work with MadWiFi 0.9.2 svn 1747
AR5006EXS.2.patch (2.8 kB) - added by jrengdahl@hotmail.com on 10/02/06 04:30:10.
hack for AR5006EXS mac version 10.2 bug

Change History

08/26/06 17:07:51 changed by jrengdahl@hotmail.com

I first tested this at work with an AP which was not connected to any network, and it seemed to go through the right motions with the management web pages inside the AP. When I tried this at home with an online AP (WRT54GS) I got a kernel panic. There is obviously more to this than just deleting the extra four bytes. Also, the packets passed up to Ethereal in monitor mode will be undecipherable.

09/28/06 07:05:31 changed by jrengdahl@hotmail.com

Here is a hack that makes the AR5006EXS "work" on my laptop. I would not consider this a fix, just a hack. Do not apply the patch if you are not experiencing this problem. I do not know why the hack is necessary or if I've covered all the cases where it is needed. I briefly checked it out with Firefox in STA mode and with Wireshark in MON mode, and they both seem to work. A patchfile against subversion rev 1747 is attached.

09/28/06 07:07:06 changed by jrengdahl@hotmail.com

  • attachment AR5006EXS.patch added.

patch that makes AR5006EXS work with MadWiFi 0.9.2 svn 1747

09/28/06 17:03:00 changed by jrengdahl@hotmail.com

When I fired up the patched system at work today it locked up. Perhaps the patch breaks something that is triggered by the more diverse traffic pattern in the corporate network environment. It worked fine at home last night. On the second try it seems to be staying up. I'll keep an eye on it. Be warned.

On the second try I logged into the console (CTL-ALT-F1 on Fedora) so I could see the kernel panic message if it crashed. I ran my "wifisetup-mon" script from there, then started Wireshark from a Gnome window. So far so good. I'm using it in MON mode.

09/29/06 04:18:43 changed by jrengdahl@hotmail.com

I got a couple new cards today. These are genuine Atheros XB62 reference design cards. The older card that has the problem is an XB62 knockoff by PCDGlobal. Looking at the outside these cards are identical -- they seem to have the same foil and silkscreen. However, the Atheros-built cards work (at home) with the unhacked MadWiFi drivers -- they do not stuff the extra four bytes between the 802.11 and LLC headers.

I am now mystified. I will see if there is some way to identify the chip and/or firmware in these cards.

09/29/06 05:32:44 changed by jrengdahl@hotmail.com

There is a difference in the firmware. The wifi driver reports the revisions of things inside the chip as it powers up. The reports are printed into the kernel log (dmesg). Here is the only difference between the two cards:

old card: wifi0: mac 10.2 phy 6.1 radio 10.2
new card: wifi1: mac 10.3 phy 6.1 radio 10.2

My guess is that there is a bug in the 10.2 MAC firmware that causes this problem.

09/29/06 11:41:40 changed by mrenzmann

Cards that are based on Atheros chipsets don't have any firmware. The version information that you mention refers to the silicon revision of the relevant parts in the chipset.

09/29/06 23:12:05 changed by anonymous

How is this patch applied?

10/02/06 04:30:10 changed by jrengdahl@hotmail.com

  • attachment AR5006EXS.2.patch added.

hack for AR5006EXS mac version 10.2 bug

10/02/06 05:08:16 changed by jrengdahl@hotmail.com

What I did was fetch the latest subversion code ("svn checkout http:<double slash>svn.madwifi.org/trunk madwifi"; see madwifi.org/wiki/UserDocs/GettingMadwifi), fetch the patch file, "cd" to the top level of the source directory, and type "patch -p1 <../AR5006EXS.patch". Then build and install according to the standard procedure.

I'm sort of a Linux and MadWiFi newbie, so the patch and the way I did things are probably not in canonical form.

Also be warned -- the patch is incomplete. For me it is stable at home, but at work, where there are a lot more different kinds of messages in the air, it kernel panics sooner or later. I don't know why. I may not pursue this any further, since I have the new cards now which do not exhibit the bug.

If one of the madwifi developers wants to pursue this, let me know and I'll mail you the buggy card. You'll need a laptop that has a mini PCI Express slot.

I uploaded a second version of the patch. This one checks to see if the MAC revision is 10.2, and makes sure there is an LLC header, before deleting the four extra bytes. It still panics at work after a while, but is stable at home.

10/02/06 18:52:21 changed by mrenzmann

  • patch_attached set to 1.

12/28/06 19:54:19 changed by m15ch4@gmail.com

Hi! I also have an AR5006EXS-based card and the same problem. jrengdahl's patch works for me only for a while. After some time I get a kernel panic (especially when I switch the card into monitor mode). Where is the problem? Is there a chance that the bug will be fixed?

01/24/07 07:11:22 changed by m15ch4@gmail.com

Hi!

I've spent a few days analysing the madwifi-ng code, trying to find the place where those "4 additional bytes" are inserted into incoming IEEE 802.11 frames. I also tried to find out why I get a kernel panic after I apply jrengdahl's patch. Here are my conclusions:

1) There are two "main" functions in your patch you use to move the data block (in the skb structure):

memmove(skb->data+4, skb->data, hdrsize); skb_pull(skb, 4);

I am not sure if it is necessary to use memmove function. I think

skb->data=skb_pull(skb, 4);

is all we need. This way we just change the address that skb->data points to (updating the skb->len value at the same time).

2) I discovered that our "four bytes" bug sometimes turns into a "six bytes" bug. As we know, the bug occurs only in IEEE 802.11 DATA frames. There are four additional bytes between the 802.11 header and the LLC header when the "To DS" and "From DS" bits (in the FRAME CONTROL field of the 802.11 header) are NOT set, or when only one of them is set. When both of them are set to 1, the 802.11 header is longer and there are SIX additional (not needed at this moment) bytes which should be omitted.

6 additional bytes case:

 0000  08 0b d5 00 00 0e 2e 9a  42 6d 00 4f 62 08 11 57   ........ Bm.Ob..W
           |
       0000 1011
             /  \
      From DS    To DS

 0010  00 50 8b 51 23 1e e0 a8  00 0e 8e 7d 3f 1e 8e 1e   .P.Q#... ...}?...
                                                 |
                                       End of 802.11 header

 0020  00 58 70 00 aa aa 03 00  00 00 08 00 45 00 00 28   .Xp..... ....E..(
                  |
           Start of REAL LLC

 0030  73 bd 40 00 80 06 5d 27  c0 a8 00 2f d0 41 98 d2   s.@...]' .../.A..
 0040  05 d2 00 50 92 48 41 17  ef a5 83 ae 50 10 ff ff   ...P.HA. ....P...
 0050  39 13 00 00 00 00 00 00  00 00                     9....... ..

As you can see, there are 6 additional bytes. I'll try to write a patch and I hope it will work.

01/24/07 07:13:59 changed by mrenzmann

I've reposted the last comment, after fixing the formatting. My Firefox went mad when it tried to display the results of the original comment.

Please: always preview postings before you actually send them, and make sure that it shows up correctly. Thanks (also for the investigation).

02/12/07 15:42:30 changed by fernandohiagon@yahoo.com.br

hi guys,

I have a Dell M1210 notebook with a mini-PCIe Atheros AR5006EX. I don't have Linux installed here, but it looks like this card doesn't work, right?

04/02/07 04:45:09 changed by james (at) frymanet (dot) com

I can confirm the information presented here is still evident in the latest trunk release 0.9.4.5 r2250. My card information is as follows. Please note the similar PHY version as mentioned by the ticket owner (mac 10.2 phy 6.1 radio 10.2).

With this card I also had to disable Turbo A/G in Windows before it would work at all. Is there any way to disable the Turbo modes in hardware on this revision to attempt to resolve the issue?

modprobe ath_pci:

[17180286.676000] ath_pci: 0.9.4.5 (svn r2250)
[17180286.676000] ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LK2E] -> GSI 19 (level, high) -> IRQ 233
[17180286.676000] PCI: Setting latency timer of device 0000:01:00.0 to 64
[17180287.232000] wifi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
[17180287.232000] wifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
[17180287.232000] wifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
[17180287.232000] wifi0: turboA rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
[17180287.232000] wifi0: turboG rates: 6Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
[17180287.232000] wifi0: H/W encryption support: WEP AES AES_CCM TKIP
[17180287.232000] wifi0: mac 10.2 phy 6.1 radio 10.2
[17180287.232000] wifi0: Use hw queue 1 for WME_AC_BE traffic
[17180287.232000] wifi0: Use hw queue 0 for WME_AC_BK traffic
[17180287.232000] wifi0: Use hw queue 2 for WME_AC_VI traffic
[17180287.232000] wifi0: Use hw queue 3 for WME_AC_VO traffic
[17180287.232000] wifi0: Use hw queue 8 for CAB traffic
[17180287.232000] wifi0: Use hw queue 9 for beacons
[17180287.232000] wifi0: Atheros 5424/2424: mem=0xc3000000, irq=233

lspci -vv:

01:00.0 Ethernet controller: Atheros Communications, Inc. Unknown device 001c (rev 01)
        Subsystem: Askey Computer Corp. Unknown device 7112
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 233
        Region 0: Memory at c3000000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
                Address: 00000000  Data: 0000
        Capabilities: [60] Express Legacy Endpoint IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <128ns, L1 <2us
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 0
                Link: Latency L0s <512ns, L1 <64us
                Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
        Capabilities: [90] MSI-X: Enable- Mask- TabSize=1
                Vector table: BAR=0 offset=00000000
                PBA: BAR=0 offset=00000000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel

04/06/07 00:07:20 changed by Ron Dippold

We have seen this with AR5413s, mac 10.4. It turned out to be a calibration problem and not tied to a specific mac version or card parts. If you have CTLs for things the card doesn't support (like compression) you will get an Atheros HAL assertion during calibration and this behavior occurs. Recalibrating the card with these disabled fixes the problem.

What this means for most people is that your supplier (or whoever calibrated the card) messed up. There's nothing you can do but return it.

Furthermore it's not specifically tied to any hardware (other than you're trying to do something the hardware doesn't support) or version numbers and as noted the behavior is not 100% predictable, so including a hack in the mainline would probably be a very bad idea.

04/06/07 19:03:28 changed by mentor

Maybe we should try and detect this, and report it to the user?