Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #617 (new defect)

Opened 13 years ago

Last modified 12 years ago

Wireless packets disappear probably at the receiver

Reported by: ronen@toroki.com
Assigned to:
Priority: major
Milestone: version 0.9.x - progressive release candidate phase
Component: madwifi: other
Version: trunk
Keywords: lost packet
Cc:
Patch is attached: 0
Pending:

Description

I am using a simple point-to-point setup (single AP and STA) on channel 153 in open system mode. I noticed that when running UDP traffic (using IPERF) I am occasionally missing packets. The same results were seen with IXIA. After looking closely at the code and counters, I noticed that the transmitter transmits all the packets successfully (i.e. it receives an ACK for each packet). However, the receiver reports that it receives only part of the transmitted packets. Furthermore, when running in ff (fast frame) mode the RX packet counter doesn't increment properly.

I suspected that the receiver was somehow losing packets due to a shortage of Rx buffers/descriptors. Indeed, by increasing the number of RX buffers to 90 (ATH_RXBUF in if_athvar.h) I could send 1000B packets at a rate of 35Mbps (transmit rate fixed at 54M) with very few errors (5E-6). However, the problem was not completely solved. I tried to enable the RXEOL interrupt, but I did not see its counter increment. Is there a better way to find out where the packets go?
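To give a sense of scale for these figures, here is a rough back-of-the-envelope sketch. It assumes that "35Mbps" means 35,000,000 bits per second and that the 1000-byte packet size is what that rate counts; the helper names are hypothetical and not part of madwifi.

```c
#include <assert.h>

/* Packets per second for a given bit rate and packet size in bytes.
 * Assumption: the bit rate counts exactly packet_bytes * 8 bits per packet. */
static double packets_per_second(double bits_per_second, double packet_bytes)
{
    assert(bits_per_second > 0.0 && packet_bytes > 0.0);
    return bits_per_second / (packet_bytes * 8.0);
}

/* Mean time in seconds between lost packets at a given loss ratio. */
static double seconds_between_losses(double pps, double loss_ratio)
{
    assert(pps > 0.0 && loss_ratio > 0.0);
    return 1.0 / (pps * loss_ratio);
}
```

Under those assumptions, packets_per_second(35e6, 1000.0) is 4375, and seconds_between_losses(4375.0, 5e-6) is about 45.7, i.e. roughly one missing packet every 46 seconds of traffic.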

I am using build 1416 from the trunk.

Ronen

Change History

05/12/06 06:21:13 changed by dyqith

r1416 is pretty old, please try something newer from trunk. (You get more feedback/help that way)

thanks.

I never tried to do this type of test. Make sure it's not the physical environment that's affecting your throughput.

Other than that, we can tweak those #define values if you think that'll help in the long run...

05/12/06 21:15:48 changed by ronen@toroki.com

Unfortunately, I can't update to a newer version (I just tried 1546), since it crashes in the ath_hal_attach() function (device id=27). I noticed this bug at about release 1453. Here is the output:

Using /lib/modules/2.4.27-uc1/net/wlan.o
wlan: 0.8.4.2 (svn 1546)
Using /lib/modules/2.4.27-uc1/net/ath_hal.o
ath_hal: 0.9.16.16 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, REGOPS_FUNC)
Using /lib/modules/2.4.27-uc1/net/ath_rate_sample.o
ath_rate_sample: 1.2 (svn 1546)
Using /lib/modules/2.4.27-uc1/net/wlan_scan_sta.o
Using /lib/modules/2.4.27-uc1/net/ath_pci.o
ath_pci: 0.9.4.5 (svn 1546)
Enter ath_pci_probe
In line 125
In line 173
In line 208
Enter ath_attach for device 27
Calling _ath_hal_attach()
Calling ath_hal_attach()
Unable to handle kernel paging request at virtual address 4bff4004
pgd = c0c60000
[4bff4004] *pgd=00000000, *pmd = 00000000
Internal error: Oops: f5
CPU: 0
pc : [<c5971268>] lr : [<c5984440>] Not tainted
sp : c0c65ddc ip : c0c65dfc fp : c0c65df8
r10: c031c160 r9 : 0000001b r8 : c031c000
r7 : c0c65e68 r6 : 00000001 r5 : c0310000 r4 : c0310000
r3 : 4bff0000 r2 : 00000001 r1 : 00004004 r0 : c0310000
Flags: Nzcv IRQs on FIQs on Mode SVC_32 Segment user
Control: 39FF Table: 00C60000 DAC: 00000015
Process insmod (pid: 44, stack limit = 0xc0c64368)
Stack: (0xc0c65ddc to 0xc0c66000)
5dc0: 0000001b
5de0: 00000000 c0310000 00000001 c0c65e14 c0c65dfc c59845dc c598442c 0000001b
5e00: c0310000 c0310000 c0c65e4c c0c65e18 c5981108 c59845b0 c0c65e68 c002a420
5e20: c0026e08 0000001b c031c160 00000000 4bff0000 c031c000 0000001b c031c160
5e40: c0c65e60 c0c65e50 c59716f4 c59810d4 c0c65e68 00000007 c0c65e64 c59710a0
5e60: c597162c c0c65e68 4bff0000 c031d8e4 00000000 c031c160 4bff0000 c59a91b8
5e80: c0c65e94 c01adc55 c019d3f8 60000013 00000001 4bff0000 c002a2dc c59ab208
5ea0: c031c000 c02f7000 c031c160 4bff0000 4bff0000 00000007 c59bb2dc 00000007
5ec0: c59b84d0 c031c000 00000000 c002a408 c59bb2dc c59bb350 c02f7000 00000000
5ee0: c3f9bc20 c59bd000 c00d5930 c02f7000 c59bb350 c01a3678 00000000 c00d599c
5f00: ffffffea c59a9000 c019d3fc 001785b8 c59b8750 c002b2f0 c09ad000 c09ad000
5f20: c09af000 00000060 c59a5000 c59a9060 000128ec 00000000 00000000 00000000
5f40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5f80: 00000000 00000050 00149d08 001785b8 00000080 c001d684 c0c64000 000128ec
5fa0: 00000000 c001d4c0 00149d08 001785b8 00900080 00149b10 001785b8 00000000
5fc0: 00000050 00149d08 001785b8 c59a9000 00149b10 00000003 000128ec 000781f8
5fe0: bfffdbc4 bfffdbb8 000546bc 40188010 60000010 00900080 00000000 00000000
Backtrace:
Function entered at [<c5984420>] from [<c59845dc>]
r6 = 00000001 r5 = C0310000 r4 = 00000000
Function entered at [<c59845a4>] from [<c5981108>]
r6 = C0310000 r5 = C0310000 r4 = 0000001B
Function entered at [<c59810c8>] from [<c59716f4>]
Function entered at [<c5971620>] from [<c59710a0>]
Backtrace aborted due to bad frame pointer <00000007>
Code: 25903014 27930001 2a000009 e5903014 (e7933001)

BTW, I am using IXP425

05/15/06 06:39:12 changed by dyqith

Maybe changeset:1450 (the new hal) broke the IXP425 hal...

Can anyone confirm?

05/15/06 12:29:49 changed by mrenzmann

The original poster should be able to confirm this assumption, by trying both r1449 and r1450. If r1450 fails while r1449 works for him, the new HAL might be the culprit.

@original poster: if you paste stuff like the kernel oops log, please enclose it in {{{ and }}} to make sure that it's readable.

05/16/06 01:05:26 changed by ronen@toroki.com

Well, as you suspected, 1447 works fine and 1451 fails. A possible hint might be the addition of the compiler flag -march=armv4 in xscale-be-elf.inc. It conflicts with the flag -mcpu=xscale. The crash seems to be at the same location (function). Here is the oops log:

Using /lib/modules/2.4.27-uc1/net/wlan.o
wlan: 0.8.4.2 (svn 1451)
Using /lib/modules/2.4.27-uc1/net/ath_hal.o
ath_hal: 0.9.16.16 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, REGOPS_FUNC)
Using /lib/modules/2.4.27-uc1/net/ath_rate_sample.o
ath_rate_sample: 1.2 (svn 1451)
Using /lib/modules/2.4.27-uc1/net/wlan_scan_sta.o
Using /lib/modules/2.4.27-uc1/net/ath_pci.o
ath_pci: 0.9.4.5 (svn 1451)
Unable to handle kernel paging request at virtual address 4bff4004
pgd = c0c60000
[4bff4004] *pgd=00000000, *pmd = 00000000
Internal error: Oops: f5
CPU: 0
pc : [<c597022c>] lr : [<c5983404>] Not tainted
sp : c0c67dec ip : c0c67e0c fp : c0c67e08
r10: c031c160 r9 : 0000001b r8 : c031c000
r7 : c0c67e78 r6 : 00000001 r5 : c0310000 r4 : c0310000
r3 : 4bff0000 r2 : 00000001 r1 : 00004004 r0 : c0310000
Flags: Nzcv IRQs on FIQs on Mode SVC_32 Segment user
Control: 39FF Table: 00C60000 DAC: 00000015
Process insmod (pid: 44, stack limit = 0xc0c66368)
Stack: (0xc0c67dec to 0xc0c68000)
7de0: 0000001b 00000000 c0310000 00000001 c0c67e24
7e00: c0c67e0c c59835a0 c59833f0 c031d938 c0310000 c0310000 c0c67e5c c0c67e28
7e20: c59800cc c5983574 c0c67e78 60000013 c001e0b8 c031d938 00000000 c031c160
7e40: 4bff0000 c031c000 0000001b c031c160 c0c67e70 c0c67e60 c59706b8 c5980098
7e60: c0c67e78 00000007 c0c67e74 c597007c c59705f0 c0c67e78 c59b772c c59a8198
7e80: c0c67e94 c0c67ea6 c02f7000 c02f7000 00000001 0000001b c3f9baa0 c59aa208
7ea0: c031c000 c02f7000 c031c160 4bff0000 4bff0000 00000007 c59ba194 00000007
7ec0: c59b7434 c031c000 00000000 c002a408 c59ba194 c59ba208 c02f7000 00000000
7ee0: c3f9bc00 c59bc000 c00d5930 c02f7000 c59ba208 c01a3678 00000000 c00d599c
7f00: ffffffea c59a8000 c019d3fc 00178198 c59b76a8 c002b2f0 c09b1000 c09b1000
7f20: c09b3000 00000060 c59a4000 c59a8060 00012784 00000000 00000000 00000000
7f40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f80: 00000000 00000050 00149d08 00178198 00000080 c001d684 c0c66000 00012784
7fa0: 00000000 c001d4c0 00149d08 00178198 00900080 00149b10 00178198 00000000
7fc0: 00000050 00149d08 00178198 c59a8000 00149b10 00000003 00012784 000781f8
7fe0: bfffdbc4 bfffdbb8 000546bc 40188010 60000010 00900080 00000000 00000000
Backtrace:
Function entered at [<c59833e4>] from [<c59835a0>]
r6 = 00000001 r5 = C0310000 r4 = 00000000
Function entered at [<c5983568>] from [<c59800cc>]
r6 = C0310000 r5 = C0310000 r4 = C031D938
Function entered at [<c598008c>] from [<c59706b8>]
Function entered at [<c59705e4>] from [<c597007c>]
Backtrace aborted due to bad frame pointer <00000007>
Code: 25903014 27930001 2a000009 e5903014 (e7933001)

I guess I should wait for a new compilation of the HAL.

06/08/06 19:20:45 changed by thierry.langlais@efixo.com

Hi all, like all of you, I have the same problem. I tried r1449 and the most recent revision, r1634, but I can only load wlan.o. All the other modules fail as shown above. I'm using an IXP465. Could someone tell me which is the last usable version on xscale processors? Thanks.

Thierry L.

07/07/06 01:50:13 changed by ronen@toroki.com

I upgraded the kernel to 2.6 (2.6.15) and was able to use the latest madwifi release (build 1668). So it should probably be noted that the new HAL only works with a 2.6 kernel on the xscale architecture. Now it is time to try to reproduce the incident that this ticket was originally opened for.

07/11/06 20:43:26 changed by ronen@toroki.com

Checked performance with release 0.91 (build 1650). There is still a drop of about 8 packets per million. The Tx counter of the ath0 interface increments by 1 million, but the Rx counter of ath0 at the receiver only increments by 999992. Any idea where to start looking?

07/20/06 21:51:57 changed by ronen@toroki.com

I believe that I found the problem, although I am not exactly sure why I see these statistics. Anyway, the macro that checks that the sequence number is not a duplicate is defined in ieee80211.h as follows:

#define IEEE80211_SEQ_LEQ(a,b) ((int)((a)-(b)) <= 0)

and is called from ieee80211_input.c as follows:

if ((wh->i_fc[1] & IEEE80211_FC1_RETRY) && IEEE80211_SEQ_LEQ(rxseq, ni->ni_rxseqs[tid]))

This will not work well when the sequence number wraps around (from 4095 to 0).

I believe that the macro definition should be:

#define IEEE80211_SEQ_LEQ(a,b) ((int)((a)-(b)) == 0)

Indeed, trying this eliminates the dropped packets. I still do not understand why it happens so infrequently (about 8 packets per million).
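The wrap-around case described above can be reproduced in isolation. The following is a minimal sketch, not the driver's actual code path: it assumes 12-bit sequence numbers (0 to 4095) as described in this comment, and the names SEQ_EQ_PROPOSED, dropped_as_duplicate and seq_leq_wrapped are hypothetical. The last helper is a generic modular comparison added only for contrast; it is not taken from madwifi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MODULUS 4096u /* assumption: 12-bit sequence space, 0..4095 */

/* Macro as quoted from ieee80211.h in this ticket. */
#define IEEE80211_SEQ_LEQ(a, b) ((int)((a) - (b)) <= 0)

/* Replacement proposed in this ticket, under a hypothetical name. */
#define SEQ_EQ_PROPOSED(a, b) ((int)((a) - (b)) == 0)

/* Simplified model of the quoted check: a frame is discarded as a duplicate
 * when its retry bit is set and its sequence number compares <= the last
 * sequence number recorded for the sender. */
static bool dropped_as_duplicate(bool retry, uint16_t rxseq, uint16_t last_seq)
{
    return retry && IEEE80211_SEQ_LEQ(rxseq, last_seq);
}

/* Hypothetical wrap-aware "a is not newer than b" test: measure the forward
 * distance from a to b modulo 4096 and compare it with half the space. */
static bool seq_leq_wrapped(uint16_t a, uint16_t b)
{
    assert(a < SEQ_MODULUS && b < SEQ_MODULUS);
    return ((unsigned)(b - a) & (SEQ_MODULUS - 1u)) < SEQ_MODULUS / 2u;
}
```

With last_seq = 4095, a retried frame carrying rxseq = 0 gives (int)(0 - 4095) = -4095, so the quoted test treats it as a duplicate even though it is the next frame in sequence. The == 0 form and the modular form both let that frame through. In this model the drop needs two things at once, a set retry bit and a sequence number that has just wrapped, which would be one possible reason for the loss being rare.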

07/21/06 06:58:33 changed by mrenzmann

  • status changed from new to assigned.
  • owner set to mrenzmann.
  • milestone set to version 0.9.3.

Thanks for your investigation and your feedback. This issue (dropped packets due to error in mentioned macro) as well as increasing buffers should get fixed in release 0.9.3. Repository is currently frozen for the upcoming release of 0.9.2 and these changes are IMO not important enough to justify a freeze breakage.

Although 0.9.2 is not out yet, I already added the 0.9.3 milestone in Trac, so that I won't forget about it when I come back from vacations :)

07/21/06 19:57:01 changed by ronen@toroki.com

I failed to mention that there is no need to increase the number of receive buffers (ATH_RXBUF).

09/10/06 14:17:33 changed by anonymous

Regarding the insmod oops on 2.4/xscale... I still see it. Does anybody have a clue how to fix this?

12/08/06 15:35:30 changed by mrenzmann

  • status changed from assigned to new.
  • milestone changed from version 0.9.3 to version 0.9.x - progressive release candidate phase.

@ronen: I'm not sure if the suggested fix is actually the right way to do it. The idea of this test is to recognize duplicate frames. The test as such seems to be ok at this place, and I just verified that FreeBSD as well as NetBSD do the same in their versions of net80211.

I expect that the source of the problem is to be sought elsewhere. This issue needs more investigation, so I reschedule the ticket. Anyone up for the investigation?
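To make the difference between the two forms of the test concrete: they only disagree when a retried frame carries a sequence number strictly below the last one recorded. The sketch below uses hypothetical helper names and made-up sequence values, and again assumes 12-bit sequence numbers; whether such frames actually occur in this setup is not established in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The test as quoted earlier in this ticket, and the proposed replacement. */
#define SEQ_LEQ(a, b) ((int)((a) - (b)) <= 0)
#define SEQ_EQ(a, b)  ((int)((a) - (b)) == 0)

/* Duplicate decision with the existing <= comparison. */
static bool is_dup_leq(bool retry, uint16_t rxseq, uint16_t last_seq)
{
    assert(rxseq < 4096 && last_seq < 4096); /* assumption: 12-bit values */
    return retry && SEQ_LEQ(rxseq, last_seq);
}

/* Duplicate decision with the proposed == comparison. */
static bool is_dup_eq(bool retry, uint16_t rxseq, uint16_t last_seq)
{
    assert(rxseq < 4096 && last_seq < 4096); /* assumption: 12-bit values */
    return retry && SEQ_EQ(rxseq, last_seq);
}
```

With last_seq = 10, a retry of frame 10 is flagged by both variants and a retry of frame 11 by neither, but a retry of frame 9 is flagged only by the <= variant. Changing the macro to == therefore narrows what counts as a duplicate, rather than only fixing the wrap-around case.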

12/08/06 15:35:58 changed by mrenzmann

  • owner deleted.

08/09/07 12:32:32 changed by anonymous

I did it, but the error doesn't disappear. The error is:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c1d98000
[00000000] *pgd=01dce801, *pmd = 01dce801, *pte = 00000000, *ppte = 00000000
Internal error: Oops: 7
CPU: 0
pc : [<c397651c>] lr : [<c3995204>] Not tainted
sp : c18c7e60 ip : 000130c4 fp : c18c7ea8
r10: c16ea160 r9 : 00000013 r8 : c16ea000
r7 : c16a0000 r6 : c16ea160 r5 : 00000000 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c16a0000
Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user
Control: 39FF Table: 01D98000 DAC: 00000015
Process busybox (pid: 89, stack limit = 0xc18c6368)
Stack: (0xc18c7e60 to 0xc18c8000)
7e60: 00000000 00000000 00000000 00000000 00000000 c39a4978 c39a6c30 00000008
7e80: c16ea000 c1fe1000 c16ea160 4bff0000 4bff0000 00000007 c39a6fc4 c18c7edc
7ea0: c18c7eac c39a464c c399506c c16ea000 00000000 c3934008 c39a6fc4 c39a7134
7ec0: c1fe1000 00000000 c188fa60 c3934000 c18c7f14 c18c7ee0 c00be3ac c39a4420
7ee0: c1fe1000 c39a7134 c0163994 00000000 c00be418 ffffffea c3995000 c015f400
7f00: 00095318 c39a48e0 00000007 c18c7f18 c005234c c39a48b8 c16eb000 c16eb000
7f20: c16ed000 00000060 c398d000 c3995060 0001276c 00000000 00000000 00000000
7f40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f80: 00000000 0000000e 00095318 c3995000 00000080 c0044684 c18c6000 0007db30
7fa0: 00000000 c00444c0 0000000e c0044328 000686d8 00095318 bfffcdcc c39a72cc
7fc0: 0000000e 00095318 c3995000 00068858 c39a7160 c39a5218 0007db30 bffffdcc
7fe0: 4007f71c bfffcad0 000299dc 4007f728 20000010 000686d8 e3a03016 e5803000
Backtrace:
Function entered at [<c3995060>] from [<c39a464c>]
Function entered at [<c39a4414>] from [<c00be3ac>]
Function entered at [<c39a48ac>] from [<c005234c>]
Backtrace aborted due to bad frame pointer <00000007>
Code: e3a00000 e1a0f00e 00001ffe e59fc01c (e5912000)
Segmentation fault

08/09/07 13:13:39 changed by mentor

  • summary changed from Wireless packets disapear probably at the receiver to Wireless packets disappear probably at the receiver.