Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #1154 (new defect)

Opened 15 years ago

Last modified 14 years ago

Ad-Hoc mode queueing / latency problems

Reported by: georg@boerde.de Assigned to:
Priority: minor Milestone:
Component: madwifi: other Version: trunk
Keywords: adhoc latency ping Cc:
Patch is attached: 0 Pending:

Description

From time to time, I see strange latency effects on my ad-hoc network. The ICMP ping time grows steadily from the normal 0.5 ms up to 70 ms and then resets to normal:

node_a:~# ping node_b
PING node_b (192.168.5.1) 56(84) bytes of data.
64 bytes from 192.168.5.1: icmp_seq=1 ttl=64 time=1.04 ms
64 bytes from 192.168.5.1: icmp_seq=2 ttl=64 time=3.75 ms
64 bytes from 192.168.5.1: icmp_seq=3 ttl=64 time=19.0 ms
64 bytes from 192.168.5.1: icmp_seq=4 ttl=64 time=37.8 ms
64 bytes from 192.168.5.1: icmp_seq=5 ttl=64 time=58.9 ms
64 bytes from 192.168.5.1: icmp_seq=6 ttl=64 time=0.526 ms
64 bytes from 192.168.5.1: icmp_seq=7 ttl=64 time=0.532 ms
64 bytes from 192.168.5.1: icmp_seq=8 ttl=64 time=20.7 ms
64 bytes from 192.168.5.1: icmp_seq=9 ttl=64 time=44.2 ms
64 bytes from 192.168.5.1: icmp_seq=10 ttl=64 time=67.3 ms
64 bytes from 192.168.5.1: icmp_seq=11 ttl=64 time=0.532 ms
64 bytes from 192.168.5.1: icmp_seq=12 ttl=64 time=7.37 ms
64 bytes from 192.168.5.1: icmp_seq=13 ttl=64 time=28.3 ms
64 bytes from 192.168.5.1: icmp_seq=14 ttl=64 time=49.5 ms
64 bytes from 192.168.5.1: icmp_seq=15 ttl=64 time=68.8 ms
64 bytes from 192.168.5.1: icmp_seq=16 ttl=64 time=0.525 ms
--- node_b ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15029ms
rtt min/avg/max/mdev = 0.525/25.565/68.832/24.736 ms

I have disabled everything I thought might be responsible for it, but it does not help:

iwpriv ath0 uapsd 0
iwpriv ath0 bgscan 0
iwconfig ath0 power off

I've looked with tcpdump (monitor mode) on a third node:

  • When pinging from node_a to node_b, the delay can be seen on the medium in monitor mode (which means, node_b is delaying the processing).
  • When pinging from node_b to node_a, the packets are not delayed on the medium, but node_b still shows the strange "ladder effect" of increasing latency (which means, node_b is delaying again).

I can't tell if the delay happens before the packet is sent out or when it is received, because I don't have good timestamps of what is happening on the node.

OTOH, when I create a monitor mode VAP on node_b, the effect vanishes, and reappears after some minutes/hours. The same happens when the monitor VAP is destroyed. Sometimes the problem just disappears too.

Attachments

ramping_ping_times.png (14.2 kB) - added by derek@indranet.co.nz on 03/22/07 23:52:59.
ping times measured and plotted over a 3 minute time period.
fix_ramping_ping_times.patch (447 bytes) - added by derek@indranet.co.nz on 04/20/07 00:35:47.
Fixes ramping ping times in adhoc mode, by disabling powersave mode

Change History

(follow-up: ↓ 3 ) 02/16/07 16:38:44 changed by dries.naudts@intec.ugent.be

Does this occur when disabling Super A/G? Try compiling without ATH_SUPERG_FF. Maybe this has to do with bursting?

Cheers, Dries

03/22/07 23:52:59 changed by derek@indranet.co.nz

  • attachment ramping_ping_times.png added.

ping times measured and plotted over a 3 minute time period.

03/23/07 00:11:35 changed by derek@indranet.co.nz

I have seen the same thing here on two different adhoc networks.

I have plotted the ping times with gnuplot.

The data was generated by doing the following on node A

ping B &> log_file

leave it for 4 minutes or so.

grep from log_file | cut -d \= -f4 | cut -d m -f1 > file

run gnuplot

gnuplot>plot "file" with linespoints

(to get the png file, with gnuplot you do)

gnuplot> set terminal png size 1200,800

gnuplot> set output "ramping_ping_times.png"

gnuplot> plot "file" with linespoints
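
For reference, the interactive steps above can be run as one non-interactive script (a sketch; it assumes the ping output is already in log_file as shown, and uses a more specific grep pattern to match only the reply lines):

```shell
# Extract the "time=" values from the ping log, as in the session above.
grep 'bytes from' log_file | cut -d= -f4 | cut -dm -f1 > file

# Plot, guarded so the script still completes on hosts without gnuplot.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot -e 'set terminal png size 1200,800; set output "ramping_ping_times.png"; plot "file" with linespoints'
fi
```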

When the ping time is ramping, the throughput is abysmal. Setting the beacon interval to higher or lower values does alter the peak value observed in the graph: a beacon interval of 500 means that the peak time in the graph is around 500 ms, while a beacon interval of 25 (which is sufficiently low to adversely affect network capacity) leads to a peak "latency" of around 30 ms.

Settings: fixed rate (5.5m); distance set to 5000 m (athctrl -d 5000); bgscan turned off; uapsd turned off; bssid fixed with iwconfig ath0 ap 02:04:08:10:20:40

This ramping ping time effect takes several hours of operation before it appears. If the user runs an iwconfig/iwscan/etc command, it delays the onset of ramping pings. Sometimes the network had to be left for days. Onset of ramping appears to be faster when there are more nodes in the network: with 2 nodes, days are required; with 3 nodes, hours or days; with 4 nodes, several hours.

It is a serious issue - the throughput is abysmal, about 10-20% of the value measured when the network is first started. Throughput is measured by doing scp of a large file and recording the transfer time.

(in reply to: ↑ 1 ) 03/27/07 05:03:44 changed by anonymous

Replying to dries.naudts@intec.ugent.be:

Does this occur when disabling Super A/G? Try compiling without ATH_SUPERG_FF. Maybe this has to do with bursting? Cheers, Dries

With ATH_SUPERG_FF disabled, ramping ping times still happen.

(did a make clean, adjusted BuildCaps.inc to remove ATH_SUPERG_FF, rebuilt, and examined the .files.o.flags to check that ATH_SUPERG_FF had been removed from the compile options)

the measured throughput (as measured by scp /bin/bash remote_node: ) was 20% of that achieved when there is no ramping of ping times.

04/20/07 00:35:47 changed by derek@indranet.co.nz

  • attachment fix_ramping_ping_times.patch added.

Fixes ramping ping times in adhoc mode, by disabling powersave mode

04/20/07 00:41:44 changed by derek@indranet.co.nz

The ramping ping times occur because some/all of the nodes in the adhoc network have gone into powersave mode. You can see with an ethereal capture of the radio traffic that the destination node is buffering the incoming packets, until a beacon is generated. Once the destination node has generated the beacon packet, you see a stream of ICMP reply packets...
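
The buffering described above can be sketched with a toy model (illustrative only, not madwifi code; it assumes a powersaving peer holds frames until the next beacon boundary and the standard 102.4 ms beacon interval):

```python
import math

BEACON_MS = 102.4  # default 802.11 beacon interval (100 TU)

def delivery_time(arrival_ms):
    """A powersaving peer holds a buffered frame until the next beacon."""
    return math.ceil(arrival_ms / BEACON_MS) * BEACON_MS

# All frames arriving within one beacon interval are released together
# right after the beacon - the burst of ICMP replies seen in the capture.
arrivals = [10.0, 50.0, 90.0]
releases = [delivery_time(t) for t in arrivals]
```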

We found that disabling powersave (in adhoc mode) fixed this problem.

Signed-off-by: Derek J Smithies <derek <at> indranet <dot> co <dot> nz> This work has been sponsored by Indranet Technologies Ltd

Patch is attached

07/06/07 14:58:37 changed by georg@boerde.de

There is an ioctl() to change the powersaving mode. It would be better to change the default value for ad-hoc mode to "no powersaving", but still to allow the user to change the value at runtime.

Btw, it looks like the power saving/management is not correctly reported to iwconfig, I can only see "Power Management:off"

09/28/07 20:49:21 changed by mentor

This patch patches the symptoms not the problems. The code referenced should not be running in IBSS mode anyway (see net80211/ieee80211_scan.c:scan_restart()). I don't know what the actual problem is, but this is not the fix.

10/18/07 23:40:42 changed by derek@indranet.co.nz

Turns out the ping times behave this way because of a strobing between the beacon interval (102.4 ms) and the ping packets (every 1000 ms). Between two consecutive ping packets, there is a 24.0 ms difference in the position within the beacon interval. From the ping times reported in the above log of a ping between two nodes, you see an average difference of 24 ms. There is some variability, but over the various runs we have done, it is 24 ms each time.
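
The strobing arithmetic can be checked with a small sketch (assumptions: a 102.4 ms beacon interval, pings every 1000 ms, and replies buffered until the next beacon; ten beacon intervals are 1024 ms, so each successive ping slips 24 ms within the beacon interval):

```python
BEACON_MS = 102.4   # beacon interval
PING_MS = 1000.0    # ping interval

def buffered_latency(n):
    """Wait (ms) from ping n's arrival until the next beacon."""
    phase = (n * PING_MS) % BEACON_MS      # position within the beacon interval
    return (BEACON_MS - phase) % BEACON_MS

# Latency ramps in 24 ms steps and wraps back near zero, matching the
# "ladder effect" in the ping log above.
steps = [buffered_latency(n + 1) - buffered_latency(n) for n in range(4)]
```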

From an adhoc network of two nodes which ping between them, and a third node listening (via tcpdump & ethereal), data on what ramping means at the packet level was obtained. A flood ping was run between the two nodes. The node which originates the ping does not send packets uniformly throughout the interval between beacons. Thus, if a beacon is sent out at 0 ms and 102.4 ms, there will be ICMP packets sent out between 0 and 40 ms, no packets between 40 and 80 ms, and then packets between 80 and 102.4 ms.

So how do you repeat this? Easily.

1. Take two nodes (three is more reliable),

2. set the beacon interval of all nodes to 1000

3. bring the adhoc interface on all nodes up at "exactly" the same time - within less than a second of each other. If the interfaces come up with a 10 second gap, you won't get ramping. I found ramping occurs most reliably with the smallest possible gap.

4. run a broadcast ping from one node, and observe the response times of the different packets. With a beacon interval of 1000, you will see the response time linearly increase to 900 or so, and the increase between subsequent pings will be 24 ms.

This queueing/latency issue is also reproducible by taking lots of nodes, putting them into a network, and waiting some days. Our perception is that the nodes on the edge of the network (with the weakest links) are the first to exhibit latency. This can take days before the latency is observed.

03/19/08 12:06:24 changed by m-h lu

I faced the same ramping ping problem in madwifi 0.9.3.1. However, the patch does not work for me. I put some debug messages in the related files, and I found the patched code was not even executed. Did I miss something here?

-M

04/12/08 15:15:04 changed by sah

I have the same problem on a madwifi based AP, with any madwifi version. I tried with older versions and also with 0.9.4, on different kernels, 2.6.17 up to 2.6.23 with the realtime patch. I have not been able to solve this issue so far. I measured the delay of the packets in the ath_tx_start and tx_processq functions, and the ramping is still there, for all packets, not only pings. I suspect it is either because of the Atheros chipset, or due to something in madwifi which I am not aware of. I tried with a PCMCIA D-Link card, a PCMCIA 3Com card, and last with a PCI D-Link card. All are Atheros based, the IRQ was set to 10, and no other process hogged the system.

Silviu