Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #275 (reopened defect)

Opened 14 years ago

Last modified 12 years ago

Scan for non-ESSID-broadcasting access point always fails

Reported by: dplatt@radagast.org Assigned to: mrenzmann
Priority: major Milestone: version 0.9.2
Component: madwifi: 802.11 stack Version: trunk
Keywords: Cc: scottraynel@gmail.com
Patch is attached: 1 Pending:

Description

madwifi-ng works sporadically, or not at all, with wpa_supplicant when one attempts to use a network whose access points are "stealthed" (not broadcasting beacons with ESSID).

There appear to be two closely-related reasons for this:

[1] The ieee80211_ioctl_siwscan handler doesn't process the ioctl parameters, and will thus ignore the IW_SCAN_THIS_ESSID flag and the specified ESSID. The VAP's default ESSID is the one actually scanned for. This problem affects the wpa_supplicant "wext" driver, which passes the scan ESSID via the SIWSCAN ioctl data. It doesn't affect the wpa_supplicant "madwifi" driver, which changes the VAP's default ESSID before scanning.

[2] It's often the case in practice that ieee80211_ioctl_siwscan will be called, to scan for a desired ESSID, when there's already an active scan in progress. When this happens, ieee80211_ioctl_siwscan has no effect... it calls ieee80211_start_scan which simply logs an "active scan already in progress" and returns. No scan for the newly-desired ESSID is performed. This problem tends to prevent wpa_supplicant from working with either the wext or madwifi drivers - there always seems to be an active scan running (usually with no ESSID specified) when wpa-supplicant calls for a new active scan, and the new scan(s) lose out.
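
Point [1] can be illustrated with a minimal user-space sketch. Everything named here is an assumption for illustration: `demo_scan_req`, `DEMO_SCAN_THIS_ESSID` and `choose_scan_essid` are hypothetical stand-ins, not the driver's or the wireless extensions' actual definitions. The sketch only shows the decision the handler would have to make: use the ESSID supplied with the scan request when the flag is set, otherwise fall back to the VAP's default ESSID.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_ESSID_MAX 32          /* 802.11 SSIDs are at most 32 bytes */
#define DEMO_SCAN_THIS_ESSID 0x2   /* hypothetical stand-in for IW_SCAN_THIS_ESSID */

/* Hypothetical stand-in for the scan request passed along with the ioctl. */
struct demo_scan_req {
    size_t essid_len;
    char essid[DEMO_ESSID_MAX];
};

/*
 * Pick the ESSID to scan for and copy it into out, which must hold at
 * least DEMO_ESSID_MAX bytes. Returns the number of bytes copied.
 */
static size_t choose_scan_essid(unsigned flags, const struct demo_scan_req *req,
                                const char *vap_essid, size_t vap_len, char *out)
{
    const char *src = vap_essid;   /* default: the VAP's own ESSID */
    size_t len = vap_len;

    /* Honour the request's ESSID only when the caller asked for it. */
    if ((flags & DEMO_SCAN_THIS_ESSID) && req != NULL && req->essid_len > 0) {
        src = req->essid;
        len = req->essid_len;
    }
    if (len > DEMO_ESSID_MAX)
        len = DEMO_ESSID_MAX;
    if (len > 0)
        memcpy(out, src, len);
    return len;
}
```

With the flag clear, the function returns the VAP default, which matches the behaviour described for the unpatched handler.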

I've worked up a patch which resolves both of these issues, and seems to allow wpa_supplicant to find and associate with stealthed APs quickly and reliably. It adds handling of the IW_SCAN_THIS_ESSID and its data (enabling use of the wext driver), and it adds logic to cancel any existing active scan and wait for a few milliseconds to allow the cancellation to take effect before starting the new scan. The latter isn't the most elegant solution in the world, but changing the driver to support multiple simultaneous scans is more than I'm about to try to tackle (if it's even possible).

The patch is attached.

Signed-off-by: Dave Platt <dplatt@radagast.org>

Attachments

active-scan.patch (1.7 kB) - added by dplatt@radagast.org on 01/02/06 02:03:42.
active-scan-20060305.patch (2.7 kB) - added by anonymous on 03/06/06 05:24:34.
active-scan-20060514.patch (2.9 kB) - added by kelmo on 05/14/06 14:33:31.
rediffed and with mdelay(1) increments
backout-preempt-scan.diff (3.1 kB) - added by mrenzmann on 07/27/07 10:59:30.
Proposed patch to back out preempt_scan from current trunk

Change History

01/02/06 02:03:42 changed by dplatt@radagast.org

  • attachment active-scan.patch added.

01/03/06 12:52:21 changed by mrenzmann

  • priority changed from major to minor.
  • status changed from new to assigned.
  • version set to trunk.
  • owner set to mrenzmann.
  • milestone set to version 1.0.0 - first stable release.

Thanks for the patch.

The description of the "adds logic to cancel any existing active scan and wait for a few milliseconds to allow the cancellation to take effect before starting the new scan" part sounds familiar. Either a similar patch already has been applied before (probably touching a different part of the driver) or it's in the review queue...

I'd like to ask others to give this patch a try and see how it performs for them. I'll try to review it when time permits, hopefully that isn't too far away. Meanwhile, comments are welcome.

01/06/06 13:39:34 changed by tjjalava@gmail.com

Patch works fine for me. I'm using madwifi driver and wpa_supplicant. Before applying I couldn't connect to my AP when ESSID broadcast was disabled. Now it connects fine and fast. Thanks a lot for the patch.

01/18/06 09:04:05 changed by svens

This patch looks OK to me. The only thing I don't like is that mdelay(10).

01/18/06 10:36:00 changed by kelmo

This patch reduces the scanning time when using wpa_supplicant to acceptable times, which is great!

01/22/06 10:37:06 changed by mrenzmann

@reporter: I agree with svens, the msleep(10); isn't very nice. We shouldn't guess if a scan has been aborted in a given time, but be sure about that. Do you have a proposal how this could be achieved?

01/22/06 21:28:46 changed by imr1@waikato.ac.nz

If you have a look at the imr-setmode-delay-patch.diff from ticket #228 I've done a very similar thing. There is a flag you can loop on to check if the scan has finished yet, but you'll need to have something in the loop or it's infinite and the scan never finishes. Specifically I did this between the cancelling and restarting of the scan.

while((ic->ic_flags & IEEE80211_F_SCAN) != 0) mdelay(1);

01/24/06 02:13:18 changed by dplatt@radagast.org

Yes, I agree - the simple, single mdelay() call is inelegant and isn't guaranteed to work in all cases.

Unfortunately I haven't been able to convince myself that there is any method which can (even in principle) work "correctly" in all cases. Every method seems to have shortcomings.

Fundamentally, there seems to be a basic resource-crunch here. There could be multiple threads of code (different wireless tools - e.g. a daemon and a GUI panel) which wish to initiate active scans for various purposes. If I understand things correctly (and perhaps I do not) the underlying firmware is capable of handling only one active scan at a time. This creates a conflict: if two or more entities try to scan at once, then one of them is going to lose, in one way or another. Any of several things can happen - its scan is never started (as is currently the case in the main-trunk code), or it's delayed in its ability to start a scan until the previous scanner is done, or its scan is started but then canceled "behind its back" without warning or notice. Somebody loses; I'd guess that the goal is to be reasonably fair in how this happens, and ensure that no code or thread becomes "stuck" indefinitely.

The simple one-time mdelay() call makes an attempt to shut down a previous scan semi-gracefully before starting its own. On the bad side, this isn't guaranteed to work: the previous scan might take longer to shut down, and another party might come in after the cancellation takes effect and start another scan ("jumping the queue" in one way or another). On the good side, this approach wouldn't seem to be capable of causing the calling thread to hang indefinitely.

The method used in ticket #228 is another way of doing it. On the good side, it'll proceed more quickly after the scan cancellation takes effect, and it's more positive about making sure that the cancellation did take effect. On the bad side, it looks to me as if it could hang the calling thread for quite a while - if another thread "jumps the queue" and starts another scan after the cancellation takes effect and before this thread wakes up and checks, then the queue-jumper's scan wins out and the original canceller has to wait an indefinite amount of time.

A safer compromise would be to use the method in #228, but with an iteration count and a timeout after perhaps 50 - 100 milliseconds. If the interface isn't out of active-scan mode by then, the code could either re-cancel and wait again ("shooting the claim-jumper") or just bail out gracefully. Either is probably better than being stuck indefinitely.
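
The compromise described above can be sketched in user space as follows. This is a simulation under stated assumptions, not driver code: `demo_ic`, `demo_delay_1ms` and `wait_scan_idle` are hypothetical names, the `scanning` field stands in for the IEEE80211_F_SCAN bit in ic->ic_flags, and the delay helper stands in for mdelay(1).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state holding the scan flag. */
struct demo_ic {
    bool scanning;          /* true while a scan is active */
    int polls_until_idle;   /* simulation only: polls before a cancel takes effect;
                               0 means the scan never stops on its own */
};

/* Simulation stand-in for mdelay(1): each call represents 1 ms passing. */
static void demo_delay_1ms(struct demo_ic *ic)
{
    if (ic->polls_until_idle > 0 && --ic->polls_until_idle == 0)
        ic->scanning = false;
}

/*
 * Poll the scan flag once per millisecond, for at most timeout_ms.
 * Returns the number of milliseconds waited, or -1 if the interface
 * is still scanning when the timeout expires.
 */
static int wait_scan_idle(struct demo_ic *ic, int timeout_ms)
{
    int waited = 0;

    while (ic->scanning) {
        if (waited >= timeout_ms)
            return -1;      /* caller may re-cancel or bail out gracefully */
        demo_delay_1ms(ic);
        waited++;
    }
    return waited;
}
```

The iteration limit is what keeps the calling thread from waiting indefinitely if another scan is started in the meantime.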

A fancier approach would be to maintain some sort of explicit queue of active scans which had been requested, but not yet actually initiated. Some piece of code (perhaps a separate kernel thread which managed the interface, or perhaps the driver bottom-half) would terminate one scan and start the next, as appropriate. This would be a much more complex and invasive change to the driver. It's beyond what I'd want to tackle myself at this stage, and frankly I'm not sure if it's really worth the effort. I would hope that the higher-level software which is asking for scans (e.g. wpa_supplicant, GUIs, etc.) would simply treat the results of a scan conflict the way that they'd treat any other scan which didn't find the desired APs - they'd idle for a while and then re-scan.
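
A rough sketch of the data structure such a queue would need, assuming a small fixed-size FIFO of requested ESSIDs. The names `demo_scan_queue`, `scan_queue_push` and `scan_queue_pop` are hypothetical, and locking and the thread that drains the queue are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_QUEUE_LEN 4
#define DEMO_ESSID_MAX 32

/* Fixed-size FIFO of scan requests that have not been started yet. */
struct demo_scan_queue {
    char essid[DEMO_QUEUE_LEN][DEMO_ESSID_MAX + 1];
    int head;    /* index of the oldest pending request */
    int count;   /* number of pending requests */
};

/* Queue a scan request; returns false if the queue is full. */
static bool scan_queue_push(struct demo_scan_queue *q, const char *essid)
{
    int tail;

    if (q->count == DEMO_QUEUE_LEN)
        return false;
    tail = (q->head + q->count) % DEMO_QUEUE_LEN;
    strncpy(q->essid[tail], essid, DEMO_ESSID_MAX);
    q->essid[tail][DEMO_ESSID_MAX] = '\0';
    q->count++;
    return true;
}

/*
 * Take the oldest pending request, or NULL if none is queued.
 * The returned string stays valid until its slot is reused by a later push.
 */
static const char *scan_queue_pop(struct demo_scan_queue *q)
{
    const char *next;

    if (q->count == 0)
        return NULL;
    next = q->essid[q->head];
    q->head = (q->head + 1) % DEMO_QUEUE_LEN;
    q->count--;
    return next;
}
```

Whatever code manages the interface would pop the next request only after the current scan has finished, so requests are served in arrival order and none is silently dropped unless the queue is full.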

So... I'm quite willing to redo my patch, replacing the simple mdelay() call with an iteration-limited "check flag, sleep if it's still scanning" loop and a graceful bailout after a reasonable time (100 ms?). Would that be satisfactory to all concerned? If so, perhaps it would be wise to have the #228 patch use the same technique?

Is there a call other than mdelay() which would be preferable?

02/01/06 06:42:37 changed by kelmo

  • patch_attached set to 1.

02/01/06 23:43:27 changed by scottraynel@gmail.com

  • cc set to scottraynel@gmail.com.

Hi,

Please see my comment (and patch) attached to ticket #228.

Cheers,

Scott.

02/02/06 08:42:19 changed by kelmo

One user (Manfred on madwifi-users) reported that madwifi-ng will fail to compile on linux 2.6.11 when this patch is applied:-

make[2]: Entering directory `/usr/src/linux-2.6.11-kanotix-11'
  CC [M]  /usr/src/modules/madwifi-ng/net80211/if_media.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_beacon.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_crypto.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_crypto_none.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_input.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_node.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_output.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_power.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_proto.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_scan.o
  CC [M]  /usr/src/modules/madwifi-ng/net80211/ieee80211_wireless.o
/usr/src/modules/madwifi-ng/net80211/ieee80211_wireless.c: In function `ieee80211_ioctl_siwscan':
/usr/src/modules/madwifi-ng/net80211/ieee80211_wireless.c:1296: error: storage size of `req' isn't known
/usr/src/modules/madwifi-ng/net80211/ieee80211_wireless.c:1296: warning: unused variable `req'
make[3]: *** [/usr/src/modules/madwifi-ng/net80211/ieee80211_wireless.o] Error 1
make[2]: *** [_module_/usr/src/modules/madwifi-ng/net80211] Error 2
make[2]: Leaving directory `/usr/src/linux-2.6.11-kanotix-11'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/modules/madwifi-ng/net80211'
make: *** [modules] Fehler 1

Does it require a recent WE version to function? or am I on the wrong track?

Thanks, Kel.

02/02/06 19:57:57 changed by dplatt@radagast.org

Hmmm. I'd been building it against 2.6.14, and it seemed to work OK in that kernel. I'll check to see if there were changes in the wireless extensions (either in detail or in structure) between 2.6.10 and 2.6.14 which would have caused this.

My guess is that the ioctl request structure may have been moved to a different #include file or is named differently. I'll see what I can find.

Oh - I reviewed scottraynel's updated patch for #228, and I think that his new patch's timeout logic looks fine. I agree with his conclusion that the current net80211 code structure is somewhat prone to race conditions. Unfortunately I suspect that truly fixing this would require a rather major reorganization of the code (e.g. moving to a carefully-designed set of state machines, with callers submitting requests to some sort of queue and having a central state machine manage all of the state transitions). Not an easy job.

I'll copy/lift/abstract/mimic/steal scott's timeout code and incorporate it into the patch I submitted a few weeks ago, and resubmit it. I'll also see if I can tweak it to work correctly with older versions of the kernel / wireless extensions, or at least make it harmless when applied to older kernels.

02/20/06 06:29:33 changed by mrenzmann

Did you already find the time to update your patch, David?

02/20/06 06:34:33 changed by dplatt@radagast.org

Wow. Synchronicity strikes again. Not half an hour ago, I sat down at my laptop, said "Gee, I've been putting this off too long, I really owe those good folks an updated patch", did an svn update, and started merging my original patch and Scott's improved timeout code into the current code.

It's compiling now. I'll see if I can figure out why it doesn't compile right against some older versions of the wireless extensions, fix that if possible, test, and resubmit. With luck it'll happen tonight... if not, then in a few days.

Sorry about the delay!

02/20/06 07:48:43 changed by dplatt@radagast.org

Well, as Bullwinkle says, "Oh fiddle-faddle!"

I merged my change, and Scott's improved timeout code, into the current subversion code. I also figured out the reason it didn't compile in 2.6.11 - the test needs to be "#if WIRELESS_VERSION > 17" rather than "#ifdef IW_SCAN_THIS_ESSID".
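
The 2.6.11 build error quoted earlier ("storage size of `req' isn't known") is consistent with the flag macro being defined while the request structure is not, which is why testing for the macro alone was not enough. The sketch below simulates that situation with stand-in names: `DEMO_WE_VERSION`, `DEMO_SCAN_THIS_ESSID`, `demo_scan_req` and `essid_scan_supported` are all hypothetical. In the real header the version macro is, as far as I know, spelled WIRELESS_EXT in linux/wireless.h, so check the header rather than relying on the name used here.

```c
/* Simulate an older header: the flag exists, the request struct does not. */
#ifndef DEMO_WE_VERSION
#define DEMO_WE_VERSION 17
#endif

#define DEMO_SCAN_THIS_ESSID 0x2   /* present in both old and new "headers" */

#if DEMO_WE_VERSION > 17
struct demo_scan_req {             /* only newer "headers" define this */
    unsigned char essid_len;
    char essid[32];
};
#endif

/* Returns 1 if the ESSID-specific scan path is compiled in, 0 otherwise. */
static int essid_scan_supported(void)
{
#if DEMO_WE_VERSION > 17
    struct demo_scan_req req = { 0, "" };   /* safe: the struct is known here */
    (void)req;
    return 1;
#else
    /* Guarding with "#ifdef DEMO_SCAN_THIS_ESSID" would have selected the
     * branch above and failed to compile, because the struct is undefined. */
    return 0;
#endif
}
```

Guarding on the version number ties the code to the header revision that introduced the structure, rather than to a macro that may predate it.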

It now compiles OK. The new scan-cancellation logic seems to work properly - it seems to take about 3 milliseconds for the scan cancellation to take effect. The new active scan starts up OK and my AP's ESSID shows up in the scan cache.

And, the silly thing doesn't work. As far as I can tell, it's for reasons unrelated to my patch. The log indicates that an association failure occurs when the card is told to associate with my AP - "Reason 1".

I dropped back to my earlier code base (which uses a subversion code base from early January) and the card associates with my AP just fine.

I tried tweaking my patch, to sleep in 10-millisecond increments rather than 1-millisecond increments, to try to re-create the timing behavior of my earlier version. It still fails to associate.

I'm not sure why this is happening. Possibly something in the new code base is broken, or requires a change to the scripts and settings I use to initialize the card and bring it up.

I'll fiddle with it some more, take a look at the other tickets and the wiki, and see if I can figure out why it's no longer associating properly.

02/20/06 09:07:14 changed by kelmo

Well, I am still carrying along your old patch to current svn, and it is still working OK (apart from the mentioned compilation problem). Combining your patch with Scott's from #228 resulted in association problems. (I have no idea how that code interacts.)

03/06/06 05:23:37 changed by dplatt@radagast.org

Well, I'm still puzzled, although the mystery is a bit different now.

In looking into the association-failure problem, I inferred from the logs that an authorization/association request to an AP is something akin to a scan, and that cancelling a scan when the VAP is trying to authenticate with its chosen AP might be responsible for the association failure.

I rewrote my cancel-active-scan code, in the form of a subroutine which can perform a "graceful" cancellation of an existing active scan. It accepts two parameters - a "grace time" during which it'll wait for an existing active scan to terminate before the scan is cancelled, and a "timeout" which limits the amount of time it'll wait for the cancellation to take effect. In the start-scan ioctl, I call this with 100 msec for both timeouts.
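
The two-phase behaviour described above might look roughly like the following user-space simulation. It is not the attached patch: `demo_scan_state`, `demo_tick_1ms`, `graceful_cancel` and the result codes are hypothetical names, and the tick helper stands in for a 1 ms delay in the driver.

```c
#include <stdbool.h>

/* Hypothetical scan state; the last two fields exist only for simulation. */
struct demo_scan_state {
    bool scanning;
    bool cancel_requested;
    int ms_until_natural_end;   /* scan ends by itself after this many ms; 0 = never */
    int ms_cancel_latency;      /* ms needed for a cancel to take effect */
};

/* Simulation stand-in for a 1 ms delay. */
static void demo_tick_1ms(struct demo_scan_state *s)
{
    if (!s->scanning)
        return;
    if (s->cancel_requested) {
        if (s->ms_cancel_latency > 0)
            s->ms_cancel_latency--;
        if (s->ms_cancel_latency == 0)
            s->scanning = false;
    } else if (s->ms_until_natural_end > 0 && --s->ms_until_natural_end == 0) {
        s->scanning = false;
    }
}

enum demo_cancel_result { DEMO_IDLE, DEMO_CANCELLED, DEMO_TIMED_OUT };

/*
 * Phase 1: give a running scan up to grace_ms to finish on its own.
 * Phase 2: request cancellation and wait up to timeout_ms for it to take effect.
 */
static enum demo_cancel_result graceful_cancel(struct demo_scan_state *s,
                                               int grace_ms, int timeout_ms)
{
    int t;

    for (t = 0; t < grace_ms && s->scanning; t++)
        demo_tick_1ms(s);
    if (!s->scanning)
        return DEMO_IDLE;           /* finished without being cancelled */

    s->cancel_requested = true;
    for (t = 0; t < timeout_ms && s->scanning; t++)
        demo_tick_1ms(s);
    return s->scanning ? DEMO_TIMED_OUT : DEMO_CANCELLED;
}
```

The grace period is what would let an in-flight authentication or association exchange complete before anything is cancelled; the timeout bounds the total wait at grace_ms plus timeout_ms.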

This change, applied against this afternoon's subversion code, appears to fix the inability to associate. The VAP associates with the AP just fine...

... and then disassociates a few seconds later and starts scanning again. Lather, rinse, repeat - it won't stay associated.

The disassociation appears to be triggered by the tx_timeout routine. Apparently, the driver tried to send a management frame at some point, and this transmit never took place or its status was lost or the timeout wasn't properly cancelled. The tx_timeout routine resets the VAP status back to SCANNING, and the whole cycle repeats.

I haven't been able to figure out which management frame is timing out, or why. I don't know whether the loss of this frame is caused by the cancellation of an active scan, or whether it's due to some other change in the driver recently and is unrelated to my change.

I'll attach the current version of my patch. If we can figure out the cause of the problem, then possibly the "graceful cancel" routine could be used by #228 as well.

03/06/06 05:24:34 changed by anonymous

  • attachment active-scan-20060305.patch added.

05/14/06 14:33:31 changed by kelmo

  • attachment active-scan-20060514.patch added.

rediffed and with mdelay(1) increments

05/14/06 14:35:28 changed by kelmo

Hi, after a long time with no action on this ticket, I have used the patch with mdelay(1) increments as well as the patch from #228. My station seems to be associating fine, and staying associated. So far so good!

05/15/06 01:40:20 changed by dyqith

related comment: ticket:572

05/25/06 22:24:09 changed by anonymous

I recently entered ticket #646 that deals with wpa_suppl and scanning results. This ticket was referenced in one of the comments...

One question I have is since scanning is asynchronous, why the worry about cancelling a pending scan? Why not return OK? A SCAN_RESULTS event should get fired and as I understand it will be propagated to any threads listening for wireless events. I understand that if the 2nd scan and/or subsequent scans are for a specific SSID, then incorrect results could be returned. This could be handled by returning an error in the case where a scan is pending and a new scan request is made with a different SSID. If the SSIDs of subsequent scans are the same as the pending scan's SSID, then just return OK.
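
The policy proposed in the paragraph above could be sketched like this. The function name `demo_scan_request`, its parameters, and the choice of -EBUSY as the error value are assumptions made for illustration; the comment itself does not name an error code.

```c
#include <stdbool.h>
#include <string.h>
#include <errno.h>   /* EBUSY */

/*
 * Decide what to do with a new scan request.
 * Returns 0 if the caller can expect scan results for new_essid, or
 * -EBUSY if a scan for a different ESSID is already pending.
 * *start_new_scan tells the caller whether a scan must actually be started.
 */
static int demo_scan_request(bool scan_pending, const char *pending_essid,
                             const char *new_essid, bool *start_new_scan)
{
    *start_new_scan = false;

    if (!scan_pending) {
        *start_new_scan = true;     /* nothing running: start normally */
        return 0;
    }
    if (strcmp(pending_essid, new_essid) == 0)
        return 0;                   /* same ESSID: reuse the pending scan's results */

    return -EBUSY;                  /* different ESSID: results would be wrong */
}
```

Under this policy no scan is ever pre-empted; a caller that receives the error is expected to retry later, as it would after any scan that did not find the desired AP.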

Typically, a system will have one process in charge of scanning and association (e.g. wpa_suppl, manual config via iwconfig, etc.). wpa_suppl does a pretty good job of managing its own scan behavior. I would argue that if two processes are running on a system and interfering with each other's WiFi scanning, then the system in question has been set up incorrectly.

Comments?

05/25/06 22:36:38 changed by espy@pepper.com

Oops... that last comment was mine. I forgot to enter my email.

06/12/06 22:43:34 changed by sangio.f@tiscali.it

Hi all, I'm using madwifi 0.9.0 with Linux 2.6.16.20 and wpa_supplicant 0.4.9 and I can confirm the issue. I usually run wpa_supplicant with the following command:

wpa_supplicant -Dmadwifi -iath0 -c/etc/wpa_supplicant.conf -dd

I've tried with every possible combination of related settings in wpa_supplicant.conf:

1) ap_scan=1, scan_ssid=0
2) ap_scan=1, scan_ssid=1
3) ap_scan=2, scan_ssid=0
4) ap_scan=2, scan_ssid=1

but I never managed to make the NIC associate to the AP with hidden ssid.

It worked just in 3 cases:

1) iwconfig ath0 essid <my_essid> BEFORE launching wpa_supplicant
2) launching wpa_supplicant with "-Dwext" option instead of "-Dmadwifi" (it associates *very* fast with ap_scan=2, wireless extensions' ioctls seem to work fine)
3) enabling broadcast of ssid on the ap :)

Sorry if my post isn't very useful in resolving the issue, but I've been trying for 3 days without significant results... I found this ticket just this evening :S

Thanks to all for support,

Fabio Sangiovanni

07/04/06 12:19:39 changed by kelmo

active-scan-20060514.patch was applied to r1664. Leaving open for comments, if any for the next few days.

07/04/06 12:26:36 changed by kelmo

  • milestone changed from version 1.0.0 - first stable release to version 0.9.2.

07/20/06 08:14:17 changed by mrenzmann

  • status changed from assigned to closed.
  • resolution set to fixed.

No comments yet, so I'm closing this ticket.

07/26/06 02:13:09 changed by anonymous

  • status changed from closed to reopened.
  • resolution deleted.

r1664 seems to break authentication. I'm trying to get an STA to associate with an AP using 64-bit WEP (both with wpa_supplicant and by simply setting the key manually - both fail). Reverting the patch that r1664 applied makes it work fine.

07/26/06 06:37:09 changed by mrenzmann

  • priority changed from minor to blocker.

Can you confirm that the authentication problems you've seen still exist in vanilla r1692? Please reply ASAP, since your report is a blocker for release 0.9.2 that is scheduled for tomorrow. Thanks in advance.

07/26/06 11:58:36 changed by kelmo

  • status changed from reopened to closed.
  • resolution set to worksforme.

I cannot reproduce the described situation with r1690, WEP 64/128, essid broadcast or hidden, with wireless-tools or wpa_supplicant on my hardware (access point is wrt54g).

I have however seen problems with other access points and WEP, and these may be exposed by this patch in some way. Please may I suggest looking at the other tickets (#757, #756, #698, #657, #651, #454 and #428) for similarities, or failing that, create a new ticket describing your problem in more detail.

08/03/06 06:13:42 changed by thinkpad X60

  • status changed from closed to reopened.
  • resolution deleted.

Hello, I use a ThinkPad X60 (AR5006EX) on SUSE 10.1. I can't connect to a stealth access point either. When the access point is set to "NOT stealth", the connection succeeds, but when it is set to "stealth", the connection fails.

How can I connect to an access point in stealth mode?

thank you

08/08/06 13:31:21 changed by kelmo

  • status changed from reopened to closed.
  • resolution set to worksforme.

Hi, this ticket discussed a problem, and was closed after the attached patch was committed. The patch has improved the situation for many people. If you are still experiencing problems, please grace us with more information about that problem, and the settings of the AP etc., on a new ticket.

07/26/07 11:01:06 changed by kelmo

  • status changed from closed to reopened.
  • resolution deleted.

It seems there are more people that believe the preempt_scan logic introduced by r1664 is causing problems such as failure to associate with various different access points.

Both of the following bug reports state that removing preempt_scan fixes the problem (iirc, a few others have mentioned it elsewhere as well), specifically when using the "wext" backend of wpa_supplicant, which is also the backbone of the popular "Network-Manager".

http://bugs.debian.org/434527
http://bugs.debian.org/408207

Reopening the ticket, and requesting comment from anyone whether or not this preempt scan logic should be removed. It does seem like a stop-gap for a more comprehensive solution.

07/26/07 14:33:17 changed by mentor

  • priority changed from blocker to major.

07/26/07 17:50:07 changed by thully@umich.edu

I'm having a significant amount of trouble with preempt_scan. In particular, with it used in the driver, I am completely unable to associate with my unencrypted WRT54G, no matter what I do. This is basically a completely stock setup. It sees the AP, and will attempt to connect, but the lights in the NetworkManager panel applet never turn green. I'm using the 5424 chipset on a MacBook running Debian unstable, but this seems to happen whenever the "wext" backend of wpa_supplicant is used with NetworkManager.

If I patch the MadWifi source to remove preempt_scan, everything works fine. Also, if I change to using the "madwifi" backend of wpa_supplicant, everything works fine. The latter is why you don't see any issues with Ubuntu and some other distributions - they automatically use the "madwifi" backend of wpa_supplicant for Atheros chipsets. Could preempt_scan be removed until we find a better solution?

07/27/07 10:59:30 changed by mrenzmann

  • attachment backout-preempt-scan.diff added.

Proposed patch to back out preempt_scan from current trunk

07/27/07 11:01:53 changed by mrenzmann

Attached patch should be sufficient to back out the preempt_scan related changes from trunk (r2619). However, I'm not sure if it is complete and/or entirely correct, thus it should be reviewed by others before it gets committed (if we decide to do so).

07/27/07 11:17:41 changed by mrenzmann

A snapshot tarball of r2619 plus backout-preempt-scan.diff applied is available for download here. This might help interested parties to test whether the patch actually helps to fix their association issues.

08/13/07 13:55:57 changed by kelmo

The tarball based on r2619 is a dud; it causes an oops on module load (that was fixed in later revisions). Please don't use it; it may even be removed.

08/13/07 14:21:33 changed by mrenzmann

The r2619-based tarball has now been replaced by one based on r2651; download it here.

08/16/07 16:47:45 changed by anonymous

The patched version works great for me - it fixes my association problems...

08/16/07 19:05:58 changed by mrenzmann

I've started a discussion about this on madwifi-devel.