Please note: This project is no longer active. The website is kept online for historic purposes only.
If you´re looking for a Linux driver for your Atheros WLAN device, you should continue here .

Ticket #835 (closed defect: fixed)

Opened 13 years ago

Last modified 11 years ago

Kernel Oops after recreating VAP

Reported by: matt@mattb.net.nz Assigned to:
Priority: major Milestone: version 0.9.5
Component: madwifi: 802.11 stack Version: v0.9.2
Keywords: Cc:
Patch is attached: 1 Pending:

Description

There is a bug in the driver that causes an oops when a vap is destroyed and recreated with the same settings, provided that there is a remote host sending packets to the device before and after it is destroyed and recreated.

The bug occurs because a mapping in the driver is not correctly cleared when a VAP is destroyed. When the vap is recreated the first incoming packet is handled using the mapping which now contains a stale record pointing to an invalid VAP structure. When ieee80211_input tries to dereference this vap structure an oops occurs as shown in the first attachment.

The mapping in question is called sc_keyixmap and is meant to be cleared when a node is removed via the ath_key_delete routine which is called from the vap via the iv_key_delete method. This routine is meant to be triggered as the node objects are destroyed in the vap detach code starting in ieee80211_node_vdetach, however a reference counting error means that the nodes are never actually destroyed and hence ath_key_delete is never called to remove the entry from the mapping.

The reference counting error occurs due to the mapping itself, when it is setup in ath_rx_tasklet (line 5589) it takes another reference to the node object. This reference is still held when the ieee80211_node_vdetach routine calls ieee80211_node_table_reset to free all node entries related to the vap. ieee80211_node_table_reset uses node_reclaim to attempt to remove the node, however this function never completes because of the reference held by the mapping. Because the mapping relies on the node object being deleted (and hence ath_key_delete being called) to be cleaned up, we end up in a situation where the node object is never removed and the mapping becomes stale when the VAP is recreated.

The solution is call _ieee80211_free_node instead of node_reclaim to remove the node entry. This begins the node removal process, ignoring the reference held by the mapping, and eventually ath_key_delete is called to clean things up properly. No other parts of the code should be holding a reference to the node at this time.

A patch is attached to implement this behaviour.

Ticket #222 is almost certainly a duplicate of this bug and will be fixed with the attached patch, the oops provided is different however, so I cannot verify 100% that it is the same issue.

Tickets #656 and #733 also sound very similar and should be checked after this patch has been applied to see if they are fixed too.

Signed-Off-By: Matt Brown <matt@mattb.net.nz>

Attachments

vap-recreate-oops (1.3 kB) - added by matt@mattb.net.nz on 08/21/06 14:46:19.
oops demonstrating the problem
keyix-reset.diff (394 bytes) - added by matt@mattb.net.nz on 08/21/06 14:47:35.
Properly remove nodes from the table when destroying a VAP
keyix-reset-conservative.diff (1.1 kB) - added by matt@mattb.net.nz on 08/21/06 14:58:43.
Conservative (but untidier) approach to solving the problem that doesn't ignore refcounter
keyix-reset-nodeleave.diff (1.5 kB) - added by matt@mattb.net.nz on 08/22/06 07:03:47.
Fix bug by calling node_leave

Change History

08/21/06 14:46:19 changed by matt@mattb.net.nz

  • attachment vap-recreate-oops added.

oops demonstrating the problem

08/21/06 14:47:35 changed by matt@mattb.net.nz

  • attachment keyix-reset.diff added.

Properly remove nodes from the table when destroying a VAP

08/21/06 14:57:59 changed by matt@mattb.net.nz

Ignoring the reference counter is not particularly ideal behaviour. An alternative approach is to remove the reference to the node object held by the map and then call reclaim_node as per usual.

This results in slightly messier code. I think the original patch is fine is this case as no other references are held to the node object. BUT I have also attached a patch that implements the more conservative approach for good measure.

08/21/06 14:58:43 changed by matt@mattb.net.nz

  • attachment keyix-reset-conservative.diff added.

Conservative (but untidier) approach to solving the problem that doesn't ignore refcounter

08/22/06 02:34:46 changed by mentor

Yay. Another oops bites the dust.

Sadly, I'm not convinced that the first patch is really the solution (and yes the second patch is fugly). IMO, The reference counting should be managing the allocation and deallocation of node structs/memory as and when references are added removed. Cleaning up state that references the node in the node_cleanup (when the reference count is supposed to have fallen to zero) is... broken. For instance, we also hold references to packets in the node SAVEQ (Power Saving) which is also cleaned up in node_cleanup.

08/22/06 02:35:36 changed by mentor

We should probably be doing this is node_leave (and there is a big XXX about this in there too).

08/22/06 07:03:14 changed by matt@mattb.net.nz

Yes, you're right, both the previous implementations were not optimum. How about this new patch, it alters the node table reset/clear functions to call node_leave on each node that is being removed. This then takes clear or deallocating resources associated with the node and removing the entry from the node table.

08/22/06 07:03:47 changed by matt@mattb.net.nz

  • attachment keyix-reset-nodeleave.diff added.

Fix bug by calling node_leave

08/22/06 19:23:59 changed by mentor

Yes. Node cleanup just needs to be fixed now. Currently, we are calling node_cleanup from free_node. IMO, and from the comments of node_cleanup, it appears that node_cleanup should be called, and if necessary, then free_node will deallocate the node automatically afterwards.

01/16/07 18:57:48 changed by mark@glines.org

Quick note, it looks like the #907 patch incorporates some of the changes from keyix-reset-nodeleave.diff.

Mark

07/26/07 05:22:24 changed by mentor

  • status changed from new to closed.
  • resolution set to fixed.
  • milestone set to version 0.9.4.

I believe this patch was fixed with the merge of the refcount branch (r2357) and closure of #907.

02/11/08 06:11:44 changed by mrenzmann

  • milestone changed from version 0.9.4 to version 0.9.5.