Please note: This project is no longer active. The website is kept online for historic purposes only.
If you´re looking for a Linux driver for your Atheros WLAN device, you should continue here .

Ticket #1239 (closed defect: fixed)

Opened 12 years ago

Last modified 11 years ago

Soft lockup

Reported by: kelmo Assigned to:
Priority: major Milestone: version 0.9.5
Component: madwifi: driver Version: trunk
Keywords: soft lockup Cc:
Patch is attached: 0 Pending:

Description

After updating madwifi from r2210 from trunk, to r2249 from branchs/madwifi-hal-0.9.30.10 I experienced the following soft lockup immediately after plugging in WG511T cardbus card, which in turn started wpa_supplicant and attempted to make a connection:

ath_hal: 0.9.30.10 (AR5210, AR5211, AR5212, AR5416, RF5111, RF5112, RF2413, RF5413, RF2133)
wlan: 0.8.4.2 (svn r2249)
ath_pci: 0.9.4.5 (svn r2249)
ACPI: PCI Interrupt 0000:03:00.0[A] -> Link [LNKC] -> GSI 10 (level, low) -> IRQ 10
ath_rate_sample: 1.2 (svn r2249)
wifi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
wifi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: turboG rates: 6Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
wifi0: H/W encryption support: WEP AES AES_CCM TKIP
wifi0: mac 7.9 phy 4.5 radio 5.6
wifi0: Use hw queue 1 for WME_AC_BE traffic
wifi0: Use hw queue 0 for WME_AC_BK traffic
wifi0: Use hw queue 2 for WME_AC_VI traffic
wifi0: Use hw queue 3 for WME_AC_VO traffic
wifi0: Use hw queue 8 for CAB traffic
wifi0: Use hw queue 9 for beacons
wifi0: Atheros 5212: mem=0xd4000000, irq=10
BUG: soft lockup detected on CPU#0!
 [<b01212a8>] update_process_times+0x28/0x70
 [<b01066d5>] timer_interrupt+0x45/0x80
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b0139f34>] handle_level_irq+0x54/0xc0
 [<b0105560>] do_IRQ+0x40/0x80
 [<b0106705>] timer_interrupt+0x75/0x80
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b01038db>] common_interrupt+0x23/0x28
 [<f20117e9>] zz0dc6a5c5+0x11/0x44 [ath_hal]
 [<f2017a11>] zz005b60b2+0x5e1/0x800 [ath_hal]
 [<f2017ef1>] zz05b8c65a+0xdd/0xe4 [ath_hal]
 [<f0b682ee>] ath_intr+0x20e/0xc00 [ath_pci]
 [<b0106705>] timer_interrupt+0x75/0x80
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b0139f61>] handle_level_irq+0x81/0xc0
 [<b0105565>] do_IRQ+0x45/0x80
 [<f08b0da9>] rtl8139_interrupt+0x29/0x480 [8139too]
 [<b01038db>] common_interrupt+0x23/0x28
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b0139f34>] handle_level_irq+0x54/0xc0
 [<b0105560>] do_IRQ+0x40/0x80
 [<b01038db>] common_interrupt+0x23/0x28
 [<f0b2bffc>] ieee80211_getmgtframe+0xac/0xd0 [wlan]
 [<f0b1e8c2>] ieee80211_send_probereq+0x62/0x2c0 [wlan]
 [<b01038db>] common_interrupt+0x23/0x28
 [<f0b2351e>] scan_next+0x46e/0x490 [wlan]
 [<b01038db>] common_interrupt+0x23/0x28
 [<f0b230b4>] scan_next+0x4/0x490 [wlan]
 [<f0b230b0>] scan_next+0x0/0x490 [wlan]
 [<b0120811>] run_timer_softirq+0x101/0x150
 [<b0116a02>] scheduler_tick+0x12/0x70
 [<b011d662>] __do_softirq+0x42/0x90
 [<b011d6d6>] do_softirq+0x26/0x30
 [<b0105565>] do_IRQ+0x45/0x80
 [<b01e83f6>] acpi_hw_register_write+0x154/0x187
 [<b01038db>] common_interrupt+0x23/0x28
 [<f0817e41>] acpi_processor_idle+0x1af/0x342 [processor]
 [<b01013da>] cpu_idle+0x3a/0x50
 [<b0320733>] start_kernel+0x293/0x350
 [<b0320250>] unknown_bootoption+0x0/0x250
 =======================

Kernel 2.6.20.4, uniprocessor on x86.

Thanks, Kel

Change History

04/04/07 11:55:16 changed by Ċ½ilvinas Valinskas <valins@soften.ktu.lt>

Lovely ;) It means that for last 10s soft-lockup thread was not running. This happens on high load, for example run any benchmark to measure throughput, have 11a/Turbo/FastFrames & and the rest of whistles and bells enabled. Wait and you will get this message all the time.

Now lets digest the backtrace:

BUG: soft lockup detected on CPU#0!
 [<b01212a8>] update_process_times+0x28/0x70
 [<b01066d5>] timer_interrupt+0x45/0x80
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b0139f34>] handle_level_irq+0x54/0xc0
 [<b0105560>] do_IRQ+0x40/0x80        ### 4 ###
 [<b0106705>] timer_interrupt+0x75/0x80
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b01038db>] common_interrupt+0x23/0x28
 [<f20117e9>] zz0dc6a5c5+0x11/0x44 [ath_hal]
 [<f2017a11>] zz005b60b2+0x5e1/0x800 [ath_hal]
 [<f2017ef1>] zz05b8c65a+0xdd/0xe4 [ath_hal]
 [<f0b682ee>] ath_intr+0x20e/0xc00 [ath_pci]
 [<b0106705>] timer_interrupt+0x75/0x80
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b0139f61>] handle_level_irq+0x81/0xc0

 [<b0105565>] do_IRQ+0x45/0x80        ### 3 ###

 [<f08b0da9>] rtl8139_interrupt+0x29/0x480 [8139too]
 [<b01038db>] common_interrupt+0x23/0x28
 [<b0138ac5>] handle_IRQ_event+0x25/0x60
 [<b0139f34>] handle_level_irq+0x54/0xc0

 [<b0105560>] do_IRQ+0x40/0x80         ### 2 ###

 [<b01038db>] common_interrupt+0x23/0x28
 [<f0b2bffc>] ieee80211_getmgtframe+0xac/0xd0 [wlan]
 [<f0b1e8c2>] ieee80211_send_probereq+0x62/0x2c0 [wlan]
 [<b01038db>] common_interrupt+0x23/0x28
 [<f0b2351e>] scan_next+0x46e/0x490 [wlan]
 [<b01038db>] common_interrupt+0x23/0x28
 [<f0b230b4>] scan_next+0x4/0x490 [wlan]
 [<f0b230b0>] scan_next+0x0/0x490 [wlan]
 [<b0120811>] run_timer_softirq+0x101/0x150
 [<b0116a02>] scheduler_tick+0x12/0x70
 [<b011d662>] __do_softirq+0x42/0x90
 [<b011d6d6>] do_softirq+0x26/0x30
 [<b0105565>] do_IRQ+0x45/0x80              ### 1 ###

 [<b01e83f6>] acpi_hw_register_write+0x154/0x187
 [<b01038db>] common_interrupt+0x23/0x28
 [<f0817e41>] acpi_processor_idle+0x1af/0x342 [processor]
 [<b01013da>] cpu_idle+0x3a/0x50
 [<b0320733>] start_kernel+0x293/0x350
 [<b0320250>] unknown_bootoption+0x0/0x250
 =======================

Processor interrupt (by ACPI?) at !### 1 !####, handles ACPI stuff and on returns runs softirqs (do_softirq() function). That runs some timer, something related with air scan. Then somewhere in the middle rtl8139() is generating interrupt and it's interrupt handler is again interrupted but this time by atheros (ath_intr()), while handler is in HAL, executing cute zzz*() functions, timer intterupt comes in .... Oh well, my head is spinning already.

The real problem is that atheros, ethernet creates interrupt storm and timers are executed very very late (latencies ? due to interrupts). So your computer is so busy handling interrupts and each interrupt is interrupting other handlers ... allows system to make little or no progress.

As each interrupt handler schedules tasklets to do the work, those tasklets are called up on exit from interrupt handlers see irq_exit():

irq_exit() ->
    invoke_softirq() ->
          do_softirq() ->
             __do_softirq() <--- check out the code, comment and about 10x ;) constant

04/04/07 12:22:53 changed by Ċ½ilvinas Valinskas <valins@soften.ktu.lt>

See #900 for example (on Xscale/IXP425/BE platform).

05/21/07 20:48:32 changed by mentor

  • version changed from madwifi-hal-0.9.30.10 branch to trunk.

hal-0.9.30.13 has been merged to trunk, so I'm retargetting this.

06/21/07 11:47:11 changed by Axel.Thimm@ATrpms.net

A similar bug happens on revision 2431 on kernel 2.6.21 (as shipped in Fedora 7). Could you please check out whether http ://bugzilla.atrpms.net/show_bug.cgi?id=1233 (URL obfuscated, remove extra space) is the same as this one?

/var/log/messages are under http ://bugzilla.atrpms.net/attachment.cgi?id=126 (URL obfuscated, remove extra space)

I cannot reproduce this bug locally.

P.S. I can't edit the Cc, could you please add Axel.Thimm at ATrpms dot net and andrew.hoff at bigpond dot com to the Cc? Or any chance I could get a login id (that way i could also post sane URLs :)? Thanks!

06/22/07 18:00:58 changed by mentor

  • summary changed from soft lockup with HAL 0.9.30.10 to Soft lockup.

06/27/07 11:44:33 changed by vampire@vmp.isa-geek.org

I have a similar problem (svn2523). The lockups begin when i start airodump-ng on another virtual interface with monitor mode. Destroying the default ath0 interface seems to have stopped the lockups..

07/02/07 12:36:39 changed by anonymous

dmesg | grep -i svn [ 24.356000] wlan: 0.8.4.2 (svn r2525) [ 24.416000] ath_pci: 0.9.4.5 (svn r2525) [ 25.876000] ath_rate_sample: 1.2 (svn r2525)

[ 431.076000] BUG: soft lockup detected on CPU#0! [ 431.076000] [<c015354c>] softlockup_tick+0x9c/0xf0 [ 431.076000] [<c0130643>] update_process_times+0x33/0x80 [ 431.076000] [<c0106c45>] timer_interrupt+0x85/0xb0 [ 431.076000] [<c01538d0>] handle_IRQ_event+0x30/0x60 [ 431.076000] [<c015517d>] handle_level_irq+0x8d/0x120 [ 431.076000] [<c0105b70>] do_IRQ+0x40/0x80 [ 431.076000] [<c0104233>] common_interrupt+0x23/0x30 [ 431.076000] [<c0173189>] kmalloc+0x19/0xa0 [ 431.076000] [<c027e963>] alloc_skb+0x53/0x120 [ 431.076000] [<e0ba5f97>] ieee80211_getmgtframe+0x27/0xd0 [wlan] [ 431.076000] [<e0b96808>] ieee80211_send_probereq+0x58/0x2b0 [wlan] [ 431.076000] [<e0b9d6d7>] scan_next+0x447/0x4c0 [wlan] [ 431.076000] [<c011f9c4>] task_running_tick+0xd4/0x260 [ 431.076000] [<e0b9d290>] scan_next+0x0/0x4c0 [wlan] [ 431.076000] [<c012f74f>] run_timer_softirq+0x12f/0x1a0 [ 431.076000] [<c012b422>] do_softirq+0x82/0x100 [ 431.076000] [<c012b4f5>] do_softirq+0x55/0x60 [ 431.076000] [<c0105b75>] do_IRQ+0x45/0x80 [ 431.076000] [<c0104233>] common_interrupt+0x23/0x30 [ 431.076000] ======================= [ 1060.808000] BUG: soft lockup detected on CPU#0! [ 1060.808000] [<c015354c>] softlockup_tick+0x9c/0xf0 [ 1060.808000] [<c0130643>] update_process_times+0x33/0x80 [ 1060.808000] [<c0106c45>] timer_interrupt+0x85/0xb0 [ 1060.808000] [<c01538d0>] handle_IRQ_event+0x30/0x60 [ 1060.808000] [<c015517d>] handle_level_irq+0x8d/0x120 [ 1060.808000] [<c0105b70>] do_IRQ+0x40/0x80 [ 1060.808000] [<c0104233>] common_interrupt+0x23/0x30 [ 1060.808000] [<c02ee96d>] _spin_unlock_irqrestore+0xd/0x20 [ 1060.808000] [<e0bcb957>] ath_intr+0x177/0x1060 [ath_pci] [ 1060.808000] [<c0105b75>] do_IRQ+0x45/0x80 [ 1060.808000] [<c0106c45>] timer_interrupt+0x85/0xb0 [ 1060.808000] [<c01538d0>] handle_IRQ_event+0x30/0x60 [ 1060.808000] [<c0104233>] common_interrupt+0x23/0x30 [ 1060.808000] [<e0b4204c>] igetdword+0xc/0x20 [snd_intel8x0] [ 1060.808000] [<e0b432d7>] snd_intel8x0_interrupt+0x1f7/0x240 [snd_intel8x0] [ 1060.808000] [<c01538d0>] handle_IRQ_event+0x30/0x60 [ 1060.808000] [<c015517d>] handle_level_irq+0x8d/0x120 [ 1060.808000] [<c0105b70>] do_IRQ+0x40/0x80 [ 1060.808000] [<c0104233>] common_interrupt+0x23/0x30 [ 1060.808000] [<c01538b0>] handle_IRQ_event+0x10/0x60 [ 1060.808000] [<c015517d>] handle_level_irq+0x8d/0x120 [ 1060.808000] [<c0105b70>] do_IRQ+0x40/0x80 [ 1060.808000] [<c0104233>] common_interrupt+0x23/0x30 [ 1060.808000] [<c027ea19>] alloc_skb+0x109/0x120 [ 1060.808000] [<e0ba5f97>] ieee80211_getmgtframe+0x27/0xd0 [wlan] [ 1060.808000] [<e0b96808>] ieee80211_send_probereq+0x58/0x2b0 [wlan] [ 1060.808000] [<e0b9d6d7>] scan_next+0x447/0x4c0 [wlan] [ 1060.808000] [<e0b9d290>] scan_next+0x0/0x4c0 [wlan] [ 1060.808000] [<c012f74f>] run_timer_softirq+0x12f/0x1a0 [ 1060.808000] [<c012b422>] do_softirq+0x82/0x100 [ 1060.808000] [<c012b4f5>] do_softirq+0x55/0x60 [ 1060.808000] [<c0105b75>] do_IRQ+0x45/0x80 [ 1060.808000] [<c0219e3e>] acpi_hw_register_write+0x154/0x187 [ 1060.808000] [<c0104233>] common_interrupt+0x23/0x30 [ 1060.808000] [<e08317e2>] acpi_processor_idle+0x225/0x3f7 [processor] [ 1060.808000] [<c0101409>] cpu_idle+0x49/0xd0 [ 1060.808000] [<c03d97f5>] start_kernel+0x365/0x420 [ 1060.808000] [<c03d9230>] unknown_bootoption+0x0/0x260 [ 1060.808000] =======================

happens from time to time, is very annoying

07/03/07 06:54:33 changed by mrenzmann

Please resubmit that and enclose it with curly brackets so that the paste will be shown correctly. Thanks.

07/04/07 07:25:11 changed by mark@hindley.org.uk

I have just upgraded to svn2537, kernel 2.6.21.5 and have suddenly got soft lockups. Looks like the same problem.

Mark

Jul  4 06:15:47 mercury kernel: BUG: soft lockup detected on CPU#0!
Jul  4 06:15:48 mercury kernel:  [<c0137746>] softlockup_tick+0x92/0xaf
Jul  4 06:15:48 mercury kernel:  [<c011ee3e>] update_process_times+0x33/0x55
Jul  4 06:15:48 mercury kernel:  [<c012cc2d>] tick_nohz_handler+0x72/0xd2
Jul  4 06:15:48 mercury kernel:  [<c010edea>] smp_apic_timer_interrupt+0x5c/0x6c
Jul  4 06:15:48 mercury kernel:  [<cca1e9a1>] zz005b60b2+0x5e1/0x800 [ath_hal]
Jul  4 06:15:48 mercury kernel:  [<c0104688>] apic_timer_interrupt+0x28/0x30
Jul  4 06:15:48 mercury kernel:  [<cc9f8eec>] ath_intr+0xdd8/0xe40 [ath_pci]
Jul  4 06:15:48 mercury kernel:  [<c012cc82>] tick_nohz_handler+0xc7/0xd2
Jul  4 06:15:48 mercury kernel:  [<c010edef>] smp_apic_timer_interrupt+0x61/0x6c
Jul  4 06:15:48 mercury kernel:  [<c0104688>] apic_timer_interrupt+0x28/0x30
Jul  4 06:15:48 mercury kernel:  [<c01379c2>] handle_IRQ_event+0x1a/0x3f
Jul  4 06:15:48 mercury kernel:  [<c0138c1a>] handle_level_irq+0x93/0xec
Jul  4 06:15:48 mercury kernel:  [<c01061d6>] do_IRQ+0x57/0x71
Jul  4 06:15:48 mercury kernel:  [<c010465b>] common_interrupt+0x23/0x28
Jul  4 06:15:48 mercury kernel:  [<c01379b9>] handle_IRQ_event+0x11/0x3f
Jul  4 06:15:48 mercury kernel:  [<c0138c1a>] handle_level_irq+0x93/0xec
Jul  4 06:15:48 mercury kernel:  [<c01061d6>] do_IRQ+0x57/0x71
Jul  4 06:15:48 mercury kernel:  [<c010465b>] common_interrupt+0x23/0x28
Jul  4 06:15:48 mercury kernel:  [<c01379b9>] handle_IRQ_event+0x11/0x3f
Jul  4 06:15:48 mercury kernel:  [<c0138c1a>] handle_level_irq+0x93/0xec
Jul  4 06:15:48 mercury kernel:  [<c01061d6>] do_IRQ+0x57/0x71
Jul  4 06:15:48 mercury kernel:  [<c010465b>] common_interrupt+0x23/0x28
Jul  4 06:15:48 mercury kernel:  [<c015073d>] __kmalloc+0x49/0x51
Jul  4 06:15:48 mercury kernel:  [<c0227d87>] __alloc_skb+0x49/0xf7
Jul  4 06:15:48 mercury kernel:  [<cca5d5a6>] ieee80211_getmgtframe+0x27/0xc6 [wlan]
Jul  4 06:15:48 mercury kernel:  [<cca50a0e>] ieee80211_send_probereq+0x9d/0x2a7 [wlan]
Jul  4 06:15:48 mercury kernel:  [<cc9f2b5d>] ath_tx_capture+0xe9/0x1ce [ath_pci]
Jul  4 06:15:48 mercury kernel:  [<cca55e6e>] scan_next+0x20e/0x47b [wlan]
Jul  4 06:15:48 mercury kernel:  [<c011e0f7>] getnstimeofday+0x2b/0xac
Jul  4 06:15:48 mercury kernel:  [<cca55c60>] scan_next+0x0/0x47b [wlan]
Jul  4 06:15:48 mercury kernel:  [<c011e4e2>] run_timer_softirq+0x102/0x176
Jul  4 06:15:48 mercury kernel:  [<c012c988>] tick_program_event+0x2a/0x49
Jul  4 06:15:48 mercury kernel:  [<c011b5b9>] __do_softirq+0x35/0x75
Jul  4 06:15:48 mercury kernel:  [<c011b61b>] do_softirq+0x22/0x26
Jul  4 06:15:48 mercury kernel:  [<c011b7e2>] irq_exit+0x29/0x62
Jul  4 06:15:48 mercury kernel:  [<c010edef>] smp_apic_timer_interrupt+0x61/0x6c
Jul  4 06:15:48 mercury kernel:  [<c0104688>] apic_timer_interrupt+0x28/0x30
Jul  4 06:15:48 mercury kernel:  =======================

08/03/07 11:05:05 changed by nbd

  • status changed from new to closed.
  • resolution set to fixed.
  • milestone set to version 0.9.4.

should be fixed in [2632]

02/11/08 06:17:04 changed by mrenzmann

  • milestone changed from version 0.9.4 to version 0.9.5.