Please note: This project is no longer active. The website is kept online for historic purposes only.
If you're looking for a Linux driver for your Atheros WLAN device, you should continue here.

Ticket #1698 (new defect)

Opened 14 years ago

Last modified 14 years ago

scheduling while atomic in mount.nfs4, followed by panic - only with madwifi

Reported by: madwifi-f@unternet.org
Assigned to:
Priority: major
Milestone:
Component: madwifi: driver
Version: trunk
Keywords:
Cc:
Patch is attached: 0
Pending:

Description

Lately I have been unable to use my madwifi-driven AR2413 with recent git kernels (2.6.24-rc2/3 and higher) as the system panics as soon as I try to access an automounted NFSv4 share. The panic is preceded by a string of 'scheduling while atomic' warnings which occur in mount.nfs4. I have checked with the NFSv4 list as to whether there is anyone else who has seen problems of this sort, with negative results.

The 'scheduling while atomic' warnings and the kernel panic only occur when I access the network using the madwifi-driven (AR2413) wifi-card; with the on board ethernet controller everything works as it should.

I have not yet managed to get an Oops out of the deceased kernel. I do have some relevant lines from dmesg though (see the end of this description).

Has anyone else seen these problems? Anyone using both NFSv4 as well as madwifi on recent Linus git kernels?

Here's what dmesg shows before the panic sets in...

BUG: scheduling while atomic: mount.nfs4/10052/0xfffffdff
Pid: 10052, comm: mount.nfs4 Tainted: P 2.6.24-rc5-t23-20071212-01 #6
 [schedule+565/736] schedule+0x235/0x2e0
 [<f0b54529>] rpc_wait_bit_interruptible+0x19/0x20 [sunrpc]
 [__wait_on_bit+66/112] __wait_on_bit+0x42/0x70
 [<f0b54510>] rpc_wait_bit_interruptible+0x0/0x20 [sunrpc]
 [<f0b54510>] rpc_wait_bit_interruptible+0x0/0x20 [sunrpc]
 [out_of_line_wait_on_bit+113/144] out_of_line_wait_on_bit+0x71/0x90
 [wake_bit_function+0/96] wake_bit_function+0x0/0x60
 [<f0b548ff>] __rpc_execute+0xaf/0x260 [sunrpc]
 [<f0b4df27>] rpc_do_run_task+0x67/0xc0 [sunrpc]
 [<f0b4dffb>] rpc_call_sync+0x1b/0x40 [sunrpc]
 [<f0b4e05d>] rpc_ping+0x3d/0x50 [sunrpc]
 [<f0b4f0be>] rpc_create+0x3ee/0x470 [sunrpc]
 [schedule+597/736] schedule+0x255/0x2e0
 [schedule_timeout+117/192] schedule_timeout+0x75/0xc0
 [wait_for_common+150/352] wait_for_common+0x96/0x160
 [<f0bbed8c>] nfs_get_client+0x5c/0x3a0 [nfs]
 [<f0bbeb04>] nfs_create_rpc_client+0xf4/0x190 [nfs]
 [<f0bbee33>] nfs_get_client+0x103/0x3a0 [nfs]
 [<f0bbf148>] nfs4_set_client+0x78/0x1a0 [nfs]
 [<f0bbf92f>] nfs4_create_server+0x5f/0x410 [nfs]
 [strndup_user+98/128] strndup_user+0x62/0x80
 [<f0bc8e69>] nfs4_get_sb+0x2f9/0x530 [nfs]
 [do_lookup+101/400] do_lookup+0x65/0x190
 [permission+106/272] permission+0x6a/0x110
 [dput+28/352] dput+0x1c/0x160
 [vfs_kern_mount+67/144] vfs_kern_mount+0x43/0x90
 [do_kern_mount+61/224] do_kern_mount+0x3d/0xe0
 [do_mount+1266/1728] do_mount+0x4f2/0x6c0
 [__alloc_pages+86/864] __alloc_pages+0x56/0x360
 [handle_mm_fault+647/1488] handle_mm_fault+0x287/0x5d0
 [__tcp_push_pending_frames+285/2192] __tcp_push_pending_frames+0x11d/0x890
 [handle_level_irq+0/240] handle_level_irq+0x0/0xf0
 [irq_exit+71/112] irq_exit+0x47/0x70
 [do_IRQ+122/192] do_IRQ+0x7a/0xc0
 [common_interrupt+35/40] common_interrupt+0x23/0x28
 [xfrm_policy_kill+176/192] xfrm_policy_kill+0xb0/0xc0
 [copy_mount_options+194/336] copy_mount_options+0xc2/0x150
 [getname+179/224] getname+0xb3/0xe0
 [sys_mount+119/192] sys_mount+0x77/0xc0
 [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
 [xfrm_policy_kill+176/192] xfrm_policy_kill+0xb0/0xc0
 =======================

(loads more of these, followed by the ominous 'kernel panic, not syncing... Killing interrupt handler, Aieee...')

Change History

01/07/08 14:19:08 changed by madwifi-f@unternet.org

This bug still lives in the latest svn + latest kernel git pull... Am I the only one seeing this?

(follow-up: ↓ 3 ) 01/18/08 23:36:21 changed by jonas.walther@bredband.net

I think I have the same problem running kernel 2.6.23.9-85.fc8 and madwifi binary package 0.9.4-40_r3123 from atrpms.net (I guess it's from changeset 3123). After rollback to 0.9.4-39_r2756 (changeset 2756) mount.nfs works again. Tried this 2 times and it's repeatable, mount.nfs works with 0.9.4-39_r2756 and hangs the kernel with 0.9.4-40_r3123.

Jan 14 19:37:15 dtpc kernel: BUG: scheduling while atomic: mount.nfs/0xfffffe00/2799
Jan 14 19:37:15 dtpc kernel:  [<c061d2f9>] __sched_text_start+0x79/0x638
Jan 14 19:37:15 dtpc kernel:  [<c0434a96>] lock_timer_base+0x19/0x35
Jan 14 19:37:15 dtpc kernel:  [<c0434baa>] __mod_timer+0x9a/0xa4
Jan 14 19:37:15 dtpc kernel:  [<f8c919a4>] xprt_timer+0x0/0x6f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c9492a>] rpc_sleep_on+0x1e3/0x1f1 [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c9415d>] rpc_wait_bit_interruptible+0x1a/0x1f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<c061e060>] __wait_on_bit+0x33/0x58
Jan 14 19:37:15 dtpc kernel:  [<f8c94143>] rpc_wait_bit_interruptible+0x0/0x1f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c94143>] rpc_wait_bit_interruptible+0x0/0x1f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<c061e0e8>] out_of_line_wait_on_bit+0x63/0x6b
Jan 14 19:37:15 dtpc kernel:  [<c043d4ca>] wake_bit_function+0x0/0x3c
Jan 14 19:37:15 dtpc kernel:  [<f8c94586>] __rpc_execute+0xeb/0x21f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c940b8>] rpc_set_active+0x48/0x50 [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c8fc1f>] rpc_do_run_task+0x76/0x8f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c8fcb9>] rpc_call_sync+0x21/0x39 [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c8fd04>] rpc_ping+0x33/0x47 [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c90965>] rpc_create+0x358/0x3d5 [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8d14972>] nfs_create_rpc_client+0x112/0x14d [nfs]
Jan 14 19:37:15 dtpc kernel:  [<f8d14c98>] nfs_get_client+0x1db/0x2e0 [nfs]
Jan 14 19:37:15 dtpc kernel:  [<f8d150e4>] nfs_create_server+0xe6/0x415 [nfs]
Jan 14 19:37:15 dtpc kernel:  [<c0434b06>] del_timer_sync+0xa/0x14
Jan 14 19:37:15 dtpc kernel:  [<c061df36>] schedule_timeout+0x79/0x8f
Jan 14 19:37:15 dtpc kernel:  [<c04256c8>] update_curr+0x13d/0x167
Jan 14 19:37:15 dtpc kernel:  [<c0425d57>] enqueue_entity+0x2dd/0x307
Jan 14 19:37:15 dtpc kernel:  [<c04257ab>] __check_preempt_curr_fair+0x55/0x86
Jan 14 19:37:15 dtpc kernel:  [<c04256c8>] update_curr+0x13d/0x167
Jan 14 19:37:15 dtpc kernel:  [<f8d1c398>] nfs_get_sb+0x513/0x73b [nfs]
Jan 14 19:37:15 dtpc kernel:  [<c04250a9>] update_stats_wait_end+0xd3/0xfe
Jan 14 19:37:15 dtpc kernel:  [<c04665b1>] get_page_from_freelist+0x25d/0x2db
Jan 14 19:37:15 dtpc kernel:  [<c0466693>] __alloc_pages+0x64/0x2a2
Jan 14 19:37:15 dtpc kernel:  [<c0493c86>] alloc_vfsmnt+0x86/0xac
Jan 14 19:37:15 dtpc kernel:  [<c0482af1>] vfs_kern_mount+0x83/0xfe
Jan 14 19:37:15 dtpc kernel:  [<c0482bb6>] do_kern_mount+0x35/0xbb
Jan 14 19:37:15 dtpc kernel:  [<c04946c1>] do_mount+0x5fb/0x65d
Jan 14 19:37:15 dtpc kernel:  [<c0423eb6>] kunmap_atomic+0x54/0x96
Jan 14 19:37:15 dtpc kernel:  [<c046f14f>] handle_mm_fault+0x76d/0x78b
Jan 14 19:37:15 dtpc kernel:  [<c04f6d68>] copy_to_user+0x34/0x48
Jan 14 19:37:15 dtpc kernel:  [<c05b2f89>] move_addr_to_user+0x51/0x69
Jan 14 19:37:15 dtpc kernel:  [<c0620652>] do_page_fault+0x2c0/0x5ef
Jan 14 19:37:15 dtpc kernel:  [<c061f07a>] error_code+0x72/0x78
Jan 14 19:37:15 dtpc kernel:  [<c06100d8>] secpath_dup+0x8/0x52
Jan 14 19:37:15 dtpc kernel:  [<c0493287>] copy_mount_options+0x90/0x109
Jan 14 19:37:15 dtpc kernel:  [<c049479a>] sys_mount+0x77/0xae
Jan 14 19:37:15 dtpc kernel:  [<c040518a>] syscall_call+0x7/0xb
Jan 14 19:37:15 dtpc kernel:  [<c0610000>] xfrm_state_sort+0x2e/0x5a
Jan 14 19:37:15 dtpc kernel:  =======================
Jan 14 19:37:15 dtpc kernel: BUG: sleeping function called from invalid context at include/asm/semaphore.h:99
Jan 14 19:37:15 dtpc kernel: in_atomic():1, irqs_disabled():0
Jan 14 19:37:15 dtpc kernel:  [<c061efdc>] __reacquire_kernel_lock+0x2a/0x4b
Jan 14 19:37:15 dtpc kernel:  [<c061d89e>] __sched_text_start+0x61e/0x638
Jan 14 19:37:15 dtpc kernel:  [<c0434a96>] lock_timer_base+0x19/0x35
Jan 14 19:37:15 dtpc kernel:  [<f8c9415d>] rpc_wait_bit_interruptible+0x1a/0x1f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<c061e060>] __wait_on_bit+0x33/0x58
Jan 14 19:37:15 dtpc kernel:  [<f8c94143>] rpc_wait_bit_interruptible+0x0/0x1f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c94143>] rpc_wait_bit_interruptible+0x0/0x1f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<c061e0e8>] out_of_line_wait_on_bit+0x63/0x6b
Jan 14 19:37:15 dtpc kernel:  [<c043d4ca>] wake_bit_function+0x0/0x3c
Jan 14 19:37:15 dtpc kernel:  [<f8c94586>] __rpc_execute+0xeb/0x21f [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c940b8>] rpc_set_active+0x48/0x50 [sunrpc]
Jan 14 19:37:15 dtpc kernel:  [<f8c8fc1f>] rpc_do_run_task+0x76/0x8f [sunrpc]
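
For readers comparing the two reports: the hex value in each 'scheduling while atomic' header can be read as a signed 32-bit number. The sketch below assumes (this is not stated anywhere in the ticket) that the field is the kernel's preempt count printed as an unsigned 32-bit value, and that the two layouts seen here are comm/pid/count and comm/count/pid. The helper names parse_header and as_signed32 are made up for illustration.

```python
import re

# Pulls the "comm/.../..." token that follows the BUG message.
HEADER = re.compile(r"scheduling while atomic: (?P<rest>\S+)")


def parse_header(line):
    """Return (comm, pid, count) from a 'scheduling while atomic' line.

    Handles both layouts seen in this ticket: the hex field (assumed to
    be the preempt count) may come before or after the pid.
    """
    match = HEADER.search(line)
    if match is None:
        return None
    comm, *fields = match.group("rest").split("/")
    pid = count = None
    for field in fields:
        if field.startswith("0x"):
            count = int(field, 16)
        else:
            pid = int(field)
    return comm, pid, count


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value & 0x80000000 else value


for line in (
    "BUG: scheduling while atomic: mount.nfs4/10052/0xfffffdff",
    "BUG: scheduling while atomic: mount.nfs/0xfffffe00/2799",
):
    comm, pid, count = parse_header(line)
    print(comm, pid, as_signed32(count))
```

Under that assumption both values come out negative (-513 and -512). That would be consistent with some code path lowering the count more often than it raised it before mount.nfs ran, but nothing in this thread confirms where that happens.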

(in reply to: ↑ 2 ) 01/19/08 21:04:36 changed by madwifi-f@unternet.org

Replying to jonas.walther@bredband.net:

I think I have the same problem running kernel 2.6.23.9-85.fc8 and madwifi binary package 0.9.4-40_r3123 from atrpms.net (I guess it's from changeset 3123). After rollback to 0.9.4-39_r2756 (changeset 2756) mount.nfs works again. Tried this 2 times and it's repeatable, mount.nfs works with 0.9.4-39_r2756 and hangs the kernel with 0.9.4-40_r3123.

OK, I narrowed it down to the range 2902-2913:

Up to 2901: compiles and works OK in latest kernel git

2902 - 2912: does not compile in latest kernel git

2913-CURRENT: compiles but hangs kernel as described above

Something within this range of patches (2902-2913) gives rise to these crashes; what exactly, I don't know yet. For now, my advice to those who use a recent git kernel AND make use of nfs/rpc is to check out revision 2901 (svn co -r 2901 ...) and to refrain from updating until these problems have been solved.
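
The narrowing described above amounts to a bisection in which some revisions cannot be tested because they do not compile. A minimal sketch of that idea follows. It is not the procedure actually used in this ticket: narrow_regression and simulated_check are hypothetical names, the endpoints 2895 and 2919 are arbitrary, and the check function only replays the outcomes reported in this comment instead of building and loading the driver.

```python
def narrow_regression(revisions, check):
    """Bisect an ascending list of revisions, tolerating untestable ones.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    check(rev) must return 'good', 'bad' or 'skip' (e.g. does not compile).
    Returns (last_good, first_bad); any revisions strictly between the
    two could not be tested.
    """
    cache = {}

    def verdict(rev):
        # Remember results so that no revision is built twice.
        if rev not in cache:
            cache[rev] = check(rev)
        return cache[rev]

    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Try the midpoint first, then its neighbours, nearest first.
        for i in sorted(range(lo + 1, hi), key=lambda i: abs(i - mid)):
            result = verdict(revisions[i])
            if result == "good":
                lo = i
                break
            if result == "bad":
                hi = i
                break
        else:
            break  # everything left in between is untestable
    return revisions[lo], revisions[hi]


def simulated_check(rev):
    """Replays the outcomes reported above; a real check would build the
    driver, load it and attempt the NFS mount."""
    if rev <= 2901:
        return "good"
    if rev <= 2912:
        return "skip"  # reported as not compiling
    return "bad"


print(narrow_regression(list(range(2895, 2920)), simulated_check))
# (2901, 2913)
```

With untestable revisions in the middle, the search can only return the surrounding pair, which is why the result is a range and not a single changeset.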

(follow-up: ↓ 5 ) 01/21/08 20:30:42 changed by Przemyslaw Bruski

The problem seems to be with 2902 - I've compiled 2902 with the fix from 2913 and the problem appeared.

(in reply to: ↑ 4 ) 01/22/08 12:54:08 changed by anonymous

Replying to Przemyslaw Bruski:

The problem seems to be with 2902 - I've compiled 2902 with the fix from 2913 and the problem appeared.

Very likely as 2901-2902 is a large change (diff -u gives 147164 bytes). Changes between 2902-2913 are limited to small bugfixes. Have not looked further yet for lack of time.

01/22/08 12:54:41 changed by madwifi-f@unternet.org

Last message was mine, forgot to sign it...

02/12/08 19:24:05 changed by Przemyslaw Bruski

I had this problem with the following configuration: NFSv3 mount, madwifi@2902 and kernel 2.6.23.12. With madwifi@3349 and 2.6.24.2 it is gone - hopefully forever. I recommend closing the defect as (magically) fixed.

02/26/08 22:55:47 changed by madwifi-f@unternet.org

Testing with svn r3365 and 2.6.25-rc3 seems to indicate that this problem has indeed been solved. It would have been interesting to find the root cause, but as it 'just works' now I won't spend any more time on this. Case closed... for now...