Opened 3 years ago

Closed 3 years ago

#18497 closed defect (duplicate)

ath9k crashes (IRQ not handled) on WNDR3700

Reported by: mroek Owned by: developers
Priority: high Milestone:
Component: kernel Version: Trunk
Keywords: Cc:

Description

On my WNDR3700 running trunk r43530, some clients suddenly get very poor throughput over WiFi. The kernel log shows this:

[27447.380000] irq 41: nobody cared (try booting with the "irqpoll" option)
[27447.380000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.25 #2
[27447.380000] Stack : 00000006 ffffffff 00000000 00000000 00000000 00000000 803bc8ce 00000032
[27447.380000] 	  803453b0 00000000 802fb434 803457d7 00000000 803b3b5c 803453b0 00000000
[27447.380000] 	  80340000 802fe7b0 802fe7c4 8029dfbc 00000000 801ffd88 00000006 801a43c8
[27447.380000] 	  802fe5f8 80335b44 00000000 00000000 00000000 00000000 00000000 00000000
[27447.380000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[27447.380000] 	  ...
[27447.380000] Call Trace:
[27447.380000] [<80241e74>] show_stack+0x48/0x70
[27447.380000] [<800a6698>] __report_bad_irq.isra.7+0x44/0xf8
[27447.380000] [<801d4f24>] note_interrupt+0x224/0x2d8
[27447.380000] [<8015c79c>] handle_irq_event_percpu+0x1b8/0x1ec
[27447.380000] [<8015c5c0>] handle_irq_event+0x3c/0x60
[27447.380000] [<8015c8b0>] handle_level_irq+0xe0/0xf8
[27447.380000] [<8014ff20>] generic_handle_irq+0x28/0x44
[27447.380000] [<8014ff20>] generic_handle_irq+0x28/0x44
[27447.380000] [<80118ff0>] do_IRQ+0x1c/0x2c
[27447.380000] [<80060830>] ret_from_irq+0x0/0x4
[27447.380000] [<833a1274>] ath_start_rfkill_poll+0x10c/0x37c [ath9k]
[27447.380000] [<833a3d68>] ath9k_tasklet+0x214/0x230 [ath9k]
[27447.380000] [<80264f58>] tasklet_action+0x84/0xcc
[27447.380000] [<8008f5d8>] __do_softirq+0xf8/0x228
[27447.380000] [<80187184>] irq_exit+0x54/0x70
[27447.380000] [<80060830>] ret_from_irq+0x0/0x4
[27447.380000] [<80060a80>] __r4k_wait+0x20/0x40
[27447.380000] [<801008e4>] cpu_startup_entry+0xa4/0x104
[27447.380000] [<8035294c>] start_kernel+0x3c8/0x3e0
[27447.380000] 
[27447.380000] handlers:
[27447.380000] [<833a3864>] ath_isr [ath9k]
[27447.380000] Disabling IRQ #41
[27447.550000] ath: phy1: Failed to stop TX DMA, queues=0x100!
[41388.910000] irq 40: nobody cared (try booting with the "irqpoll" option)
[41388.910000] CPU: 0 PID: 13164 Comm: kworker/u2:2 Not tainted 3.14.25 #2
[41388.910000] Workqueue: phy0 ath_reset_work [ath9k]
[41388.910000] Stack : 00000001 ffffffff 82bb1200 802fb434 803484d0 00000028 00000000 00000000
[41388.910000] 	  80340000 802fe7b0 802fe7c4 8029dfbc 0000336c 802004d0 83054bd0 00000000
[41388.910000] 	  8030928c 83b7fafc 83b7fafc 8029dfbc 00000000 801fff98 00000006 00000000
[41388.910000] 	  00000005 80232310 00000000 00000000 00000000 00000000 00000000 00000000
[41388.910000] 	  70687930 00000000 00000000 00000000 00000000 00000000 8332d100 8332d400
[41388.910000] 	  ...
[41388.910000] Call Trace:
[41388.910000] [<80241e74>] show_stack+0x48/0x70
[41388.910000] [<800a6698>] __report_bad_irq.isra.7+0x44/0xf8
[41388.910000] [<801d4f24>] note_interrupt+0x224/0x2d8
[41388.910000] [<8015c79c>] handle_irq_event_percpu+0x1b8/0x1ec
[41388.910000] [<8015c5c0>] handle_irq_event+0x3c/0x60
[41388.910000] [<8015c8b0>] handle_level_irq+0xe0/0xf8
[41388.910000] [<8014ff20>] generic_handle_irq+0x28/0x44
[41388.910000] [<8014ff20>] generic_handle_irq+0x28/0x44
[41388.910000] [<80118ff0>] do_IRQ+0x1c/0x2c
[41388.910000] [<80060830>] ret_from_irq+0x0/0x4
[41388.910000] [<8008f56c>] __do_softirq+0x8c/0x228
[41388.910000] [<80187184>] irq_exit+0x54/0x70
[41388.910000] [<80060830>] ret_from_irq+0x0/0x4
[41388.910000] [<80208a90>] process_one_work+0x218/0x364
[41388.910000] [<802af7ec>] worker_thread+0x234/0x388
[41388.910000] [<801a3af8>] kthread+0xd8/0xe4
[41388.910000] [<80060878>] ret_from_kernel_thread+0x14/0x1c
[41388.910000] 
[41388.910000] handlers:
[41388.910000] [<833a3864>] ath_isr [ath9k]
[41388.910000] Disabling IRQ #40

I'm not sure why this happens, but it seems only a reboot fixes it (until it happens again, some hours later).
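
To see how often this happens and which IRQ is affected, a saved kernel log can be scanned for the "nobody cared" reports. This is only a minimal sketch, assuming the log has been saved as plain text (for example from `dmesg`); the function name `find_unhandled_irqs` is made up for illustration and is not part of any OpenWrt tool.

```python
import re

# Matches lines such as:
# [27447.380000] irq 41: nobody cared (try booting with the "irqpoll" option)
NOBODY_CARED = re.compile(r"^\[\s*(\d+\.\d+)\]\s+irq (\d+): nobody cared")


def find_unhandled_irqs(log_text):
    """Return (uptime_seconds, irq_number) for each 'nobody cared' report."""
    events = []
    for line in log_text.splitlines():
        match = NOBODY_CARED.match(line)
        if match:
            events.append((float(match.group(1)), int(match.group(2))))
    return events


# Two of the lines from the log above, used as sample input.
sample = """\
[27447.380000] irq 41: nobody cared (try booting with the "irqpoll" option)
[27447.380000] Disabling IRQ #41
[41388.910000] irq 40: nobody cared (try booting with the "irqpoll" option)
[41388.910000] Disabling IRQ #40
"""

for uptime, irq in find_unhandled_irqs(sample):
    print(f"irq {irq} reported unhandled at {uptime / 3600:.1f} h of uptime")
```

Comparing the reported IRQ numbers with `/proc/interrupts` on the router shows which device each one belongs to.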

Change History (4)

comment:1 Changed 3 years ago by ambrosa

I have a similar problem on my TP-LINK W8970 (Lantiq -> XR200 -> TP-LINK TDW8970) running a self-compiled OpenWrt, version CHAOS CALMER (Bleeding Edge, r43423).

This happens about 10 minutes after a reboot.

[  589.616000] irq 144: nobody cared (try booting with the "irqpoll" option)
[  589.616000] CPU: 0 PID: 1333 Comm: hostapd Not tainted 3.14.18 #2
[  589.616000] Stack : 00000000 00000000 00000000 00000000 804bc8c6 00000035 82caf708 80455340
[  589.616000] 	  803ebbb4 8044f1ff 00000535 804b3afc 82caf708 80455340 804bbd10 80440000
[  589.616000] 	  80440000 80305ef0 00000000 8021f638 00000000 00000000 803eeb28 82c19a94
[  589.616000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  589.616000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 82c19a20
[  589.616000] 	  ...
[  589.616000] Call Trace:
[  589.616000] [<8026d3c8>] show_stack+0x48/0x70
[  589.616000] [<80056598>] __report_bad_irq.isra.7+0x3c/0x104
[  589.616000] [<801e7364>] note_interrupt+0x2d4/0x3c8
[  589.616000] [<80147828>] handle_irq_event_percpu+0x214/0x260
[  589.616000] [<801475f0>] handle_irq_event+0x3c/0x60
[  589.616000] [<80147960>] handle_level_irq+0xec/0x104
[  589.616000] [<80138738>] generic_handle_irq+0x38/0x4c
[  589.616000] [<800e21ac>] do_IRQ+0x1c/0x2c
[  589.616000] [<80217f60>] plat_irq_dispatch+0xf0/0x158
[  589.616000] [<80006430>] ret_from_irq+0x0/0x4
[  589.616000] [<800397f0>] __do_softirq+0x94/0x280
[  589.616000] [<8017c674>] irq_exit+0x54/0x70
[  589.616000] [<80006430>] ret_from_irq+0x0/0x4
[  589.616000] [<8031d1c0>] verify_iovec+0xb0/0x104
[  589.616000] [<8002b0e0>] ___sys_recvmsg.part.35+0x90/0x1a0
[  589.616000] [<8005dcbc>] __sys_recvmsg+0x58/0x8c
[  589.616000] [<8000843c>] handle_sys+0x11c/0x140
[  589.616000] 
[  589.616000] handlers:
[  589.616000] [<837e3c2c>] ath_isr [ath9k]
[  589.616000] Disabling IRQ #144

comment:2 Changed 3 years ago by mroek

I'm also seeing some related (I think) messages in the log:

[ 8650.420000] ath: phy0: Failed to stop TX DMA, queues=0x002!
[12703.870000] ath: phy0: Failed to stop TX DMA, queues=0x002!
[12703.880000] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x000084c0
[12703.890000] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
[14505.310000] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x000042c0
[14505.320000] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up

My router was running r43427 for a month (in continuous service) without any issues at all, so some change (perhaps the bump to kernel 3.14?) has likely caused this issue.
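
The `queues=0x...` value in the "Failed to stop TX DMA" messages can be split into individual bits. The sketch below assumes the value is a bitmask with one bit per TX queue index, which is an assumption about the driver and is not confirmed anywhere in this ticket; `pending_queues` is a hypothetical helper name.

```python
def pending_queues(mask):
    """Return the bit positions that are set in a queues=0x... mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Values taken from the log messages in this ticket.
for raw in ("0x100", "0x002"):
    print(raw, "->", pending_queues(int(raw, 16)))
```

If the bitmask assumption holds, the phy1 message in the description and the phy0 messages above refer to different queues.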

comment:3 Changed 3 years ago by anonymous

I've also very recently (since last week) noticed spontaneous reboots under "heavy" WiFi activity on a TP-Link WDR3600 running CC trunk. The 2.4 GHz radio (radio0, wlan0) on the WDR3600 is configured as a station to a consumer AP in a very noisy environment, so the link is very unstable.

root@OpenWrt:~# logread |fgrep wlan0|fgrep netifd
Sat Dec  6 04:19:11 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:15:46 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:15:46 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:22:14 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:22:30 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:24:14 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:24:19 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:26:37 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:26:38 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:28:58 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:29:01 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:30:38 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:30:39 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:37:19 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:37:24 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 01:38:59 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 01:39:00 2014 daemon.notice netifd: Network device 'wlan0' link is up
root@OpenWrt:~#

However, I can't reproduce the reboots: OpenWrt may reboot 3 times within 10 minutes or run stably for hours.
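
To put numbers on the link instability, the netifd lines from `logread` can be paired into down/up intervals. This is a rough sketch, assuming the timestamp format shown above and English day and month names; `link_outages` is a made-up helper name, not an existing tool.

```python
import re
from datetime import datetime

# Matches lines such as:
# Mon Dec  8 00:15:46 2014 daemon.notice netifd: Network device 'wlan0' link is down
LINK_EVENT = re.compile(
    r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) .*"
    r"netifd: Network device '([^']+)' link is (up|down)$"
)


def link_outages(log_text, device="wlan0"):
    """Return (went_down_at, seconds_down) for each down -> up pair."""
    outages = []
    down_since = None
    for line in log_text.splitlines():
        match = LINK_EVENT.match(line.strip())
        if not match or match.group(2) != device:
            continue
        when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        if match.group(3) == "down":
            down_since = when
        elif down_since is not None:
            outages.append((down_since, (when - down_since).total_seconds()))
            down_since = None
    return outages


# A few of the lines from the log above, used as sample input.
sample = """\
Mon Dec  8 00:15:46 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:15:46 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:22:14 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:22:30 2014 daemon.notice netifd: Network device 'wlan0' link is up
Mon Dec  8 00:24:14 2014 daemon.notice netifd: Network device 'wlan0' link is down
Mon Dec  8 00:24:19 2014 daemon.notice netifd: Network device 'wlan0' link is up
"""

for went_down, seconds in link_outages(sample):
    print(f"{went_down:%H:%M:%S} down for {seconds:.0f} s")
```

An "up" line with no preceding "down" line (such as the first entry in the log above) is ignored by this sketch.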

comment:4 Changed 3 years ago by nbd

  • Resolution set to duplicate
  • Status changed from new to closed
