Opened 7 years ago

Closed 7 years ago

Last modified 4 years ago

#9107 closed defect (fixed)

ath9k: Fix PCI error on UBNT Routerstation Pro

Reported by: Michael Gernoth <mike@…>
Owned by: nbd
Priority: normal
Milestone: Barrier Breaker 14.07
Component: kernel
Version: Trunk
Keywords: ath9k
Cc:

Description

Activating the wireless interface of an Apple AR5BMB-0072TA Mini PCI card (Atheros AR5416/AR5008 chipset) leads, more than 50% of the time, to a PCI error followed by a kernel Oops:

PCI error 1 at PCI addr 0x100009c8
Data bus error, epc == 87103da0, ra == 87103d8c
Oops[#1]:
Cpu 0
$ 0   : 00000000 80300000 deadc0de 0000000f
$ 4   : 0000000f 00000019 8783fe48 80304290
$ 8   : 87ad9c54 0000fc00 00000000 8792c000
$12   : 00000000 8adfa220 3068c188 9095c095
$16   : 870b4000 00000000 00000001 00000000
$20   : 00000000 870b4004 80304290 80300000
$24   : 00000010 1db0109e
$28   : 8783e000 8783fdb8 802c0000 87103d8c
Hi    : 00000009
Lo    : 00000064
epc   : 87103da0 ath9k_hw_setpower+0xe4/0x5dc [ath9k_hw]
    Not tainted
ra    : 87103d8c ath9k_hw_setpower+0xd0/0x5dc [ath9k_hw]
Status: 1000fc02    KERNEL EXL
Cause : 1080001c
PrId  : 00019374 (MIPS 24Kc)
Modules linked in: ohci_hcd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ...
Process kworker/u:1 (pid: 7, threadinfo=8783e000, task=87819980, tls=00000000)
Stack : 00000000 00000000 00000000 00000000 00000000 00000000 00000001 870b4000
        00000001 00000000 00000000 870b82dc 80304290 80300000 802c0000 870a324c
        00000000 00000000 00000000 00000000 870b8da0 00000000 870b8200 870a3710
        00000001 87ad08b0 00000001 87ad08b8 00000000 87040d8c 870ba330 8008f188
        87819980 802bf820 00000000 00000000 00000000 87823280 87ad9c00 87823280
        ...
Call Trace:
[<87103da0>] ath9k_hw_setpower+0xe4/0x5dc [ath9k_hw]
[<870a324c>] ath9k_ps_wakeup+0x44/0x8c [ath9k]
[<870a3710>] ath9k_ps_restore+0x47c/0x6b8 [ath9k]
[<8008f188>] queue_delayed_work_on+0xf8/0x118
[<8008df24>] process_one_work+0x264/0x3c4
[<8009005c>] worker_thread+0x23c/0x340
[<80093858>] kthread+0x80/0x88
[<8006cde4>] kernel_thread_helper+0x10/0x18

When the box manages to boot without a crash, running wifi once usually triggers the crash shown above.
The Oops can be reproduced with 5 AR5BMB Mini PCI cards on two Routerstation Pro boards.

While adding debug information to ath9k, I found that a printk at the beginning of ath9k_hw_setpower significantly lowered the probability of the crash.
After replacing the printk with a udelay(1000), the crash in ath9k_hw_setpower is gone completely, but the system now occasionally crashes in ath9k_hw_init_global_settings instead. Adding the same delay there fixes that crash as well. It seems that some operations preceding these calls neither wait long enough nor check the status of the card before returning.
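
The effect of the delay can be illustrated with a small user-space model. This is only a sketch of the hypothesis stated above (an earlier operation has not finished when the next register access happens); it is not the driver code or the attached patch. All names (settle_chip, settle_udelay, settle_reg_access, settle_set_power_awake) and the settle time used in the example are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: after a state change the chip needs some time
 * before it answers register accesses again. */
struct settle_chip {
    unsigned long now_us;       /* simulated clock, in microseconds */
    unsigned long ready_at_us;  /* earliest time an access succeeds */
};

/* Stand-in for the kernel's udelay(): only advances the simulated clock. */
static void settle_udelay(struct settle_chip *c, unsigned long us)
{
    c->now_us += us;
}

/* An access before ready_at_us models the PCI data bus error. */
static bool settle_reg_access(const struct settle_chip *c)
{
    return c->now_us >= c->ready_at_us;
}

/* Models a function that touches a register right away. With the
 * workaround enabled it first waits 1000 us, as udelay(1000) would. */
static bool settle_set_power_awake(struct settle_chip *c, bool workaround)
{
    assert(c != NULL);
    if (workaround)
        settle_udelay(c, 1000);
    return settle_reg_access(c);
}
```

With a chip that becomes ready at 500 us (an arbitrary value), calling settle_set_power_awake(&c, false) at time 0 fails, while the call with the workaround enabled succeeds. The delay hides the race rather than removing its cause, which is why it is best treated as a workaround.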

With the attached patch applied, the following loop has now run for 167 iterations (and counting) without crashing the box:

root@OpenWrt:/# i=1; while true; do wifi; echo "N: ${i}"; i=$((i+1)); done

The attached patch should be seen as a workaround for the PCI error caused by ath9k on a Routerstation Pro. It should probably be added to OpenWrt, as it does no harm to users on other platforms.

Attachments (1)

564-ath9k_fix_setpower_init_global.patch (578 bytes) - added by Michael Gernoth <mike@…> 7 years ago.
patch for package/mac80211/patches to fix the PCI error

Change History (12)

Changed 7 years ago by Michael Gernoth <mike@…>

patch for package/mac80211/patches to fix the PCI error

comment:1 Changed 7 years ago by User294

The same happens on a RouterStation running Kamikaze 10.03.1 RC4 with an AR5416 radio. Could this be ported to the 10.03.1 RCs? The Oops leaves the device in a weird state without its network daemons, so it can only be controlled via a serial cable, if one is available.

comment:2 Changed 7 years ago by Michael Gernoth <mike@…>

The attached patch also works unchanged on Backfire. Just copy 564-ath9k_fix_setpower_init_global.patch to package/mac80211/patches and rebuild the image (don't forget to run "make clean"). I've tested this on Backfire with 3 cards in one board, so far for 67 iterations without an Oops.

A small correction to my first report: the udelay is not added to ath9k_hw_setpower but to ath9k_hw_set_power_awake, which is called from ath9k_hw_setpower.

comment:3 Changed 7 years ago by anonymous

Nice workaround. This is probably the same as, or related to, ticket #8704.

comment:4 Changed 7 years ago by Michael Gernoth <mike@…>

Yes, this is very probably the same problem. The first time around, I stopped reading that bug report when the problematic TP-Link card was mentioned...

comment:5 Changed 7 years ago by Michael Gernoth <mike@…>

Just an update as the version of compat-wireless was bumped to master-2011-03-22 today:
It still crashes with a PCI error and my patch still fixes that.

comment:6 Changed 7 years ago by nbd

  • Owner changed from developers to nbd
  • Status changed from new to accepted

Most of the time such crashes indicate that something was being done to the radio while the chip was still in sleep mode. A bug that did this was fixed in r26290.
Please try the latest version and test if it still crashes without your patch.
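
The failure mode described here, a register access while the chip is asleep, and the usual guard against it can be sketched in a small user-space model. This is an illustration only and not the ath9k implementation: the struct and function names (ps_chip, ps_model_wakeup, ps_model_restore, ps_model_reg_read, ps_model_guarded_read) are hypothetical, and the use counter is an assumption about how such a guard is typically built.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a chip with a sleep mode. */
struct ps_chip {
    bool awake;               /* true while register reads are safe */
    unsigned int users;       /* callers that currently need it awake */
    unsigned int bus_errors;  /* accesses made while the chip slept */
};

/* Wake the chip for the first user; later users only bump the count. */
static void ps_model_wakeup(struct ps_chip *c)
{
    if (c->users++ == 0)
        c->awake = true;
}

/* Drop one user; the last one lets the chip go back to sleep. */
static void ps_model_restore(struct ps_chip *c)
{
    assert(c->users > 0);
    if (--c->users == 0)
        c->awake = false;
}

/* Raw access: touching a sleeping chip models the PCI bus error. */
static bool ps_model_reg_read(struct ps_chip *c)
{
    if (!c->awake) {
        c->bus_errors++;
        return false;
    }
    return true;
}

/* Guarded access: bracket the register read with wakeup and restore. */
static bool ps_model_guarded_read(struct ps_chip *c)
{
    bool ok;

    ps_model_wakeup(c);
    ok = ps_model_reg_read(c);
    ps_model_restore(c);
    return ok;
}
```

In this model an unguarded ps_model_reg_read on a sleeping chip counts a bus error, while ps_model_guarded_read succeeds and leaves the chip asleep again afterwards. A code path that skips the wakeup step corresponds to the kind of bug described in this comment.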

comment:7 Changed 7 years ago by Michael Gernoth <mike@…>

Revision r26417 still crashes without the patch:

PCI error 1 at PCI addr 0x10007044
Data bus error, epc == 871c34f4, ra == 871c34e0
Oops[#1]:
Cpu 0
$ 0   : 00000000 80300000 deadc0de 0000000f
$ 4   : 0000000f 00000019 865d9e48 8030d290
$ 8   : 87b7d854 0000fc00 00000000 87bf6000
$12   : 55df24e7 d55a56cb 4759422c 61aa6c77
$16   : 870bc000 00000000 00000001 00000000
$20   : 8028d584 870bc01c 87140dd4 00000004
$24   : 00000010 a7938dc2
$28   : 865d8000 865d9dc0 00000001 871c34e0
Hi    : 00000009
Lo    : 00000064
epc   : 871c34f4 ath9k_hw_setpower+0xdc/0x4a0 [ath9k_hw]
    Not tainted
ra    : 871c34e0 ath9k_hw_setpower+0xc8/0x4a0 [ath9k_hw]
Status: 1000fc02    KERNEL EXL
Cause : 1080001c
PrId  : 00019374 (MIPS 24Kc)
Modules linked in: uvcvideo fuse pl2303 ...
Process kworker/u:2 (pid: 2143, threadinfo=865d8000, task=8790e200, tls=00000000)
Stack : 8030c0c4 8007a544 00000000 00000000 00000001 870bc000 00000001 00000000
        8028d584 00000000 87140dd4 00000004 00000001 871a30a8 871a7ac0 00000001
        871b4da0 8007a600 871b4da0 00000000 00000000 871a35c8 80310000 8007a86c
        1000fc03 8790e200 871b4200 87140d90 871b6320 87063000 871b6330 800877cc
        00000000 00000000 00000000 87a30880 87a30880 87a30880 871b42dc 8030d290
        ...
Call Trace:
[<871c34f4>] ath9k_hw_setpower+0xdc/0x4a0 [ath9k_hw]
[<871a30a8>] ath9k_ps_wakeup+0x44/0x8c [ath9k]
[<871a35c8>] ath9k_ps_restore+0x4d8/0x984 [ath9k]


Code: 2403003f  38a50014  0085180b <00621024> 24040001  54440015  8e0300f4  02002021  0dc70c45

comment:8 Changed 7 years ago by nbd

I found a few more instances where the hardware was not woken up properly. Please test http://nbd.name/ath9k_fix_wakeups.patch and see if it makes it work without your patch.

comment:9 Changed 7 years ago by Michael Gernoth <mike@…>

This seems to have fixed it :-)
I'm unable to reproduce the crash when only your patch is applied.

comment:10 Changed 7 years ago by nbd

  • Resolution set to fixed
  • Status changed from accepted to closed

Committed in r26418 and r26419. Thanks for reporting and testing.

comment:11 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
