Opened 6 years ago

Closed 5 years ago

Last modified 4 years ago

#10574 closed defect (no_response)

ath5k phy0: gain calibration timeout

Reported by: duvi
Owned by: developers
Priority: high
Milestone: Barrier Breaker 14.07
Component: base system
Version: Trunk
Keywords:
Cc:

Description

This keeps happening after a while on trunk v29468.

Board: RB433UAH, Card: TL-WN662AG, AR5414
(There's also an R52N for 5 GHz-N in the system, and it uses ath9k, so it should not matter).

Config:

config wifi-device  radio0
        option disabled 0
        option type     mac80211
        option channel  1
        option macaddr  00:19:e0:67:09:21
        option hwmode   11g
        option txpower  18

config wifi-iface
        option device   radio0
        option network  lan
        option mode     ap
        option wds      1
        option ssid     yyy
        option encryption psk-mixed
        option key      xxx
        option wpa_group_rekey 3600

After some hours of uptime, these messages flood the log, roughly once every 1.67 seconds.

[25803.450000] ath5k phy0: gain calibration timeout (2412MHz)
[25805.130000] ath5k phy0: gain calibration timeout (2412MHz)
[25806.800000] ath5k phy0: gain calibration timeout (2412MHz)
[25808.460000] ath5k phy0: gain calibration timeout (2412MHz)
[25810.140000] ath5k phy0: gain calibration timeout (2412MHz)
[25811.810000] ath5k phy0: gain calibration timeout (2412MHz)
[25813.470000] ath5k phy0: gain calibration timeout (2412MHz)
[25815.150000] ath5k phy0: gain calibration timeout (2412MHz)
[25816.820000] ath5k phy0: gain calibration timeout (2412MHz)
[25818.480000] ath5k phy0: gain calibration timeout (2412MHz)
[25820.160000] ath5k phy0: gain calibration timeout (2412MHz)
[25821.830000] ath5k phy0: gain calibration timeout (2412MHz)
[25823.490000] ath5k phy0: gain calibration timeout (2412MHz)
[25825.170000] ath5k phy0: gain calibration timeout (2412MHz)
[25826.840000] ath5k phy0: gain calibration timeout (2412MHz)
[25828.500000] ath5k phy0: gain calibration timeout (2412MHz)
[25830.180000] ath5k phy0: gain calibration timeout (2412MHz)
[25831.850000] ath5k phy0: gain calibration timeout (2412MHz)
[25833.510000] ath5k phy0: gain calibration timeout (2412MHz)
[25835.190000] ath5k phy0: gain calibration timeout (2412MHz)
[25836.860000] ath5k phy0: gain calibration timeout (2412MHz)
[25838.520000] ath5k phy0: gain calibration timeout (2412MHz)
[25840.200000] ath5k phy0: gain calibration timeout (2412MHz)

From this point on, clients can no longer associate.
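The "once every ~1.67 seconds" figure can be checked directly from the kernel timestamps. The following is only a sketch: the sample lines are copied from the log above, while the regular expression and the helper names (`timeout_timestamps`, `intervals`) are made up for this illustration.

```python
import re
from statistics import mean

# First five lines copied from the log in the description.
LOG = """\
[25803.450000] ath5k phy0: gain calibration timeout (2412MHz)
[25805.130000] ath5k phy0: gain calibration timeout (2412MHz)
[25806.800000] ath5k phy0: gain calibration timeout (2412MHz)
[25808.460000] ath5k phy0: gain calibration timeout (2412MHz)
[25810.140000] ath5k phy0: gain calibration timeout (2412MHz)
"""

# Matches "[<seconds>] ath5k <phy>: gain calibration timeout (<freq>MHz)".
TIMEOUT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] ath5k \S+: gain calibration timeout \(\d+MHz\)"
)

def timeout_timestamps(text):
    """Return the kernel timestamps (seconds) of all timeout lines."""
    stamps = []
    for line in text.splitlines():
        match = TIMEOUT_RE.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps

def intervals(stamps):
    """Return the gaps between consecutive timestamps, rounded to 10 ms."""
    return [round(later - earlier, 2) for earlier, later in zip(stamps, stamps[1:])]

gaps = intervals(timeout_timestamps(LOG))
print("gaps:", gaps)
print("mean gap: %.2f s" % mean(gaps))
```

Piping the full `dmesg` output through the same helpers would show whether the period stays this regular over a longer run.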

I have some other systems running ath5k on trunk v29240 that don't have this error, but they use an embedded WLAN controller (ar231x, WX-7800A).

Attachments (0)

Change History (21)

comment:1 Changed 6 years ago by mickflemm@…

Hello ;-)

We had a similar report on ath5k, and it seems that fast channel switching solves this on AR5414. Try passing fastchanswitch=true. Also try using a different channel and see how it goes.

comment:2 follow-up: Changed 6 years ago by anonymous

RB411 A/AH with Mikrotik a/g-Card: same here.
It seems to happen after scanning. Merely unloading and reloading all the wireless modules, from ath5k through cfg80211, does not help. Only a reboot clears the error, and only for an indeterminate time.

Suggestion: OpenWrt builds should include busybox's full modutils applets for easier debugging.

comment:3 in reply to: ↑ 2 Changed 6 years ago by spanky

Also happening here with an RB411UAHR with the built-in wifi card. The time to failure is indeterminate as far as I can tell.

00:11.0 Ethernet controller: Atheros Communications Inc. AR2417 Wireless Network Adapter [AR5007G 802.11bg] (rev 01)

comment:4 Changed 6 years ago by anonymous

The same happens here, RB411 A/AH with a Mikrotik 52RT mini-PCI card on channel 1 in ad-hoc mode.
I will try fastchanswitch and report back.

comment:5 Changed 6 years ago by garyc

Hi, same here. Using an RB411AR with a built-in card and an additional R52 card. The issue occurs after a period of time, and either card can be affected.

comment:6 in reply to: ↑ description Changed 6 years ago by anonymous

I have been seeing this on Meraki Mini and Accton MR3201A devices. Changing channel didn't help, nor did fastchanswitch=1. I am going to try bisecting the changes in compat-wireless-2011-12-01 when I get some free time. An earlier compat-wireless (2011-11-??) didn't show this problem on the same hardware. I am also not seeing it on a WGT634U or Soekris net4521, both also using ath5k.

comment:7 Changed 6 years ago by seniorr@…

The Meraki Mini, Accton MR3201A report above was mine (damn you anonymous!)

comment:8 Changed 6 years ago by duvi

r29240 works fine; in r29606 it's already broken. So we should take a look at the changes in between.

I'll try to narrow it down if I have time. I'm building r29436 first.

comment:9 Changed 6 years ago by duvi

Here's the deal:

r29435 has been running smoothly for more than a day; if the gain calibration problem were present, it would have appeared by now.

r29436 brings the problems: wireless only works for a short period of time, and the gain calibration timeout appears soon after. The latest revision I checked is r29953, which still has this error. I haven't checked any newer build, but there were no ath5k-related changes since then.

I tested on a device that's in wds station bridge mode, but I guess it makes no difference.
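The narrowing-down described in comments 8 and 9 amounts to a bisection between a known-good revision (r29240) and a known-bad one (r29606). A minimal sketch of that search follows. The function `first_bad_revision` and the `simulated_test` stand-in are hypothetical; on real hardware the test step means building and flashing the revision and waiting long enough for the timeout to show up.

```python
def first_bad_revision(good, bad, is_good):
    """Bisect for the first failing revision in the range (good, bad].

    Assumes `good` passes, `bad` fails, and that once the bug is
    introduced every later revision also fails.
    Returns (first_bad, number_of_builds_tested).
    """
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # bug was introduced after mid
        else:
            bad = mid    # bug is already present at mid
        steps += 1
    return bad, steps

def simulated_test(rev):
    """Stand-in for a real build-and-soak test.

    Hard-coded from the result reported in comment 9:
    r29435 is fine, r29436 shows the timeout.
    """
    return rev < 29436

culprit, builds = first_bad_revision(29240, 29606, simulated_test)
print("first bad revision: r%d after %d test builds" % (culprit, builds))
```

With 366 revisions in the range, bisection needs at most 9 test builds, which matters when each test takes a day of uptime.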

comment:10 Changed 6 years ago by garyc

Well, I rolled back to r29436 and it has been running well for the past 48 hours. I will leave it running to see what happens.

I have one card in AP mode and the other in STA mode.

comment:11 Changed 6 years ago by nbd

I'm pretty sure the issue was introduced in a linux upstream patch series. I need somebody to test the patches individually to figure out which one introduced the issue.

You could do this by first running make package/mac80211/compile QUILT=1, then going into the build dir in build_dir/linux-*/compat-wireless-*/ and reverting the individual patches there, rebuilding mac80211 and updating the driver on the device each time.

I've put up a copy of the full patch series here: http://nbd.name/ath5k-patches.tar.gz. You should be able to revert them one by one, starting with the last patch in that tarball.

Thanks

comment:12 Changed 6 years ago by seniorr@…

Just for everyone's edification, I have been testing the patch series and found that 0004-ath5k-Calibration-re-work.patch in the patch series nbd posted above appears to be the one that introduced the problem. I am in the process of testing the second of his suggested fixes, relative to r30427. Stay tuned.

comment:13 Changed 6 years ago by duvi

I'm testing on the RB433UAH from the original report, using r30388 with 0004-ath5k-Calibration-re-work.patch reverted.

comment:14 Changed 6 years ago by nbd

  • Resolution set to fixed
  • Status changed from new to closed

fixed in r30624

comment:15 Changed 6 years ago by mickflemm@…

Hello people ;-)

Some words about that patch...

a) On reset we don't check whether a gain calibration is pending; we fire one up anyway. So even if we make sure we don't call reset while another reset is in progress, our reset lock doesn't take care of any pending calibrations. This is not specific to ath5k: madwifi, ath and ath9k also don't check whether a gain calibration is pending, because on a normal reset we also reset the PHY, so any pending calibrations etc. should get cancelled before we fire up a new one (possibly on a new channel).

b) With fast channel switching we don't reset the PHY, so we might set a new channel while a gain calibration is running. Again, this happens on all other drivers, so it's safe to assume that if a gain calibration or any other calibration is running, the hw will not allow us to go on and perform the fast channel switch (it won't give us access to RFBUS), so it'll be ok.

c) After a fast channel switch is done we only perform an NF calibration; we won't fire up a gain calibration on the new channel unless a full reset gets fired after that, and that can result in poor performance.

comment:16 Changed 6 years ago by mickflemm@…

In my tests and some tests done by others there were no problems with that approach. Not only that, but most "gain calibration timeout" bug reports come from embedded systems and some laptops with weird behaviour, e.g. check these out:
kernel bugzilla: 16436
ubuntu launchpad: 610440
redhat bugzilla: 749909

That makes me think in some of these cases it's not a problem of ath5k but a side effect of clock drifts, platform bugs etc.

But anyway, let's focus on this one. First of all, this message comes from the reset function, not from calibration (you don't get a calibration failure message, and periodic gain calibration happens every 60 secs). The weird thing is that it happens very frequently, as you say (you get a flood every 1-2 secs), and on the same channel. Keep in mind that a gain calibration failure is non-fatal for reset (that means it doesn't return an error), so my guess is we hit a fatal interrupt or a stuck queue.

Can you please enable debug on ath5k and post the output at the time it fails? Try loading ath5k with debug=0x23.

Thanks a lot for your report and sorry for the delay...

comment:17 Changed 6 years ago by mes@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

I am seeing the same thing with r30646 on a RouterStation Pro.
I have just reloaded ath5k with debug=0x23.
I will post results when it happens again.

comment:18 Changed 6 years ago by mes@…

OK. It finally misbehaved for me. I have put up the logfile at:

http://exeter.lazo.ca/files/screenlog.0.gz

The problem seems to start at:
Jan 7 20:27:47 sieffertAP user.debug kernel: ath5k phy0: (ath5k_calibrate_work:2348): channel 11/2

by:
Jan 7 20:28:38 sieffertAP user.debug kernel: ath5k phy0: (ath5k_hw_update_noise_floor:1660): noise floor calibrated: -96

Jan 7 20:28:38 sieffertAP user.err kernel: ath5k phy0: calibration of channel 11 failed

and:
Jan 7 20:28:44 sieffertAP user.err kernel: ath5k phy0: gain calibration timeout (2462MHz)
Jan 7 20:28:49 sieffertAP user.debug kernel: ath5k phy0: (ath5k_beacon_send:1858): stuck beacon, resetting
Jan 7 20:28:49 sieffertAP user.debug kernel: ath5k phy0: (ath5k_reset:2749): resetting

This continues until a reboot. Even wifi down/up does not stop the error. So is it the stuck beacon that causes the wifi to stop working?

Thanks, Mark
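The log lines in this ticket identify the channel in two ways: the timeout message gives a centre frequency (2412MHz in the description, 2462MHz here), while the calibration error gives a channel number (11). The two are related by the standard 2.4 GHz channel spacing, which the small helper below sketches. The function name `channel_from_mhz` is made up for this example.

```python
def channel_from_mhz(freq_mhz):
    """Map a 2.4 GHz centre frequency in MHz to its 802.11 channel number."""
    if freq_mhz == 2484:
        # Channel 14 does not follow the regular 5 MHz spacing.
        return 14
    if 2412 <= freq_mhz <= 2472 and (freq_mhz - 2407) % 5 == 0:
        # Channels 1-13 sit 5 MHz apart, starting at 2412 MHz.
        return (freq_mhz - 2407) // 5
    raise ValueError("not a 2.4 GHz 802.11 channel centre: %d MHz" % freq_mhz)

# Frequencies taken from the two log excerpts in this ticket.
for freq in (2412, 2462):
    print("%d MHz -> channel %d" % (freq, channel_from_mhz(freq)))
```

So the original report (configured for channel 1) and this one (channel 11) both show the timeout on the configured operating channel rather than on a channel visited during a scan.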

comment:19 Changed 5 years ago by nbd

please try current trunk

comment:20 Changed 5 years ago by nbd

  • Resolution set to no_response
  • Status changed from reopened to closed

comment:21 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
