Opened 5 years ago
Last modified 20 months ago
#12372 reopened defect
ar71xx/ath9k (WRT160NL): wifi client's connection quality suddenly drops after 18-24h hostapd uptime
| Reported by: | dap@… | Owned by: | nbd |
|---|---|---|---|
| Priority: | normal | Milestone: | Barrier Breaker 14.07 |
| Component: | kernel | Version: | Attitude Adjustment 12.09 Beta |
| Keywords: | ar71xx ath9k wrt160nl loss | Cc: |
Description
After upgrading from a trunk release more than 8 months old to AA-beta1 on my WRT160NL, I found that wifi clients' connection quality suddenly drops after 18-24h of hostapd uptime. The quality drop means 10-20% packet loss and high RTT. The clients are within 5 meters, the wifi traffic is very light, the noise is low on the floor. My WRT configuration is almost default.
I recently began monitoring some wifi parameters with munin and found the following on the WRT AP when this problem kicks in:
- iw station dump: rx bitrates drop to 1.0 MBit/s; tx bitrates vary but do not fall as low
- iw station dump: tx retries increase 4 times faster than tx packets (!) and tx failed is 10-20% of tx packets
- iw station dump: signal strength of clients drops *just a little* (about -10%), so this is not the cause by itself, because sometimes there are much lower signal levels without such issues
- iw survey dump: does not change; the sum of times (except 'active') on the channel in use is always under 200ms per second
- hostapd does not log anything worth mentioning (even with -dd), only the usual "group key handshake completed" messages
- the clients do not disconnect or reconnect
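For anyone who wants to check the same counters without munin, here is a minimal sketch that turns `iw ... station dump` output into per-station retry and failure percentages. The function name `station_tx_ratios` is made up for illustration, and it works on the cumulative counters since association, so it only approximates the "counting 4 times faster" rate described above.

```shell
#!/bin/sh
# Summarise tx retry and failure ratios per station.
# Reads `iw dev <iface> station dump` output on stdin and prints one line
# per station: <mac> retries=<pct>% failed=<pct>%
# (station_tx_ratios is a hypothetical helper name, not part of iw.)
station_tx_ratios() {
    awk '
        # Print the summary for the station collected so far, if any.
        function report() {
            if (mac != "" && pkts > 0)
                printf "%s retries=%d%% failed=%d%%\n", mac,
                       int(retries * 100 / pkts), int(failed * 100 / pkts)
        }
        $1 == "Station"                { report(); mac = $2; pkts = 0; retries = 0; failed = 0 }
        $1 == "tx" && $2 == "packets:" { pkts = $3 }
        $1 == "tx" && $2 == "retries:" { retries = $3 }
        $1 == "tx" && $2 == "failed:"  { failed = $3 }
        END                            { report() }
    '
}

# Usage on the router (interface name wlan0 is an assumption):
#   iw dev wlan0 station dump | station_tx_ratios
```

A retries value far above 100% together with a failed value in the 10-20% range would match the symptoms listed above.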
Last time this issue was (probably) triggered by me when I arrived home with my smartphone, but I am currently not sure that the trigger is always a new client - I will keep my eyes open from now on.
A simple hostapd restart *always* resolves the problem for 16-24 hours. Client reboot/reconnect does not help.
AA-beta2 is under testing right now. My wireless config and a station dump snapshot are attached. Can I help any further?
Attachments (4)
Change History (168)
Changed 5 years ago by anonymous
Changed 5 years ago by anonymous
comment:1 Changed 5 years ago by stefan.joosten+openwrt@…
comment:2 Changed 5 years ago by dap@…
I'm done with AA-beta2 testing: it failed too. I have begun to test different trunk revisions, picked randomly between head and 29289 - the latter served me nicely for months. I'll update this ticket with my experiences:
r33917 - same issue as AA-beta2
r32510 - constantly very high tx retries & failed, but rx bitrates do not drop to 1.0 MBit/s. hostapd restart does not help. It is not exactly the same problem, but the ratios of retries and failures to tx packets match (~ 350%/15%).
comment:3 Changed 5 years ago by Rascas
I have the same problem in 12.09-beta too, on a TP-Link TL-WR1043ND in Client+VAP mode. I can see wireless dropping to and staying at 1Mbit (rx bitrates only) after an uptime of roughly 24 hours, but sometimes only 2-3 hours are needed for this to happen. This is my first time with OpenWrt; I could configure everything I needed, and this is the only problem. I can post some logs, but someone has to tell me what to do, because I don't know. I'm testing beta2 now.
comment:4 Changed 5 years ago by Roland Pallai <dap@…>
r29294 - seems to work fine (again). There are some "ath: Could not stop RX" messages in dmesg, but no problem with wifi connections after 23 hours yet.
Guys, try r29294 if you can. Probably you're hitting the same bug as here if that release fixes your problems too.
(My next try is r30403 with kernel 3.2)
comment:5 Changed 5 years ago by Rascas
The same happens in beta2.
comment:6 Changed 5 years ago by Roland Pallai <dap@…>
comment:7 Changed 5 years ago by riproute@…
Any progress on this? I am seeing the same issue but only with certain environments. I am testing the same build on 8 identical access points and only see the failure in one of those environments. I've changed out the hardware in that environment and still see the issue.
comment:8 Changed 5 years ago by Roland Pallai <dap@…>
I'm still testing. Unfortunately 24 hours is not enough for a test run - once the problem came after 43 hours.
Now I suspect the evil patch is between r32420 and r32510. I have to make sure, then revert patches one by one and run tests over and over. It takes weeks to see results but I have no other choice. I'll report on progress.
comment:9 Changed 5 years ago by stefan.joosten+openwrt@…
Could very well be Roland.
The issue seems to have been resolved for me.
I'm a bit unsure what revision I used during the time I experienced the issue.
I rebuilt using more recent code and have been running Attitude Adjustment branch r33969 for 7 days without any problems now. I'm going to refresh my sources now and compile a new build from them.
So far so good :)
comment:10 Changed 5 years ago by Roland Pallai <dap@…>
comment:11 follow-up: ↓ 17 Changed 5 years ago by Roland Pallai <dap@…>
r34123 (latest trunk) failed after 2d 22h, the symptoms are same.
comment:12 Changed 5 years ago by Roland Pallai <dap@…>
I made a spreadsheet for the test results:
https://docs.google.com/spreadsheet/ccc?key=0AikqNU3ONMJHdGpTR21TQXNrZm45UldsZTZkX0JLTnc
comment:13 Changed 5 years ago by vadim@…
comment:14 Changed 5 years ago by elvstone@…
I can confirm problems on my WRT160NL with r34325 that I built today. I'm guessing they might be the same you're seeing, but possibly there's some crash involved as well. Today after starting the AP up I tried a 2 GB torrent download and it went fine, but then after a little while the connection got flaky and then the clients got disconnected. Had to reboot.
It seems ath9k is really finicky :(
I'm new to OpenWRT, but if I get instructions on how to gather useful information for the bug report I'll see what I can do.
comment:15 Changed 5 years ago by elvstone@…
I should note that I don't have to wait 1-2 days to get problems. They usually show up after minutes/hours and quite randomly. They don't seem to be related to the amount of traffic. Like I said, a 2 GB torrent was downloaded successfully in a few minutes, and then during the slow traffic period that followed the connection got flaky again and finally all clients were disconnected.
comment:16 Changed 5 years ago by elvstone@…
Is anyone currently running a WRT160NL on a revision that works without problems? If so, I'm willing to try a bisect to find the exact offending commit. But perhaps there are no (completely) good revisions?
comment:17 in reply to: ↑ 11 Changed 5 years ago by anonymous
Replying to Roland Pallai <dap@…>:
r34123 (latest trunk) failed after 2d 22h, the symptoms are same.
Yup, a more recent AA build failed for me as well on TL-WR1043ND.
Although I just get very slow WiFi. All RX rates on the router drop to 1 mbit, which kills the experience. Restarting the wireless (wifi down; wifi up) fixes it and everything works like it should.
I noticed the problems start occurring after some decent to heavy load with for example a torrent client making lots of connections. I do everything over Ethernet, but my roommates are stubborn and everything is WiFi for them...
comment:18 Changed 5 years ago by anonymous
was me: Stefan <stefan.joosten+openwrt@…>
comment:19 follow-up: ↓ 65 Changed 5 years ago by Roland Pallai <dap@…>
Good news, I found the root of my problem! My WRT160NL uptime is 6 days without issues. ;)
The buggy patch is: https://dev.openwrt.org/browser/trunk/package/mac80211/patches/562-ath9k_reduce_ani_interval.patch?rev=32510
I reverted the patch on r34287 and the problem has gone.
There is an easy workaround for this:
echo 1 >/sys/kernel/debug/ieee80211/phy0/ath9k/disable_ani
Although I didn't try it yet, it should fix this issue too. Try it and report - I will try this workaround too.
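Since debugfs settings do not survive a reboot, the workaround has to be reapplied each time. Below is a minimal sketch of a helper that does this for every ath9k phy. The function name `disable_ath9k_ani` is hypothetical. It handles both the `disable_ani` file used here and the `ani` file that later comments in this ticket write `0` to; which one exists depends on the build.

```shell
#!/bin/sh
# Disable ANI on every ath9k phy found under a debugfs directory.
# $1: ieee80211 debugfs directory (defaults to the path used in this ticket).
# (disable_ath9k_ani is a hypothetical helper name.)
disable_ath9k_ani() {
    root="${1:-/sys/kernel/debug/ieee80211}"
    for dir in "$root"/phy*/ath9k; do
        [ -d "$dir" ] || continue
        if [ -e "$dir/disable_ani" ]; then
            # Interface used in this comment: write 1 to disable ANI.
            echo 1 > "$dir/disable_ani"
        elif [ -e "$dir/ani" ]; then
            # Interface used in later comments: write 0 to disable ANI.
            echo 0 > "$dir/ani"
        fi
    done
}

# Usage on the router, after wifi has come up:
#   disable_ath9k_ani
```

Whether the setting also survives a `wifi` restart is not confirmed anywhere in this ticket, so it is probably safest to call the helper again after each restart.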
comment:20 Changed 5 years ago by elvstone@…
That's great. I have enabled the workaround and will report if I see any problems in the coming days. The connection has actually been surprisingly stable in the last few days, though my girlfriend who was home yesterday said that it was slow and she got disconnected once.
comment:21 Changed 5 years ago by stefan.joosten+openwrt@…
I resorted to restarting wireless during the night, which does help a bit.
Will try the disable ANI workaround as well.
Funny enough I had found out about the same workaround last week, just hadn't gotten around to trying it yet.
But I do hope some OpenWRT devs check this out and find some middle ground in ANI that works. Because disabling ANI can be detrimental for performance can it not?
comment:22 Changed 5 years ago by elvstone@…
I'd just like to mention that the disable ANI workaround seems to have worked for me. No problems in the last 5 days. I don't know what impacts on performance this has, since I've never had a flawless connection until now. At the moment I think I'm getting ~5-6 MB/s, which I think is okay if not great. Would also like the problem to be fixed at its root.
comment:23 Changed 5 years ago by nbd
- Owner changed from developers to nbd
- Status changed from new to accepted
comment:24 Changed 5 years ago by stefan.joosten+openwrt@…
Boy, that was quick. :-)
Testing disable_ani on a TL-WR1043ND. I will report back in a couple of days. If there is anything else I can try or should provide, let me know.
comment:25 Changed 5 years ago by Roland Pallai <dap@…>
The disable_ani workaround also worked for me; 5d9h+ uptime without issues on (unpatched) r34287.
I don't know the performance impact of the reverted patch nor disable_ani, I'm not really interested in wifi performance now.
comment:26 Changed 5 years ago by Francesco Lotti <francesco@…>
disable_ani didn't work here. I'm using AR5416 on Asus WL500GP in Access Point (WDS) mode.
After a few hours clients get disconnected and the AP will not accept any new connections. The strange thing is that the client connected to the AP in WDS mode continues to work without problems.
comment:27 Changed 5 years ago by nbd
what version of openwrt is that?
comment:28 Changed 5 years ago by Francesco Lotti <francesco@…>
AA beta2 .
comment:29 Changed 5 years ago by nbd
ok, then update to rc1, it might fix your issue
comment:30 Changed 5 years ago by vadim@…
So is it fixed in rc1, or will only the workaround work?
comment:31 Changed 5 years ago by Francesco Lotti <francesco@…>
Ok thanks. I'll upgrade asap:-)
comment:32 Changed 5 years ago by Francesco Lotti <francesco@…>
Ok, meanwhile I swapped my AR5416 (ath9k) minipci card for an AR5413/AR5414 (ath5k) one.
Everything seemed to work properly until today, when clients suddenly disconnected and new clients weren't able to connect anymore.
So ath5k seems to have the same problems as ath9k, at least on AA beta2.
# iw wlan0 station dump
Station 00:19:d2:XX:XX:XX (on wlan0)
inactive time: 388 ms
rx bytes: 153449
rx packets: 2475
tx bytes: 20941
tx packets: 161
tx retries: 0
tx failed: 0
signal: -58 dBm
signal avg: -58 dBm
tx bitrate: 1.0 MBit/s
rx bitrate: 54.0 MBit/s
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
comment:33 Changed 5 years ago by nbd
I thought you wanted to update... I already told you the problem of no clients being able to connect anymore is fixed in rc1. It's not drivers specific by the way, it's a mac80211 issue.
comment:34 Changed 5 years ago by anonymous
Hello
Has anyone else noticed a problem functioning router wr1043nd v1.8 for pre-OpenWRT Backfire 10.03.X?
Previously I was using DD-WRT; later I went back to the original firmware. Until then everything was OK. When I upgraded to OpenWrt Backfire 10.03.1 I had problems with an inactive WAN port. After switching to DD-WRT I found that the WAN port does not work there either (it works only on the original firmware). After upgrading to Attitude Adjustment 12.09-rc1 the WAN port came back into operation, but when I tried to install the original firmware I bricked the router. After a short time I successfully debricked it using an RS232 port. I found that the router no longer works as before: in DD-WRT the WAN port still does not work, and the Wifi problems you have mentioned are present. I also wonder why the LED behaviour during startup (boot) is not the same as with the factory settings (original firmware).
Does anyone know how to restore a record in the rom chip??
comment:35 Changed 5 years ago by Francesco Lotti <francesco@…>
nbd, I didn't have the time to upgrade, so I tried replacing the card instead.
Now I just flashed AA rc1 and everything seems fine again.
comment:36 Changed 5 years ago by karsten.bier@…
I am seeing disconnects too, but the connection returns pretty quickly after a few seconds.
I'm on AA-rc1 too, on a TP-Link TL-WR1043N/ND v1.
I tried the disable_ani workaround, but it didn't work.
To check whether the connection is stable I start an FTP transfer. The connection speed is very good in HT40, while it lasts. I'm getting up to 10 mb/s in HT40 and something around 6 mb/s in HT20.
For now I can only use 802.11g. There was a bug in Kamikaze before which led to drops in connection speed, but that seems to be fixed now.
Here's a snippet from the log when the disconnects happen in 802.11n:
Dec 23 13:06:16 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: disconnected due to excessive missing ACKs
Dec 23 13:06:46 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Dec 23 13:06:47 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Dec 23 13:06:47 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 1)
Dec 23 13:06:47 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx WPA: pairwise key handshake completed (RSN)
as you can see, it only takes about 30 seconds and everything is fine again.
i would love to see this fixed in the release. if needed i can provide more info or test a newer trunk version.
comment:37 Changed 5 years ago by karsten.bier@…
I tried again with two clients connected, but only the one with a high network load (doing the FTP transfer) got disconnected.
Can anybody confirm that behaviour ?
comment:38 Changed 5 years ago by anonymous
using AA rc1, same behaviour (wr1043nd) as karsten:
with light load I get nearly no problems (uptime of a week or so), but using a client with high network load causes a disconnect (sometimes after tens of minutes, sometimes it takes a couple of hours).
comment:39 Changed 5 years ago by anonymous
Just wanted to report my findings. It seems the disconnect issue is cleared up for me on RC1. However I'm still getting the slowdown issues mentioned earlier. Most of the RX rates drop to 1Mbit. Doing the wifi up/down command clears the problem.
TL-WR1043N
AA-RC1
comment:40 Changed 5 years ago by stefan.joosten+openwrt@…
That is the same behaviour my router shows. It slows down to 1 mbit, making it extremely slow and unusable from a user perspective. Restarting wifi solves this problem until it occurs again.
Using the disable_ani workaround keeps WiFi working here, but I get a whole lot more "Could not stop TX" errors. WiFi continues to work though.
I've noticed there are some new backported fixes to ath9k, but I'm unsure if those possibly address this problem. I will probably compile from AA branch later this month and test some more.
comment:41 Changed 5 years ago by stefan.joosten+openwrt@…
Just to add some info: that's a TL-WR1043ND using AA 12.09-rc1 (r34457)
comment:42 Changed 5 years ago by spamsales@…
I use the latest trunk and see the same behaviour as Karsten. My device is a WRT841N V8.1 with ath9k.
comment:43 Changed 5 years ago by stefan.joosten+openwrt@…
Can confirm this still happens on AA-rc1 built 3 days ago on TL-WR1043ND without disable_ani workaround.
Seemed to work fine, as it did yesterday and the day before. But today, again 1Mbit RX rates, but really no data going through, so wireless is useless.
I have no warnings or errors in kernel log, other than:
hrtimer: interrupt took 31346 ns
Restarting my wireless now to solve it.
The disable_ani workaround is still necessary it seems, at least on my end.
Any news or progress on this yet?
Is there anything I can do as a user to help?
comment:44 Changed 5 years ago by anonymous
I think this is the same problem but I rebooted the router so I can't compare.
I have a TL-WR1043ND with AA 12.09-rc1.
I was running a download and torrents, then speeds went to zero.
I disconnected and reconnected wireless connection.
It connected but everything was slow.
Ping to router was 49-222ms, average ~100ms, 33% (4/12) packet loss.
Memory usage was normal, load average 0.03.
Rebooted router, everything is fine.
comment:45 Changed 5 years ago by matti.laakso@…
I started to encounter this problem after getting a new wireless client (Acer Iconia Tab W510). Running 12.09-rc1 on a Buffalo WZR-HP-G300NH. Restarting wifi nightly seems to be enough to get mostly error free operation.
comment:46 Changed 5 years ago by nbd
please try r35786 or newer
comment:47 Changed 5 years ago by stefan.joosten+openwrt@…
After 9 days of running this newer version I can say this fix helps. It does not seem to fix the true cause of the issue, but it helps greatly with the experience from a user's point of view.
I have experienced WiFi drop to the 1 mbit speed after this fix, albeit less frequently. And I bet that is due to the driver cold resetting the chip when most of the problems occur. This way it "fixes" the problem by restarting itself in most cases.
So I would like to thank you for this fix, as it does help to workaround the issue here most of the time. I hope more tweaks and fixes keep coming, and I will continue testing of course.
comment:48 Changed 5 years ago by nbd
another fix committed, please try r35974 or newer
comment:49 Changed 5 years ago by stefan.joosten+openwrt@…
I'm sorry to report this issue still occurs on r35974 and the r36052 I'm currently running.
r35786 seems to have been the best one for me. But that's hardly a scientific conclusion, because that one just happened to last the longest without me having to reset the WiFi.
I'm considering reverting back to it, to see if it can pull it off a second time. I will keep you posted in case I do return to it and it happens to be more stable. I expect it won't and I just got lucky with it.
So while the fixes do address some of it, because the WiFi is more usable than it was before, it still really slows down. Instead of several MB/s, my download speed falls to between 200KB/s and 400KB/s.
comment:50 Changed 5 years ago by stefan.joosten+openwrt@…
AA r36052 is throwing a fit again today. WiFi slowed to a crawl and it's not fixing itself, so I will have to restart it manually after having done that just yesterday.
I'm reverting back to AA r35786 because that was better in my experience. I will inform you if that happens to be better than the more current revisions.
comment:51 Changed 5 years ago by matti.laakso@…
I'm running AA r36099 now, and I noticed something strange: When this problem occurs download speeds from internet with wireless drop to around 600 kB/s, however, I can still download with samba from a USB hard drive connected to the router at 3.5 MB/s, which is pretty much the maximum I can get from the 65 Mbps wifi link! Also, minstrel_ht statistics from rc_stats always show a throughput of ~35 Mbps. Wired connection to internet is stable at 50 Mbps which is what the ISP gives me. How is this possible?
comment:52 Changed 5 years ago by anonymous
r36088 looks ok at 150 Mbps with 40 MHz channels in 802.11n. Transfer rates are a bit unstable, but basically I get something like 9mb/s on FTP transfers.
i just installed the 12.09 release and was eager to check :)
now it's a waiting game to see how well it works in the long run, but things are definitely looking good.
i guess a huge thank you is in order !
comment:53 Changed 5 years ago by elvstone@…
I just upgraded to 12.09 release and it took only an hour or so until I got disconnected and had to restart the AP (a WRT160NL) :(
comment:54 Changed 5 years ago by nbd
Please try current AA SVN - the commit r36664 should hopefully have fixed this.
comment:55 Changed 5 years ago by anonymous
nbd:
Tested AA r36716, 2 days works fine, but on 3rd I got problem again. Still need to remove 550-ath9k_reduce_ani_interval.patch for normal use.
comment:56 Changed 5 years ago by nbd
Please try changing ATH9K_ANI_POLLINTERVAL in that patch to 200 and see if that makes things more stable for you.
comment:57 Changed 5 years ago by igor
nbd:
With '200' working good so far. I was planning on testing for 2-4 weeks, but saw r36823. Do I need to continue testing or begin re-test with '300' ?
comment:58 Changed 5 years ago by nbd
If 200 worked and 1000 worked, then 300 is going to work as well. Thanks for testing.
comment:59 Changed 5 years ago by igor
nbd:
Got problem today after 1 week uptime. Will start testing with '300'.
comment:60 Changed 5 years ago by nbd
Please change .config to set CONFIG_BUSYBOX_CONFIG_FEATURE_IPC_SYSLOG_BUFFER_SIZE to 512,
and make sure CONFIG_PACKAGE_ATH_DEBUG is enabled.
After you've brought up wifi, run this:
echo 0x49 > /sys/kernel/debug/ieee80211/phy0/ath9k/debug
As soon as the problem appears, run
logread | gzip -c > /tmp/log.gz
And send me (or attach) the contents of /tmp/log.gz
Thanks
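Because the problem can take days to appear, the capture step above can be automated. This is a minimal sketch only: the helper names, the `wlan0` interface and the 60-second poll interval are all assumptions, and it uses "any station with an rx bitrate of 1.0 MBit/s" as the trigger, which is the symptom reported in this ticket but may also fire for a client that is legitimately at that rate.

```shell
#!/bin/sh
# Succeeds (exit status 0) when any station in an `iw ... station dump`
# read from stdin reports an rx bitrate of 1.0 MBit/s.
# (rx_rate_collapsed and capture_on_collapse are hypothetical helper names.)
rx_rate_collapsed() {
    awk '$1 == "rx" && $2 == "bitrate:" && $3 == "1.0" { found = 1 }
         END { exit !found }'
}

# Poll until the symptom shows up, then save the log as requested above.
# $1: wireless interface (assumed wlan0)
# $2: poll interval in seconds (assumed 60)
capture_on_collapse() {
    iface="${1:-wlan0}"
    interval="${2:-60}"
    until iw dev "$iface" station dump | rx_rate_collapsed; do
        sleep "$interval"
    done
    logread | gzip -c > /tmp/log.gz
}

# Usage on the router, after enabling the ath9k debug mask:
#   capture_on_collapse wlan0 60 &
```

Capturing at the first sighting matters because the syslog ring buffer is limited, so the relevant debug messages can be overwritten if the log is only collected hours later.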
comment:61 Changed 5 years ago by anonymous
I'm still seeing the "excessive missing ACKs" issue with ATH9K_ANI_POLLINTERVAL set to 300 (using snapshot r36859). I also posted something to a thread on the forum (https://forum.openwrt.org/viewtopic.php?pid=204139#p204139)
comment:62 Changed 5 years ago by anonymous
I've been doing some testing, and it looks like the 'disassoc_low_ack' setting (in /var/run/hostapd-phy0.conf) affects the "excessive missing ACKs" message.
(forum post https://forum.openwrt.org/viewtopic.php?pid=204236#p204236)
comment:63 Changed 5 years ago by igor
nbd:
1 month passed after I started test with ani_pollinterval '300' and enabled debug. No problem at all.
comment:64 Changed 5 years ago by nbd
- Resolution set to fixed
- Status changed from accepted to closed
Good to know, thanks for testing.
comment:65 in reply to: ↑ 19 Changed 5 years ago by anonymous
Replying to Roland Pallai <dap@…>:
There is an easy workaround for this:
echo 1 >/sys/kernel/debug/ieee80211/phy0/ath9k/disable_ani
Although I didn't try it yet, it should fix this issue too. Try it and report - I will try this workaround too.
nope, does not fix the wlan clients losing connections after 1-2days.
comment:66 Changed 5 years ago by nbd
please test with the fixes in r37616
comment:67 Changed 5 years ago by anonymous
I had a problem with wifi too (it usually disappeared and I had to reboot the router or use the "wifi" command to get it back). Now wifi is working, but the connection slows down after a while (especially when downloading big files from the internet). One thing is fixed, though: wifi doesn't disappear anymore! Will there be another patch for this? My current version is r37673.
comment:68 Changed 5 years ago by nbd
there were some more changes after that, please test latest.
comment:69 Changed 5 years ago by anonymous
Ok. One more question: where can i see the changelist? Is there a link for that? I'll report as soon as possible.
comment:70 Changed 5 years ago by nbd
comment:71 Changed 4 years ago by dap@…
2 days ago I upgraded to r37948 from my old r34287 and now the problem is back. The symptoms are the same as in my original report. A hostapd restart resolved the issue, again. r34287 with the disable_ani workaround was stable for months.
Now I'm running r37948 with 'echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani' - I'll report on the next week..
comment:72 follow-up: ↓ 73 Changed 4 years ago by vadim@…
- Resolution fixed deleted
- Status changed from closed to reopened
I also see the speed slow down after some time.
comment:73 in reply to: ↑ 72 Changed 4 years ago by nbd
comment:74 Changed 4 years ago by awilchak@…
r38249 is definitely better. Instead of dropping to zero and staying there, it seems like speeds occasionally drop to 500KB/s and then go back up. Need to test further but I think that last commit is making a big difference. Thank you!
comment:75 Changed 4 years ago by nbd
please also try r38257, it can prevent spurious reconnects
comment:76 Changed 4 years ago by dap@…
comment:77 Changed 4 years ago by rw_trac
r38294 - the problem with disconnects still present:
Oct 4 09:26:22 OpenWrt kernel: [ 837.060000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:22 OpenWrt kernel: [ 837.080000] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x000084c0
Oct 4 09:26:22 OpenWrt kernel: [ 837.090000] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
Oct 4 09:26:23 OpenWrt kernel: [ 837.320000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:23 OpenWrt kernel: [ 837.570000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:23 OpenWrt kernel: [ 837.810000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:23 OpenWrt kernel: [ 838.050000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:24 OpenWrt kernel: [ 838.300000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:24 OpenWrt kernel: [ 838.540000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:24 OpenWrt kernel: [ 839.020000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:27 OpenWrt kernel: [ 841.350000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:29 OpenWrt kernel: [ 843.450000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:31 OpenWrt kernel: [ 846.010000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:32 OpenWrt kernel: [ 846.250000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:32 OpenWrt kernel: [ 846.500000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:32 OpenWrt hostapd: wlan0: STA 00:1e:4c:47:f5:08 IEEE 802.11: disconnected due to excessive missing ACKs
comment:78 Changed 4 years ago by anonymous
Same problem here. RX and TX drop to 1 MB (sometimes only one of them).
Some clients disconnect completely.
comment:79 follow-up: ↓ 80 Changed 4 years ago by dap@…
comment:80 in reply to: ↑ 79 Changed 4 years ago by nbd
- Resolution set to fixed
- Status changed from reopened to closed
Thanks for testing
comment:81 Changed 4 years ago by Steffen
Does this fix the TL-WR1043ND v1.8 problem "Suspect of hardware bug that bring down WiFi after a while"? (http://wiki.openwrt.org/toh/tp-link/tl-wr1043nd)
comment:82 Changed 4 years ago by kisssandoradam@…
- Resolution fixed deleted
- Status changed from closed to reopened
Today I flashed my router with the latest trunk version, and while I was downloading at high speed the wifi stopped working. I think it is not fixed. I connected my computer to the router with a cable, but I didn't see anything interesting in the kernel and system logs.
comment:83 Changed 4 years ago by nbd
what kind of router, what revision, what configuration?
comment:84 Changed 4 years ago by kisssandoradam@…
1043nd router, version number: 1.8
OpenWrt configuration:
Wlan:
WPA2-PSK
Channel 11, second channel below (Force 40MHz mode)
Country code: Hungary
Transmit Power: 20dbm (100mW)
comment:85 Changed 4 years ago by kisssandoradam@…
The wifi stopped working again. Twice in 24 hours. I think the trunk version from September 29, 2013 was a bit more stable than the current one. The connection drops when I download continuously at about 5MB/s. But sometimes it works for weeks at the same speed, sometimes not.
comment:86 Changed 4 years ago by dap@…
Seems like I'm hitting kisssandoradam's issue on my WRT160NL with r38259 right now. During a massive download, the wifi client's traffic stops after a while. The client remains connected and there are no error messages, but the connection stalls. All other connected clients keep working fine, without interruption. Reconnecting the client does solve the problem. I can reproduce it in 2-3 minutes.
Now I tried with ANI disabled and have had no problem for 15 minutes. That's not enough to say that disabling ANI is a workaround, but it's worth a try. Kisssandoradam, please try:
echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani
and report back!
comment:87 Changed 4 years ago by Adam <kisssandoradam@…>
My problem was a bit more complicated than yours, dap. When I lose the connection, every client loses the wifi connection, because something stops working in the router. Today I reverted to Backfire 10.03.1 and I think it's more stable than the newer builds. Maybe the problem is in the Linux kernel and not in OpenWrt. If this stops working too I will use the latest trunk again, but I hope I don't have to.
comment:88 Changed 4 years ago by dap@…
I agree Adam, it's another problem.
Although I'm downloading for 40 minutes now with disabled ANI and no problem. I suspect my download issue is still an ANI issue - ticket status "reopened" is valid.. I'll do some tests tomorrow..
comment:89 Changed 4 years ago by anonymous
Yes, i can confirm it's a new issue, I have been running an old r36715 build and wifi is fine for most of the time (i restart wifi daily, router weekly) but when I built r38347, i started having frequent wifi issues after a few hours. Speed would drop down to around 5 Mbit/s, some connections would just timeout. In a nutshell, wifi is basically useless, had to revert back to old firmware.
comment:90 Changed 4 years ago by nbd
please try the latest version.
comment:91 Changed 4 years ago by anonymous
i also have this problem. Connection quality drops after 12 to 24 hours of wifi uptime. It seems to depend on the wifi load.
I tried OpenWrt 12.09 and the latest BB r38999. The symptoms are identical.
Router model Buffalo WZR-HP-G300NH.
cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani
ANI: ENABLED
ANI RESET: 221
SPUR UP: 62341
SPUR DOWN: 62341
OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 59683
FIR-STEP DOWN: 59821
INV LISTENTIME: 0
OFDM ERRORS: 299990116
CCK ERRORS: 18037358
cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset
Baseband Hang: 2
Baseband Watchdog: 0
Fatal HW Error: 0
TX HW error: 0
TX Path Hang: 0
PLL RX Hang: 0
MCI Reset: 0
iw wlan0 station dump
Station xx:22 (on wlan0) (macbook)
inactive time: 150 ms
rx bytes: 1351429
rx packets: 7394
tx bytes: 10212479
tx packets: 7963
tx retries: 2313
tx failed: 12
signal: -51 [-60, -52, -58] dBm
signal avg: -51 [-59, -53, -59] dBm
tx bitrate: 117.0 MBit/s MCS 14
rx bitrate: 5.5 MBit/s
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
Station xx:eb (on wlan0) (xbox 360)
inactive time: 10440 ms
rx bytes: 22653983
rx packets: 240071
tx bytes: 643697633
tx packets: 464528
tx retries: 149820
tx failed: 139
signal: -39 [-42, -49, -42] dBm
signal avg: -41 [-46, -51, -43] dBm
tx bitrate: 78.0 MBit/s MCS 12
rx bitrate: 104.0 MBit/s MCS 13
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
comment:92 Changed 4 years ago by nbd
You forgot to post your wireless config.
comment:93 Changed 4 years ago by anonymous
sorry.
cat /etc/config/wireless
config wifi-device 'radio0'
option type 'mac80211'
option macaddr '00:xx:xx:xx:xx:8e'
option hwmode '11ng'
list ht_capab 'SHORT-GI-40'
list ht_capab 'DSSS_CCK-40'
option distance '20'
option country 'DE'
option htmode 'HT20'
option channel '11'
option txpower '9'
config wifi-iface
option device 'radio0'
option network 'lan'
option mode 'ap'
option ssid 'wlan-1234'
option key 'xyz'
option encryption 'psk2+ccmp'
option macfilter 'allow'
list maclist 'xx:xx:xx:xx:xx:xx'
list maclist 'xx:xx:xx:xx:xx:xx'
comment:94 Changed 4 years ago by anonymous
I tried latest r39096, wifi slowed down again after 12h.
#cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani
ANI: ENABLED
ANI RESET: 11
SPUR UP: 11796
SPUR DOWN: 11796
OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 6916
FIR-STEP DOWN: 6919
INV LISTENTIME: 0
OFDM ERRORS: 30786720
CCK ERRORS: 1669543
# cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset
Baseband Hang: 1
Baseband Watchdog: 0
Fatal HW Error: 0
TX HW error: 0
TX Path Hang: 0
PLL RX Hang: 0
MCI Reset: 0
# iw wlan0 station dump
Station xx:xx:xx:xx:xx:22 (on wlan0)
inactive time: 1890 ms
rx bytes: 166795
rx packets: 1064
tx bytes: 786085
tx packets: 951
tx retries: 1619
tx failed: 19
signal: -47 [-56, -48, -54] dBm
signal avg: -48 [-56, -49, -54] dBm
tx bitrate: 117.0 MBit/s MCS 14
rx bitrate: 1.0 MBit/s
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
Station xx:xx:xx:xx:xx:10 (on wlan0)
inactive time: 20 ms
rx bytes: 573743
rx packets: 4682
tx bytes: 12198774
tx packets: 8598
tx retries: 2812
tx failed: 22
signal: -52 [-54, -56, -67] dBm
signal avg: -53 [-54, -60, -63] dBm
tx bitrate: 52.0 MBit/s MCS 5
rx bitrate: 19.5 MBit/s MCS 2
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
# uptime
10:54:56 up 12:35, load average: 0.01, 0.02, 0.04
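The station dumps in this thread can be compared more easily by computing the retry ratio per station. What follows is only a minimal sketch, assuming the `key: value` layout shown above; the helper names `parse_station_dump`, `retry_ratio` and `is_legacy_rx` are hypothetical, not part of iw. It is meant to be run on a workstation against saved dump text.

```python
import re

STATION_RE = re.compile(r"Station (\S+) \(on (\S+)\)")


def parse_station_dump(text):
    """Split saved `iw <dev> station dump` text into one dict per station."""
    stations = []
    for raw in text.splitlines():
        line = raw.strip()
        header = STATION_RE.match(line)
        if header:
            stations.append({"mac": header.group(1), "iface": header.group(2)})
        elif ":" in line and stations:
            # Field lines look like "tx retries: 1619"; split on the first colon.
            key, _, value = line.partition(":")
            stations[-1][key.strip()] = value.strip()
    return stations


def retry_ratio(station):
    """tx retries divided by tx packets (0.0 if nothing was sent)."""
    packets = int(station["tx packets"])
    return int(station["tx retries"]) / packets if packets else 0.0


def is_legacy_rx(station):
    """True when the rx bitrate line carries no MCS index (assumed non-HT rate)."""
    return "MCS" not in station.get("rx bitrate", "")
```

For the first station in the dump above (951 tx packets, 1619 tx retries) the ratio is above 1, i.e. more retries than packets, and its rx bitrate line has no MCS index, which matches the symptoms described in the original report.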
comment:95 Changed 4 years ago by dap@…
Hi,
I have now tried r39124 with ANI enabled, and the wifi stopped working after 10 minutes of heavy downloading. All clients were disconnected and the SSID disappeared.
Latest log messages:
Tue Dec 17 23:23:54 2013 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Dec 17 23:24:02 2013 daemon.info hostapd: wlan0: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Dec 17 23:24:24 2013 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Tue Dec 17 23:24:32 2013 daemon.info hostapd: wlan0: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
The 'wifi' command fixed it.
Now I'm running r39124 with ANI disabled and I'll report back if something goes wrong.
comment:96 Changed 4 years ago by anonymous
Hi,
r39139 seems to be an improvement on Buffalo WZR-HP-G300NH. No Baseband hangs for the last 24h.
comment:97 Changed 4 years ago by fa11enangel
I've tested it with r39096 on a TP-Link WR1043nd v1.11. After about 7-8 days the DMA errors occurred again, although the router was not used for 3 of those days during the holidays.
[ 20.380000] br-lan: port 2(wlan0) entered forwarding state
[240600.930000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240619.490000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240620.100000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240621.100000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240622.170000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240622.990000] ath: phy0: Failed to stop TX DMA, queues=0x001!
[268452.990000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268485.950000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268486.560000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268487.460000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268488.740000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[496566.560000] ath: phy0: Failed to stop TX DMA, queues=0x004!
...
# run command "wifi"
...
[743186.430000] ath: phy0: Failed to stop TX DMA, queues=0x100!
[743186.830000] ath: phy0: Failed to stop TX DMA, queues=0x100!
[743187.340000] ath: phy0: Failed to stop TX DMA, queues=0x100!
[743245.040000] device wlan0 left promiscuous mode
[743245.040000] br-lan: port 2(wlan0) entered disabled state
[743245.560000] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[743245.560000] device wlan0 entered promiscuous mode
[743245.570000] br-lan: port 2(wlan0) entered forwarding state
[743245.570000] br-lan: port 2(wlan0) entered forwarding state
[743245.940000] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[743247.570000] br-lan: port 2(wlan0) entered forwarding state
I've upgraded to r39163 from snapshots. I'll report how it works.
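To see which queues show up in the "Failed to stop TX DMA" messages, the `queues=0x...` value can be decoded as a bitmask. This is a rough sketch only: it assumes that each set bit corresponds to one TX queue index, which the thread itself does not confirm, and the helper names `queue_bits` and `count_masks` are made up for illustration.

```python
import re
from collections import Counter

# Matches the message text seen in the kernel log above.
DMA_RE = re.compile(r"Failed to stop TX DMA, queues=0x([0-9a-fA-F]+)")


def queue_bits(mask):
    """Return the positions of the set bits in `mask`, lowest first.

    Assumption: each set bit names one TX queue index.
    """
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def count_masks(log_text):
    """Count how often each queues= value occurs in saved dmesg/logread text."""
    counts = Counter()
    for match in DMA_RE.finditer(log_text):
        counts[int(match.group(1), 16)] += 1
    return counts


if __name__ == "__main__":
    for value in (0x004, 0x001, 0x100):
        print(hex(value), queue_bits(value))
```

Under that assumption, 0x004 would point at bit 2, 0x001 at bit 0 and 0x100 at bit 8; mapping those indices to actual queue types needs the driver source.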
comment:98 Changed 4 years ago by anonymous
Buffalo WZR-HP-G300NH with r39155 slowed down after 4 days of uptime.
20:03:28 up 4 days, 18:40, load average: 0.04, 0.02, 0.04
cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset
Baseband Hang: 2
Baseband Watchdog: 0
Fatal HW Error: 0
TX HW error: 0
TX Path Hang: 0
PLL RX Hang: 0
MCI Reset: 0
cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani
ANI: ENABLED
ANI RESET: 82
SPUR UP: 75060
SPUR DOWN: 75060
OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 63234
FIR-STEP DOWN: 63250
INV LISTENTIME: 0
OFDM ERRORS: 298793033
CCK ERRORS: 14665933
iw wlan0 station dump
Station xx:xx:xx:xx:xx:22 (on wlan0)
inactive time: 950 ms
rx bytes: 65533
rx packets: 517
tx bytes: 72318
tx packets: 226
tx retries: 332
tx failed: 9
signal: -41 [-48, -41, -51] dBm
signal avg: -35 [-42, -37, -46] dBm
tx bitrate: 130.0 MBit/s MCS 15
rx bitrate: 1.0 MBit/s
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
comment:99 Changed 4 years ago by anonymous
I don't know if it is related to this problem, but I got a kernel oops after a few days of uptime with r39155 on WZR-HP-G300NH.
[871033.260000] ------------[ cut here ]------------
[871033.260000] WARNING: at /store/buildbot/slave/ar71xx/build/build_dir/target-mips_34kc_uClibc-0.9.33.2/linux-ar71xx_generic/compat-wireless-2013-11-05/net/mac80211/rx.c:3365 mac80211_ieee80211_rx+0x134/0x800 [mac80211]()
[871033.280000] Rate marked as an HT rate but passed status->rate_idx is not an MCS index [0-76]: 79 (0x4f)
[871033.290000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ledtrig_usbdev ledtrig_netdev ip6t_REJECT ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio ohci_hcd ledtrig_timer ledtrig_default_on ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[871033.360000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.24 #1
[871033.370000] Stack : 00000006 00000000 00000000 00000000 00000000 00000000 803a2ac6 00000032
[871033.370000] 803276b8 802d7664 80382a38 8032743b 00000000 00000400 00000010 00000000
[871033.370000] 83884010 800790b0 00000003 80076af0 00000000 00000000 802d8f2c 80321b9c
[871033.370000] 00321b9c 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[871033.370000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 80321b28
[871033.370000] ...
[871033.410000] Call Trace:
[871033.410000] [<8006e2f0>] show_stack+0x48/0x70
[871033.410000] [<80076bec>] warn_slowpath_common+0x78/0xa8
[871033.420000] [<80076ca4>] warn_slowpath_fmt+0x2c/0x38
[871033.420000] [<8329e814>] mac80211_ieee80211_rx+0x134/0x800 [mac80211]
[871033.430000] [<83166a88>] ath_rx_tasklet+0xd40/0xe34 [ath9k]
[871033.440000] [<83164a10>] ath9k_tasklet+0x100/0x180 [ath9k]
[871033.440000] [<8007e100>] tasklet_action+0x78/0xc8
[871033.450000] [<8007d92c>] do_softirq+0xc8/0x1b4
[871033.450000] [<8007dac8>] do_softirq+0x48/0x68
[871033.460000] [<8007dd04>] irq_exit+0x54/0x70
[871033.460000] [<8006082c>] ret_from_irq+0x0/0x4
[871033.460000] [<80060a60>] r4k_wait+0x20/0x40
[871033.470000] [<8009ecb4>] cpu_startup_entry+0xa0/0x108
[871033.470000] [<8033e908>] start_kernel+0x380/0x3a0
[871033.480000]
[871033.480000] ---[ end trace 7b6176610614fce4 ]---
comment:100 Changed 4 years ago by elvstone@…
I've been running a quite old revision (r36088) for a long time, and my WRT160NL has been quite annoying, with frequent disconnects so that I've had to restart the router. But now that my girlfriend has gotten a new laptop, the problems have gotten worse. The speed has been crawling (unusably slow) for several days.
So now I'm going to upgrade. Should I try Attitude Adjustment final or the latest trunk revision?
comment:101 Changed 4 years ago by nbd
please try r39688 or newer
comment:102 Changed 4 years ago by dap@…
r39688 seems OK, but the counters in /sys/kernel/debug/ieee80211/phy0/ath9k/ani look weird:
ANI: ENABLED
ANI RESET: 3
SPUR UP: 0
SPUR DOWN: 0
OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 0
FIR-STEP DOWN: 0
INV LISTENTIME: 0
OFDM ERRORS: 601939
CCK ERRORS: 23662
There are too many zeros after hours of uptime; I've never seen this in older releases, if I remember correctly. I'm not sure if ANI is *really* working now.
comment:103 Changed 4 years ago by nbd
is it still this way with newer versions?
comment:104 Changed 4 years ago by fa11enangel
Device: TP-Link TL-WR1043ND v2.1
OpenWRT Version: r39535
After a long time the error came back:
[ 16.590000] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 17.050000] br-lan: port 1(eth1) entered forwarding state
[ 17.220000] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 17.230000] device wlan0 entered promiscuous mode
[ 17.390000] br-lan: port 2(wlan0) entered forwarding state
[ 17.390000] br-lan: port 2(wlan0) entered forwarding state
[ 17.400000] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[ 19.390000] br-lan: port 2(wlan0) entered forwarding state
[258894.260000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[346640.110000] ath: phy0: Failed to stop TX DMA, queues=0x005!
The wireless LAN was working, but users sometimes complained about connection problems with some sites. After rebooting the device the wireless LAN worked properly again.
comment:105 Changed 4 years ago by dap@…
Now it's certain: r39688 is also broken with ANI enabled; it mainly fails during heavy downloads.
The symptoms vary: sometimes only the downloading client loses the connection, sometimes the whole AP disappears.
It's stable when ANI is disabled.
comment:106 Changed 4 years ago by nbd
I changed ANI to improve behavior on older chips in r39767 - please test.
comment:107 Changed 4 years ago by dap@…
OK, I'm running r39788 now.
The counters in /sys/kernel/debug/ieee80211/phy0/ath9k/ani look normal this time:
root@OpenWrt:~# uptime
13:37:43 up 7 min, load average: 0.00, 0.06, 0.04
root@OpenWrt:~# cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani
ANI: ENABLED
ANI RESET: 4
SPUR UP: 95
SPUR DOWN: 95
OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 98
FIR-STEP DOWN: 93
INV LISTENTIME: 0
OFDM ERRORS: 117313
CCK ERRORS: 4094
I'm starting to stress test it.
comment:108 Changed 4 years ago by dap@…
Early report: the patch fixed the stability problem but introduced a new performance issue.
I was downloading when there was a short hiccup in the traffic flow. One client was disconnected (maybe the user reconnected because the traffic stalled) and my workstation's peak rx/tx bitrate dropped to 117 Mb/s from that point on.
After about 10 minutes I ran the 'wifi' command, which immediately restored my workstation's rx/tx bitrate to 270 Mb/s, and the download speed jumped up.
iw wlan0 station dump after the hiccup:
Station a0:f3:c1:f8:9b:e0 (on wlan0)
inactive time: 0 ms
rx bytes: 899045603
rx packets: 2658734
tx bytes: 1910238837
tx packets: 4335089
tx retries: 423763
tx failed: 115
signal: -42 [-52, -43] dBm
signal avg: -42 [-51, -43] dBm
tx bitrate: 117.0 MBit/s MCS 14
rx bitrate: 104.0 MBit/s MCS 13
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
iw wlan0 station dump after the 'wifi' command was issued:
Station a0:f3:c1:f8:9b:e0 (on wlan0)
inactive time: 0 ms
rx bytes: 8033073
rx packets: 32395
tx bytes: 80907015
tx packets: 56014
tx retries: 3566
tx failed: 0
signal: -47 [-56, -47] dBm
signal avg: -46 [-56, -47] dBm
tx bitrate: 270.0 MBit/s MCS 14 40MHz short GI
rx bitrate: 270.0 MBit/s MCS 14 40MHz short GI
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
There's nothing interesting in the log.
comment:109 Changed 4 years ago by dap@…
There are still stability issues: now the downloading client lost the connection. AP log message:
hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
I'm now running this stress test without ANI.
comment:110 Changed 4 years ago by nbd
Please update to latest trunk with the change I just committed.
Afterwards, show me /sys/kernel/debug/ieee80211/phy0/ath9k/ani both in working state (after running a few minutes), and when the stability issue has appeared.
comment:111 Changed 4 years ago by dap@…
comment:112 Changed 4 years ago by dap@…
I hit the first traffic stall after a few minutes and had to reconnect the client.
Right before:
ANI: ENABLED
ANI RESET: 4
OFDM LEVEL: 9
CCK LEVEL: 0
SPUR UP: 4
SPUR DOWN: 4
OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 77
FIR-STEP DOWN: 71
INV LISTENTIME: 0
OFDM ERRORS: 320963
CCK ERRORS: 20037
Right after:
ANI: ENABLED
ANI RESET: 4
OFDM LEVEL: 8
CCK LEVEL: 0
SPUR UP: 4
SPUR DOWN: 4
OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 78
FIR-STEP DOWN: 73
INV LISTENTIME: 0
OFDM ERRORS: 324167
CCK ERRORS: 20078
comment:113 Changed 4 years ago by dap@…
As I found, "traffic stalling" means that the AP becomes deaf while the client still sees the AP's traffic:
19:35:42.780542 IP 192.168.5.118 > 192.168.5.2: ICMP echo request, id 11482, seq 2999, length 64
19:35:42.901602 ARP, Request who-has 192.168.5.118 tell 192.168.5.2, length 28
19:35:42.901616 ARP, Reply 192.168.5.118 is-at a0:f3:c1:f8:9b:e0, length 28
19:35:43.720796 ARP, Request who-has 192.168.5.116 tell 192.168.5.2, length 28
19:35:43.780559 IP 192.168.5.118 > 192.168.5.2: ICMP echo request, id 11482, seq 3000, length 64
19:35:43.925598 ARP, Request who-has 192.168.5.118 tell 192.168.5.2, length 28
19:35:43.925613 ARP, Reply 192.168.5.118 is-at a0:f3:c1:f8:9b:e0, length 28
19:35:44.324578 IP 192.168.5.118.50297 > 217.20.130.72.openvpn: UDP, length 14
19:35:44.744820 ARP, Request who-has 192.168.5.116 tell 192.168.5.2, length 28
comment:114 Changed 4 years ago by nbd
Please try r39865, it should be better now. It was missing one more change to properly toggle weak signal detection.
comment:115 Changed 4 years ago by dap@…
Still not fixed in r39865, but the behavior changed: after a few minutes every client's connection quality dropped and some clients tried to reconnect repeatedly. High ping times, very slow networking, hard to reconnect. It reminded me of my original report.
The ani file shows weird counter values this time:
Before:
ANI: ENABLED
ANI RESET: 9
OFDM LEVEL: 8
CCK LEVEL: 0
SPUR UP: 29
SPUR DOWN: 29
OFDM WS-DET ON: 295
OFDM WS-DET OFF: 296
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 326
FIR-STEP DOWN: 305
INV LISTENTIME: 0
OFDM ERRORS: 234493
CCK ERRORS: 21398
Meanwhile:
ANI: ENABLED
ANI RESET: 9
OFDM LEVEL: 0
CCK LEVEL: 0
SPUR UP: 29
SPUR DOWN: 29
OFDM WS-DET ON: 358
OFDM WS-DET OFF: 358
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 389
FIR-STEP DOWN: 375
INV LISTENTIME: 0
OFDM ERRORS: 283564
CCK ERRORS: 27361
After the 'wifi' command was issued and the connections were restored:
ANI: ENABLED
ANI RESET: 14
OFDM LEVEL: 5
CCK LEVEL: 0
SPUR UP: 33
SPUR DOWN: 33
OFDM WS-DET ON: 358
OFDM WS-DET OFF: 358
MRC-CCK ON: 0
MRC-CCK OFF: 0
FIR-STEP UP: 396
FIR-STEP DOWN: 380
INV LISTENTIME: 0
OFDM ERRORS: 290320
CCK ERRORS: 30242
comment:116 Changed 4 years ago by nbd
When the issue occurs again, please measure how much the "OFDM ERRORS" counter increases over a period of 10 seconds or so (I need the average errors per second), and check whether the OFDM LEVEL is at 0 again.
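The measurement requested here amounts to taking two snapshots of the ani debugfs file a known number of seconds apart and dividing the counter difference by that interval. A minimal sketch, assuming the `NAME: value` layout of the file as pasted in this ticket; `parse_ani` and `rate_per_second` are hypothetical helper names, and the script is meant to be run on saved snapshot text rather than on the router itself.

```python
def parse_ani(text):
    """Turn saved ani file text into {counter name: int}.

    Non-numeric entries such as "ANI: ENABLED" are skipped.
    """
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[key.strip()] = int(value)
    return counters


def rate_per_second(before, after, seconds, key="OFDM ERRORS"):
    """Average increase of one counter per second between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after[key] - before[key]) / seconds


def ofdm_level_is_zero(snapshot):
    """True when the snapshot reports OFDM LEVEL 0 (field absent: False)."""
    return snapshot.get("OFDM LEVEL") == 0
```

Usage would be along the lines of `rate_per_second(parse_ani(first), parse_ani(second), 10)`, where the 10-second interval is whatever delay was actually used between the two reads.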
comment:117 Changed 4 years ago by dap@…
I have identified several kinds of ANI-related problems. They may be separate bugs or just different outcomes of a single bug. Let me describe them before I give the numbers.
- The "OFDM LEVEL zero" issue. Symptom: very bad connection quality on all clients.
- The "AP disaster" issue. Symptom: all clients are disconnected and it is impossible to connect to the AP (the SSID is still there).
- The "downloader stalling" issue. Symptom: I have to reconnect. Other clients may not be affected.
First I hit the "downloader stalling" issue:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
22:03:58    0    6    0    2    2    0    0    0    0    2    2    0 1787  354
22:04:08    0    6    0    1    1    0    0    0    0    1    1    0 1591  394
22:04:18    0    5    0    1    1    0    0    0    0    1    2    0 1767  438
22:04:28    0    6    0    1    1    0    0    0    0    1    0    0 1779  450
22:04:38    0    6    0    1    1    0    0    0    0    1    1    0 1852  352
22:04:48    0    6    0    2    2    0    0    0    0    2    2    0 1776  390
22:04:58    0    7    0    3    3    0    0    0    0    3    2    0 1938  422
22:05:08    0    5    0    1    1    0    0    0    0    1    3    0 1482  305
22:05:18    0    6    0    2    2    0    0    0    0    2    1    0 1858  326
22:05:28    0    6    0    2    2    0    0    0    0    2    2    0 1880  298
22:05:39    0    6    0    2    2    0    0    0    0    2    2    0 2036  248
22:05:49    0    6    0    1    1    0    0    0    0    1    1    0 1828  258
22:05:59    0    4    0    1    1    0    0    0    0    1    3    0 2098  332
22:06:09    0    5    0    3    3    0    0    0    0    3    2    0 2230  277
22:06:19    0    5    0    1    1    0    0    0    0    1    1    0 1755  236
22:06:29    0    3    0    2    2    0    0    0    0    2    4    0 2461  172
22:06:39    0    5    0    3    3    0    0    0    0    3    1    0 4238    1
22:06:49    0    5    0    2    2    0    0    0    0    2    2    0 3150    0
22:06:59    0    4    0    2    2    0    0    0    0    2    3    0 3080    0
22:07:09    0    4    0    1    1    0    0    0    0    1    1    0 2924    0
22:07:19    0    4    0    0    0    0    0    0    0    0    0    0 3358    0
22:07:29    0    4    0    1    1    0    0    0    0    1    1    0 3302    0
22:07:39    0    4    0    2    2    0    0    0    0    2    2    0 3139    1
22:07:49    0    3    0    0    0    0    0    0    0    0    1    0 3364    0
22:07:59    0    4    0    3    3    0    0    0    0    3    2    0 3419    0
22:08:09    0    4    0    2    2    0    0    0    0    2    2    0 3138    0
22:08:19    0    4    0    2    2    0    0    0    0    2    2    0 3179    1
22:08:29    0    4    0    1    1    0    0    0    0    1    1    0 3208    1
22:08:39    1    6    2    3    3    0    0    0    0    3    0    0 3938   52

ANIR: ANI RESET
OFDM: OFDM LEVEL
CCKL: CCK LEVEL
SPUP: SPUR UP
SPDW: SPUR DOWN
OWD1: OFDM WS-DET ON
OWD0: OFDM WS-DET OFF
MRC1: MRC-CCK ON
MRC0: MRC-CCK OFF
FIRU: FIR-STEP UP
FIRD: FIR-STEP DOWN
INVL: INV LISTENTIME
OERR: OFDM ERRORS
CERR: CCK ERRORS

Mon Mar 10 22:06:25 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Mon Mar 10 22:06:55 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Second, I hit the "AP disaster" issue:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
22:39:51    0    4    0    2    2    0    0    0    0    2    2    0 1992  103
22:40:01    0    3    0    3    3    0    0    0    0    3    4    0 2332   94
22:40:11    1    5    2    5    5    0    0    0    0    5    2    0 2762  162
22:40:21    0    7    1    3    3    0    0    0    0    3    1    0 1799  127
22:40:32    0    4    0    1    1    0    0    0    0    1    4    0 1692  143
22:40:42    0    6    0    3    3    0    0    0    0    3    1    0 2219  244
22:40:52    0    5    0    1    1    0    0    0    0    1    2    0 1655  124
22:41:02    0    7    0    2    2    0    0    0    0    2    0    0 1990  182
22:41:12    0    7    0    1    1    0    0    0    0    1    1    0 1578  119
22:41:22    0    4    0    1    1    0    0    0    0    1    4    0 1585  107
22:41:32    0    4    0    1    1    0    0    0    0    1    1    0 3165    0
22:41:42    0    4    0    1    1    0    0    0    0    1    1    0 3149    1
22:41:52    0    4    0    2    2    0    0    0    0    2    2    0 3396    0
22:42:02    0    4    0    2    2    0    0    0    0    2    2    0 3369    1
22:42:12    0    5    0    4    4    0    0    0    0    4    3    0 3107    0
22:42:22    0    4    0    3    3    0    0    0    0    3    4    0 3335    0
22:42:32    0    3    0    2    2    0    0    0    0    2    3    0 3225    0
22:42:42    0    3    0    2    2    0    0    0    0    2    2    0 3508    1
22:42:52    0    4    0    3    3    0    0    0    0    3    2    0 3573    0
22:43:02    0    4    0    1    1    0    0    0    0    1    1    0 3036    0

Mon Mar 10 22:40:07 2014 kern.err kernel: [ 2701.660000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Mon Mar 10 22:40:07 2014 kern.err kernel: [ 2701.680000] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x00028cc1
Mon Mar 10 22:40:07 2014 kern.err kernel: [ 2701.690000] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
Mon Mar 10 22:41:22 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Mon Mar 10 22:41:52 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
The counters and the messages are similar in both cases, but the user experience is very different.
I'm hunting for the "OFDM LEVEL zero" issue now.
comment:118 Changed 4 years ago by dap@…
Here is the "OFDM LEVEL zero" issue too. After 3 minutes the AP restored the service without manual intervention.
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
00:35:50    0    5    0    1    1    0    0    0    0    1    2    0 1660  140
00:36:00    0    5    0    2    2    0    0    0    0    2    2    0 1683  155
00:36:11    0    5    0    2    2    0    0    0    0    2    2    0 1745  161
00:36:21    0    5    0    1    1    0    0    0    0    1    1    0 1821  171
00:36:31    0    5    0    1    1    0    0    0    0    1    1    0 1814  165
00:36:41    0    6    0    1    1    0    0    0    0    1    0    0 1786  138
00:36:51    0    6    0    2    2    0    0    0    0    2    2    0 1836  228
00:37:01    0    6    0    1    1    0    0    0    0    1    1    0 1687  138
00:37:11    0    5    0    0    0    0    0    0    0    0    1    0 1842  166
00:37:21    0    6    0    3    3    0    0    0    0    3    2    0 1930  211
00:37:31    0    6    0    0    0    0    0    0    0    0    0    0 1523  119
00:37:41    0    6    0    2    2    0    0    0    0    2    2    0 1611  124
00:37:51    0    6    0    2    2    1    1    0    0    3    3    0 1593  112
00:38:01    0    5    0    1    1    0    0    0    0    1    2    0 1674  171
00:38:11    0    6    0    3    3    0    0    0    0    3    2    0 2004  192
00:38:21    0    6    0    1    1    0    0    0    0    1    1    0 1951  128
00:38:31    0    6    0    2    2    0    0    0    0    2    2    0 1829  117
00:38:41    0    7    0    2    2    0    0    0    0    2    1    0 2125  146
00:38:51    0    5    0    0    0    0    0    0    0    0    2    0 1963  144
00:39:01    0    6    0    2    2    0    0    0    0    2    1    0 1851  135
00:39:11    0    6    0    1    1    0    0    0    0    1    1    0 1609  150
00:39:21    0    5    0    1    1    0    0    0    0    1    2    0 1793  139
00:39:32    0    5    0    2    2    0    0    0    0    2    2    0 1938  199
00:39:42    0    5    0    1    1    0    0    0    0    1    1    0 2092  184
00:39:52    0    6    0    1    1    0    0    0    0    1    0    0 1949  233
00:40:02    0    6    0    2    2    0    0    0    0    2    2    0 1807  156
00:40:12    0    6    0    1    1    0    0    0    0    1    1    0 1437  135
00:40:22    0    5    0    1    1    0    0    0    0    1    2    0 1814  128
00:40:32    0    5    0    1    1    0    0    0    0    1    1    0 1899  134
00:40:42    0    5    0    0    0    0    0    0    0    0    0    0 2047  216
00:40:52    0    5    0    2    2    0    0    0    0    2    2    0 1860  147
00:41:02    0    6    0    2    2    0    0    0    0    2    1    0 1853  195
00:41:12    0    5    0    1    1    0    0    0    0    1    2    0 1636  206
00:41:22    0    5    0    1    1    0    0    0    0    1    1    0 1906  209
00:41:32    0    6    0    2    2    0    0    0    0    2    1    0 1832  144
00:41:42    0    5    0    1    1    0    0    0    0    1    2    0 1597  193
00:41:52    0    6    0    1    1    0    0    0    0    1    0    0 1762  127
00:42:02    0    6    0    2    2    0    0    0    0    2    2    0 1914  156
00:42:12    0    6    0    1    1    0    0    0    0    1    1    0 1687  139
00:42:22    0    7    0    3    3    0    0    0    0    3    2    0 1937  160
00:42:32    0    7    0    1    1    1    1    0    0    2    2    0 1561  116
00:42:42    0    5    0    1    1    0    0    0    0    1    3    0 1602  128
00:42:52    0    0    0    1    1    0    0    0    0    1    5    0  766   52
00:43:02    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:12    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:22    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:32    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:42    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:53    0    0    0    0    0    0    0    0    0    0    0    0    0    0
[..only zeros..]
00:46:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:46:17    1    6    2    3    3    0    0    0    0    3    0    0 2876   12
[some clients are reconnected at this point, only light traffic, the download tcp stream still stalling]
00:46:27    0    7    1    2    2    0    0    0    0    2    1    0 2719    9
00:46:37    0    6    0    1    1    0    0    0    0    1    2    0 2920   10
00:46:47    0    4    0    0    0    0    0    0    0    0    2    0 2624   16
00:46:57    0    4    0    1    1    0    0    0    0    1    1    0 3190   28
00:47:07    0    4    0    0    0    0    0    0    0    0    0    0 2880   17
00:47:17    0    4    0    0    0    0    0    0    0    0    0    0 2979   26
00:47:27    0    4    0    3    3    0    0    0    0    3    3    0 3235   20
00:47:37    0    4    0    1    1    0    0    0    0    1    1    0 3115   22
00:47:47    0    6    0    2    2    0    0    0    0    2    0    0 3731   24
00:47:57    0    5    0    1    1    0    0    0    0    1    2    0 3014   16

Tue Mar 11 00:42:46 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Mar 11 00:43:16 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Now I doubt that there is any situation in which some clients are not affected. Maybe the ANI reset fooled me and restored the service while I was walking over to the other client to check the network. So I think the "downloader stalling" issue is the same as the "AP disaster" issue; the only difference is the intervention of the ANI reset.
Anyway, the "OFDM LEVEL zero" issue is definitely different.
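Stretches where the OFDM column reads 0 are easy to miss in long sample tables like the ones in this ticket. The following is a small sketch for locating them in saved output, assuming one sample per line with a timestamp followed by the 14 counter columns named in the legend; `parse_rows` and `zero_level_spans` are hypothetical names, and no claim is made about how the sampling script itself was written.

```python
COLUMNS = ["ANIR", "OFDM", "CCKL", "SPUP", "SPDW", "OWD1", "OWD0",
           "MRC1", "MRC0", "FIRU", "FIRD", "INVL", "OERR", "CERR"]


def parse_rows(text):
    """Return [(timestamp, {column: int})] for every well-formed sample line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A sample line is a hh:mm:ss stamp plus one integer per column.
        if len(fields) != len(COLUMNS) + 1 or ":" not in fields[0]:
            continue
        try:
            values = [int(field) for field in fields[1:]]
        except ValueError:
            continue  # header, annotation or log line
        rows.append((fields[0], dict(zip(COLUMNS, values))))
    return rows


def zero_level_spans(rows):
    """Return (first, last) timestamps of each run of samples with OFDM == 0."""
    spans, start, previous = [], None, None
    for stamp, row in rows:
        if row["OFDM"] == 0:
            if start is None:
                start = stamp
        elif start is not None:
            spans.append((start, previous))
            start = None
        previous = stamp
    if start is not None:
        spans.append((start, previous))
    return spans
```

Calling `zero_level_spans(parse_rows(saved_text))` on a saved table gives the time ranges to line up against the hostapd log entries.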
comment:119 Changed 4 years ago by nbd
Got another one for you: http://nbd.name/950-ath9k_ani_test.patch
The stats are very useful, please also include them for the next round.
Thanks for testing!
comment:120 Changed 4 years ago by dap@…
With r39865 patched, here is the first issue.
The download began stalling at 14:17:05. I was able to reconnect without an ANI reset this time and did not get the "disconnected due to excessive missing ACKs" message. This is not typical.
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:07:50    0    6    0    1    1    0    0    0    0    1    0    0 2146  110
14:08:00    0    5    0    1    1    0    0    0    0    1    2    0 2017   92
14:08:10    0    6    0    3    3    0    0    0    0    3    2    0 2200   75
14:08:20    0    5    0    1    1    0    0    0    0    1    2    0 1860   93
14:08:30    0    5    0    1    1    0    0    0    0    1    1    0 2103   95
14:08:40    0    6    0    1    1    0    0    0    0    1    0    0 2342   80
14:08:50    0    4    0    0    0    0    0    0    0    0    2    0 1909   76
14:09:00    0    5    0    3    3    0    0    0    0    3    2    0 2263   81
14:09:10    0    5    0    0    0    0    0    0    0    0    0    0 1999   83
14:09:20    0    5    0    1    1    0    0    0    0    1    1    0 2032   86
14:09:30    0    5    0    1    1    0    0    0    0    1    1    0 1813   94
14:09:40    0    5    0    0    0    0    0    0    0    0    0    0 1612   91
14:09:50    0    5    0    1    1    0    0    0    0    1    1    0 1706  100
14:10:00    0    6    0    4    4    0    0    0    0    4    3    0 2113   71
14:10:10    0    5    0    2    2    0    0    0    0    2    3    0 2060   99
14:10:21    0    6    0    1    1    0    0    0    0    1    0    0 2143   74
14:10:31    0    5    0    2    2    0    0    0    0    2    3    0 1809   72
14:10:41    0    5    0    2    2    0    0    0    0    2    2    0 2330   69
14:10:51    0    5    0    1    1    0    0    0    0    1    1    0 1834   73
14:11:01    0    5    0    2    2    0    0    0    0    2    2    0 2041   67
14:11:11    0    5    0    0    0    0    0    0    0    0    0    0 2113   62
14:11:21    0    6    0    3    3    0    0    0    0    3    2    0 2311   85
14:11:31    0    6    0    2    2    1    1    0    0    3    3    0 2003   77
14:11:41    0    7    0    3    3    0    0    0    0    3    2    0 2539   95
14:11:51    0    5    0    2    2    0    0    0    0    2    4    0 2289   95
14:12:01    0    7    0    3    3    0    0    0    0    3    1    0 2932   90
14:12:11    0    6    0    4    4    0    0    0    0    4    5    0 2177   73
14:12:21    0    6    0    3    3    0    0    0    0    3    3    0 1999   78
14:12:31    0    6    0    2    2    0    0    0    0    2    2    0 1800   71
14:12:41    0    6    0    2    2    0    0    0    0    2    2    0 2114   82
14:12:51    0    4    0    2    2    0    0    0    0    2    4    0 1924   95
14:13:01    0    5    0    2    2    0    0    0    0    2    1    0 1984   84
14:13:11    0    5    0    0    0    0    0    0    0    0    0    0 1814  110
14:13:21    0    5    0    2    2    0    0    0    0    2    2    0 1887  103
14:13:31    0    6    0    4    4    0    0    0    0    4    3    0 2049   79
14:13:41    0    6    0    1    1    0    0    0    0    1    1    0 1760   77
14:13:51    0    5    0    3    3    0    0    0    0    3    4    0 2237   84
14:14:02    0    5    0    2    2    0    0    0    0    2    2    0 1756   70
14:14:12    0    4    0    2    2    0    0    0    0    2    3    0 2183   99
14:14:22    0    5    0    2    2    0    0    0    0    2    1    0 2172   75
14:14:32    0    6    0    3    3    0    0    0    0    3    2    0 2099   80
14:14:42    0    5    0    0    0    0    0    0    0    0    1    0 1882   77
14:14:52    0    6    0    2    2    0    0    0    0    2    1    0 2365   69
14:15:02    0    6    0    3    3    0    0    0    0    3    3    0 1890   55
14:15:12    0    5    0    1    1    0    0    0    0    1    2    0 1905   79
14:15:22    0    5    0    1    1    0    0    0    0    1    1    0 1647   81
14:15:32    0    5    0    1    1    0    0    0    0    1    1    0 2247   77
14:15:42    0    5    0    3    3    0    0    0    0    3    3    0 1955   67
14:15:52    0    5    0    3    3    0    0    0    0    3    3    0 2220   78
14:16:02    0    6    0    3    3    0    0    0    0    3    2    0 2192   65
14:16:12    0    6    0    3    3    0    0    0    0    3    3    0 2244   68
14:16:22    0    5    0    1    1    0    0    0    0    1    2    0 2063   79
14:16:32    0    5    0    1    1    0    0    0    0    1    1    0 2144   85
14:16:42    0    5    0    2    2    0    0    0    0    2    2    0 2168   78
14:16:52    0    5    0    2    2    0    0    0    0    2    2    0 1910   83
14:17:02    0    5    0    2    2    0    0    0    0    2    2    0 2439   51
14:17:12    0    5    0    1    1    0    0    0    0    1    1    0 3052    3
14:17:22    0    4    0    0    0    0    0    0    0    0    1    0 3083   10
14:17:32    0    5    0    1    1    0    0    0    0    1    0    0 3033    8
14:17:42    0    5    0    1    1    0    0    0    0    1    1    0 3152    3
14:17:52    0    5    0    0    0    0    0    0    0    0    0    0 3001    8
14:18:02    0    5    0    2    2    0    0    0    0    2    2    0 3347    5
14:18:12    0    5    0    1    1    0    0    0    0    1    1    0 3115   14
14:18:22    0    5    0    1    1    0    0    0    0    1    1    0 3208    6
14:18:32    0    5    0    0    0    0    0    0    0    0    0    0 2954   14
14:18:42    0    5    0    2    2    0    0    0    0    2    2    0 3527   11
And there's a kernel message:
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.520000] ------------[ cut here ]------------
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.530000] WARNING: at /usr/src/openwrt/trunk/build_dir/target-mips_34kc_uClibc-0.9.33.2/linux-ar71xx_generic/compat-wireless-2014-01-23.1/net/mac80211/rx.c:3397 mac80211_ieee80211_rx+0x13c/0x818 [mac80211]()
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.550000] Rate marked as an HT rate but passed status->rate_idx is not an MCS index [0-76]: 101 (0x65)
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.560000] Modules linked in: ath9k ath9k_common ath9k_hw ath pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_netlink nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nfnetlink nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJ
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.640000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W 3.10.32 #1
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000] Stack : 00000000 00000000 00000000 00000000 80372e7a 00000041 81828a08 80dea010
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000] 802d2600 803213db 00000003 80372628 81828a08 80dea010 80f05d74 00000014
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000] 00000018 80078fb0 00000003 800769c0 80cc9688 80dea010 802d3ec0 8183bc64
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 8183bbf0
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000] ...
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.690000] Call Trace:
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.690000] [<8006e278>] show_stack+0x48/0x70
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.690000] [<80076b30>] warn_slowpath_common+0x78/0xa8
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.700000] [<80076b8c>] warn_slowpath_fmt+0x2c/0x38
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.700000] [<80c9f388>] mac80211_ieee80211_rx+0x13c/0x818 [mac80211]
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.710000] [<80e46c74>] ath_rx_tasklet+0xcc0/0xda8 [ath9k]
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.720000] [<80e44214>] ath9k_tasklet+0x1ac/0x230 [ath9k]
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.720000] [<8007e0ac>] tasklet_action+0x84/0xcc
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.730000] [<8007d8ac>] __do_softirq+0xd0/0x1b8
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.730000] [<8007d9c0>] run_ksoftirqd+0x2c/0x58
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.740000] [<80099d40>] smpboot_thread_fn+0x134/0x164
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.740000] [<80092e9c>] kthread+0xb0/0xb8
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.740000] [<80060878>] ret_from_kernel_thread+0x14/0x1c
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.750000]
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.750000] ---[ end trace 8c56e57a4320c6d8 ]---
comment:121 Changed 4 years ago by nbd
The fact that it doesn't get stuck anymore sounds like progress to me.
Here's another patch: http://nbd.name/951-rifs_test.patch (keep the last one in your tree)
Increasing the PHY search delay should hopefully reduce the number of errors as well.
comment:122 Changed 4 years ago by dap@…
Yes, it's definitely progress! Before the patch I was able to reproduce an issue within a few minutes, but now I have had only one problem in hours of testing (and possibly it isn't ANI related; I have not tested this version without ANI for as long as I have now tested it with ANI).
I'm applying the new patch and will restart testing.
comment:123 Changed 4 years ago by dap@…
ANI counters with 951-rifs_test.patch:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
16:15:01    0    5    0    1    1    0    0    0    0    1    1    0 2620  123
16:15:11    0    4    0    1    1    0    0    0    0    1    2    0 1405  209
16:15:21    0    5    0    2    2    0    0    0    0    2    1    0 2119  259
16:15:31    1    4    2    3    3    0    0    0    0    3    6    0 2089  119
16:15:41    0    3    0    1    1    0    0    0    0    0    1    0 1562  128
16:15:51    0    3    0    1    1    0    0    0    0    0    0    0 1763   99
16:16:01    0    3    0    0    0    0    0    0    0    0    0    0 2080  127
16:16:11    0    2    0    0    0    0    0    0    0    0    0    0 1799  109
16:16:21    0    1    0    2    2    0    0    0    0    0    1    0 2115  126
16:16:32    0    2    0    2    2    0    0    0    0    1    0    0 2316   98
16:16:42    0    5    0    5    5    0    0    0    0    4    2    0 2727  278
16:16:52    0    6    0    1    1    0    0    0    0    1    0    0 2026  230
16:17:02    0    4    0    1    1    0    0    0    0    1    3    0 1692  236
16:17:12    0    6    0    3    3    0    0    0    0    3    1    0 2044  185
16:17:22    0    5    0    1    1    0    0    0    0    1    2    0 1688  133
16:17:32    0    5    0    1    1    0    0    0    0    1    1    0 1347  189
16:17:42    0    5    0    1    1    0    0    0    0    1    1    0 2187  178
I can't see a big difference; the performance is good with both versions.
comment:124 Changed 4 years ago by dap@…
Aie, "OFDM LEVEL zero" is back:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
16:38:34    0    5    0    3    3    0    0    0    0    3    3    0 1591  104
16:38:44    0    5    0    0    0    0    0    0    0    0    0    0 1804  135
16:38:54    0    6    0    2    2    0    0    0    0    2    1    0 1808  112
16:39:04    0    5    0    1    1    0    0    0    0    1    2    0 1765  100
16:39:14    0    5    0    2    2    0    0    0    0    2    2    0 1710  126
16:39:24    0    5    0    1    1    0    0    0    0    1    1    0 1708  105
16:39:34    0    4    0    0    0    0    0    0    0    0    1    0 1847  102
16:39:44    0    5    0    2    2    0    0    0    0    2    1    0 1742  113
16:39:54    0    5    0    2    2    0    0    0    0    2    2    0 1750   87
16:40:04    0    5    0    2    2    0    0    0    0    2    2    0 1762  112
16:40:15    0    6    0    3    3    0    0    0    0    3    2    0 1743  122
16:40:25    0    5    0    1    1    0    0    0    0    1    2    0 1711  147
16:40:35    0    6    0    2    2    0    0    0    0    2    1    0 1810  105
16:40:45    0    5    0    2    2    0    0    0    0    2    3    0 1618  126
16:40:55    0    5    0    3    3    0    0    0    0    3    3    0 1727   89
16:41:05    0    5    0    0    0    0    0    0    0    0    0    0 1748  102
16:41:15    0    5    0    0    0    0    0    0    0    0    0    0 1864  103
16:41:25    0    5    0    0    0    0    0    0    0    0    0    0 1851  100
16:41:35    0    5    0    1    1    0    0    0    0    1    1    0 1671   89
16:41:45    0    4    0    1    1    0    0    0    0    1    2    0 1688  130
16:41:55    0    5    0    3    3    0    0    0    0    3    2    0 1836  144
16:42:05    0    5    0    1    1    0    0    0    0    1    1    0 1567  118
16:42:15    0    4    0    2    2    0    0    0    0    2    3    0 1737  123
16:42:25    0    5    0    3    3    0    0    0    0    3    2    0 1915  154
16:42:35    0    5    0    2    2    0    0    0    0    2    2    0 1872  124
16:42:45    0    5    0    2    2    0    0    0    0    2    2    0 1652  196
16:42:55    0    5    0    1    1    0    0    0    0    1    1    0 1619  155
16:43:05    0    5    0    1    1    0    0    0    0    1    1    0 1660  103
16:43:15    0    5    0    1    1    0    0    0    0    1    1    0 1479  115
16:43:25    0    4    0    0    0    0    0    0    0    0    1    0 1526  135
16:43:36    0    5    0    3    3    0    0    0    0    3    2    0 1738  137
16:43:46    0    5    0    2    2    0    0    0    0    2    2    0 1533  112
16:43:56    0    4    0    3    3    0    0    0    0    3    4    0 1888  142
16:44:06    0    4    0    4    4    0    0    0    0    4    4    0 1719  123
16:44:16    0    4    0    4    4    0    0    0    0    4    4    0 1917  153
16:44:26    0    4    0    3    3    0    0    0    0    3    3    0 1805  143
16:44:36    0    4    0    3    3    0    0    0    0    3    3    0 2229  126
16:44:46    0    4    0    4    4    0    0    0    0    4    4    0 1676  182
16:44:56    0    5    0    5    5    0    0    0    0    5    4    0 2128  137
16:45:06    0    4    0    3    3    0    0    0    0    3    4    0 1974  150
16:45:16    0    5    0    4    4    0    0    0    0    4    3    0 2007  149
16:45:26    0    0    0    0    0    0    0    0    0    0    4    0 1068  103
16:45:36    0    0    0    0    0    0    0    0    0    0    0    0 1030  110
16:45:46    0    0    0    0    0    0    0    0    0    0    0    0  863  104
16:45:56    0    0    0    0    0    0    0    0    0    0    0    0  918  133
16:46:06    0    0    0    0    0    0    0    0    0    0    0    0  956   92
16:46:16    0    0    0    0    0    0    0    0    0    0    0    0 1009  109
16:46:26    0    0    0    0    0    0    0    0    0    0    0    0 1282  114
16:46:36    0    0    0    0    0    0    0    0    0    0    0    0 1382   90
16:46:46    0    0    0    0    0    0    0    0    0    0    0    0 1240  101
16:46:56    0    0    0    0    0    0    0    0    0    0    0    0 1333  105
16:47:06    0    0    0    0    0    0    0    0    0    0    0    0 1224   78
16:47:17    0    0    0    0    0    0    0    0    0    0    0    0 1297  118
16:47:27    0    0    0    0    0    0    0    0    0    0    0    0 1358   98
16:47:37    0    0    0    0    0    0    0    0    0    0    0    0 1340   83
16:47:47    0    0    0    0    0    0    0    0    0    0    0    0 1287  103
16:47:57    0    0    0    0    0    0    0    0    0    0    0    0 1293  117
16:48:07    0    0    0    0    0    0    0    0    0    0    0    0 1331  108
16:48:17    0    0    0    0    0    0    0    0    0    0    0    0 1128  108
16:48:27    0    0    0    0    0    0    0    0    0    0    0    0 1295   93
16:48:37    0    1    0    1    1    0    0    0    0    1    0    0 1462   69
16:48:47    0    0    0    0    0    0    0    0    0    0    1    0 1118   81
16:48:57    0    0    0    0    0    0    0    0    0    0    0    0 1235   91
16:49:07    0    0    0    0    0    0    0    0    0    0    0    0 1308   77
16:49:17    0    0    0    0    0    0    0    0    0    0    0    0 1345   84
16:49:27    0    0    0    0    0    0    0    0    0    0    0    0 1432  100
16:49:37    0    0    0    0    0    0    0    0    0    0    0    0 1022   64
16:49:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:49:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:17    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:27    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:37    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:17    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:27    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:37    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:17    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:27    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:37    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:53:08    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:53:18    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:53:28    0    0    0    0    0    0    0    0    0    0    0    0    0    0

Tue Mar 11 16:49:36 2014 daemon.info hostapd: wlan2: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Mar 11 16:49:53 2014 daemon.info hostapd: wlan2: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Mar 11 16:50:06 2014 daemon.info hostapd: wlan2: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Tue Mar 11 16:50:23 2014 daemon.info hostapd: wlan2: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
I couldn't wait any longer, so I issued the 'wifi' command, which restored the AP.
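For anyone who wants to catch this state without watching the numbers by hand, here is a minimal sketch. It assumes samples in the layout pasted above: a HH:MM:SS timestamp followed by 14 counters, with the OFDM level as the third field. `ani_ofdm_zero` is a hypothetical helper name, not part of ath9k or OpenWrt.

```shell
# Print the timestamp of every ANI sample whose OFDM level is 0.
# Assumed input layout (as pasted in this ticket), one sample per line:
#   HH:MM:SS ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
# Header lines and anything else that is not a 15-field sample are skipped.
# ani_ofdm_zero is a hypothetical helper, not an OpenWrt tool.
ani_ofdm_zero() {
    awk 'NF == 15 && $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $3 == 0 { print $1 }'
}
```

Fed with the samples above, it would start printing at 16:45:26, the first row where the OFDM column reads 0. Note that it also reports the later all-zero rows, so it does not distinguish "OFDM level zero" from a completely dead AP.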
comment:125 Changed 4 years ago by pedro@…
I seem to be hitting this issue on 12.09 on a WDR4300. Adding a comment here as Trac account registration seems to be broken and there doesn't seem to be a way to just subscribe to the bug report. Sorry for the spam.
comment:126 follow-up: ↓ 134 Changed 4 years ago by nbd
please try copying http://nbd.name/950-test3.patch to package/kernel/mac80211/patches, rebuild and test again.
comment:127 Changed 4 years ago by nbd
- Resolution set to no_response
- Status changed from reopened to closed
comment:128 Changed 4 years ago by anonymous
comment:129 Changed 4 years ago by anonymous
comment:130 Changed 4 years ago by thelists@…
Can confirm still broken for me; I just experienced this problem again on my TL-WDR4300 on BB r41391.
I only recently found this bug report, but have experienced this on the same hardware with an older revision of BB.
comment:131 Changed 4 years ago by nbd
Instead of saying "this problem", please describe the *exact* symptoms you're seeing. What you're experiencing is most likely a different bug, because you're using a chipset that is quite different from the one in the WRT160NL
comment:132 Changed 4 years ago by thelists@…
I arrived here searching for ar71xx (from my tl-wdr4300) and issues with wireless. I'm experiencing wireless-only connection issues (significant packet loss) once the router has been up for some time, typically more than 24-48 hours. The time after which the problem manifests is not constant, and a restart resolves the issue.
I apologize for any confusion, or that I may be posting in the wrong location. It just seemed like the same, or very similar, issue.
comment:133 Changed 4 years ago by jow
- Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07
Milestone Attitude Adjustment 12.09 deleted
comment:134 in reply to: ↑ 126 ; follow-up: ↓ 135 Changed 3 years ago by dap@…
Replying to nbd:
please try copying http://nbd.name/950-test3.patch to package/kernel/mac80211/patches, rebuild and test again.
Sorry, I lost interest: the workaround has worked well and I got a little tired of this long-standing bug.
Now I've some spare time, I'm testing r42516 right now;
everything has been OK for the first few hours, even under stress.
I'll report in a few days. I hope you can close this ticket as "fixed".
comment:135 in reply to: ↑ 134 Changed 3 years ago by dap@…
Replying to dap@…:
Replying to nbd:
please try copying http://nbd.name/950-test3.patch to package/kernel/mac80211/patches, rebuild and test again.
Now I've some spare time, I'm testing r42516 right now;
Stability is currently OK, but the performance issue I mentioned in comment:108 is still there. It hit the AP after about 24h of uptime.
ANI stat under degraded performance when clients are idle:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:22:17    0    7    2    1    1    0    0    0    0    1    1    0 2859    5
14:22:27    0    6    2    1    1    0    0    0    0    1    2    0 2760    3
14:22:37    0    6    2    1    1    0    0    0    0    1    1    0 3310    6
14:22:47    0    6    2    1    1    0    0    0    0    1    1    0 3028   10
14:22:57    0    7    2    1    1    0    0    0    0    1    0    0 3120    3
14:23:07    0    7    2    1    1    0    0    0    0    1    1    0 3021   18
14:23:17    0    6    2    0    0    1    1    0    0    1    2    0 2218    4
14:23:27    0    7    2    1    1    0    0    0    0    1    0    0 3234    6
ANI stat under degraded performance while a client is downloading:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:26:48    0    6    2    0    0    0    0    0    0    0    1    0 1666   58
14:26:58    0    7    2    2    2    0    0    0    0    2    1    0 1866   58
14:27:08    0    7    2    0    0    1    1    0    0    1    1    0 1903   55
14:27:18    0    7    2    0    0    3    3    0    0    3    3    0 1386   33
14:27:28    0    6    2    1    1    0    0    0    0    1    2    0 1721   45
14:27:38    0    7    2    2    2    0    0    0    0    2    1    0 2266   40
14:27:48    0    5    2    1    1    0    0    0    0    1    3    0 1755   41
14:27:58    0    5    2    2    2    0    0    0    0    2    2    0 1915   39
Here I issued the "wifi" command; normal performance was restored immediately.
ANI stat when clients are idle:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:39:10    0    8    2    0    0    0    0    0    0    1    1    0 2899    5
14:39:20    0    6    2    0    0    1    0    0    0    0    2    0 2899    4
14:39:30    0    9    2    1    1    0    1    0    0    3    0    0 3413    9
14:39:40    0    7    2    0    0    1    0    0    0    0    2    0 2708   15
14:39:50    0    9    2    0    0    0    1    0    0    2    0    0 3557   18
14:40:00    0    9    2    0    0    0    0    0    0    1    1    0 3190    7
14:40:10    0    7    2    0    0    1    0    0    0    0    2    0 2598    7
14:40:20    0    9    2    0    0    0    1    0    0    2    0    0 3099   43
ANI stat while a client is downloading at high speed:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:36:22    0    7    2    0    0    1    0    0    0    0    1    0 2897   13
14:36:32    0    7    2    0    0    1    1    0    0    1    1    0 2293   42
14:36:42    0    9    2    0    0    0    1    0    0    2    0    0 1766   92
14:36:52    0    9    2    0    0    0    0    0    0    0    0    0 1370   84
14:37:02    0    8    2    0    0    0    0    0    0    0    1    0 1550   88
14:37:12    0    9    2    0    0    0    0    0    0    1    0    0 1581   92
14:37:22    0    9    2    0    0    0    0    0    0    1    1    0 1287   96
14:37:32    0    9    2    0    0    0    0    0    0    0    0    0 1495   84
The performance issue is noticeable during normal web browsing too.
Good to see the improvement in stability, but this performance issue is still a significant problem.
comment:136 Changed 3 years ago by anonymous
- Resolution no_response deleted
- Status changed from closed to reopened
I can't disable ANI.
It says no directory was found for:
/sys/kernel/debug/ieee80211/phy0/ath9k/ani
Should I build OpenWrt from scratch with ath9k debugging on?
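Before rebuilding, it may help to narrow down why the file is unavailable. The sketch below only classifies the path; `check_ani_file` is a hypothetical helper name, and the default path is simply the one quoted in this ticket. Possible causes of a "missing" result, none of them confirmed here, are that debugfs is not mounted at /sys/kernel/debug, that the radio is not phy0, or that the ath9k driver was built without debugfs support.

```shell
# Classify an ath9k debugfs control file as missing, not-writable or ok.
# $1: path to check (defaults to the ANI path quoted in this ticket).
# check_ani_file is a hypothetical helper name, not an OpenWrt tool.
check_ani_file() {
    ani_file=${1:-/sys/kernel/debug/ieee80211/phy0/ath9k/ani}
    if [ ! -e "$ani_file" ]; then
        echo "missing"
    elif [ ! -w "$ani_file" ]; then
        echo "not-writable"
    else
        echo "ok"
    fi
}
```

Running `ls /sys/kernel/debug/ieee80211/` alongside it shows which phy directories actually exist on the device.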
comment:137 Changed 3 years ago by italovalcy@…
Same problem with WR842NDv2 (chipset Atheros AR9341). I'm using OpenWrt BB (kernel 3.10.49) with wifi in ad-hoc mode. The problem is transmission errors:
prompt# iwconfig wlan0
wlan0 IEEE 802.11bgn ESSID:"foobar"
Mode:Ad-Hoc Frequency:2.412 GHz Cell: 06:DA:85:6E:97:4B
Tx-Power=18 dBm
RTS thr:off Fragment thr:off
Encryption key:off
Power Management:off
prompt# iw dev wlan0 station dump
Station a0:f3:c1:0e:16:c3 (on wlan0)
inactive time: 50 ms
rx bytes: 548427
rx packets: 12661
tx bytes: 45830
tx packets: 455
tx retries: 134
tx failed: 363
signal: -44 [-46, -49] dBm
signal avg: -44 [-46, -48] dBm
tx bitrate: 24.0 MBit/s
rx bitrate: 54.0 MBit/s
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
If I run tcpdump on that interface, I can see the kernel answer, but the packet is not transmitted, as I cannot see it on the other device. I have tried the workaround (echo 0 > /sys/kernel/debug/ieee80211/phy0/ath9k/ani), but it didn't work for me.
Do you have any other ideas?
Thanks.
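The counters in a station dump like the one above are easier to compare over time as a ratio. A minimal sketch, assuming the `iw dev <if> station dump` field labels shown above ("tx packets:", "tx retries:", "tx failed:"); `sta_tx_summary` is a hypothetical helper name, and "failed as a percentage of tx packets" is just the measure used in the ticket description, not an official metric.

```shell
# Summarise tx counters per station from `iw dev <if> station dump` output
# read on stdin. Prints one line per station:
#   <mac> <tx packets> <tx retries> <tx failed> <failed as % of tx packets>
# sta_tx_summary is a hypothetical helper name, not part of iw.
sta_tx_summary() {
    awk '
        function emit(    pct) {
            if (mac == "") return
            pct = 0
            if (pkts > 0) pct = 100 * failed / pkts
            printf("%s %d %d %d %.1f\n", mac, pkts, retries, failed, pct)
        }
        $1 == "Station"                { emit(); mac = $2; pkts = 0; retries = 0; failed = 0 }
        $1 == "tx" && $2 == "packets:" { pkts = $3 + 0 }
        $1 == "tx" && $2 == "retries:" { retries = $3 + 0 }
        $1 == "tx" && $2 == "failed:"  { failed = $3 + 0 }
        END                            { emit() }
    '
}
```

Usage would be `iw dev wlan0 station dump | sta_tx_summary`. For the dump above (455 packets, 363 failed) the last column comes out at roughly 80%.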
comment:138 Changed 3 years ago by italovalcy@…
Hello everyone,
I've tried the recent kernel (3.18.7) from OpenWrt trunk, and the packet loss was solved! I built the image following the OpenWrt documentation [1].
comment:139 Changed 3 years ago by TaiSHi
I tried 3.18.7 and it failed after 36 hours. Initially speed dropped to 1mb/s and then failed completely.
comment:140 Changed 3 years ago by TaiSHi
Also, I can't seem to disable ANI; I get "permission denied" when trying to echo to the file.
comment:141 follow-up: ↓ 142 Changed 3 years ago by nbd
please try r44696
comment:142 in reply to: ↑ 141 Changed 3 years ago by dap@…
comment:143 Changed 3 years ago by dap@…
The performance problem is still there.
ANI stat under degraded performance:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
20:22:10    0    5    2    2    2    0    0    0    0    2    1    0 1429   89
20:22:20    0    4    2    1    1    0    0    0    0    1    2    0 1334   89
20:22:30    0    5    2    3    3    0    0    0    0    3    2    0 1527  111
20:22:40    0    5    2    1    1    0    0    0    0    1    1    0 1450   98
20:22:50    0    4    2    1    1    0    0    0    0    1    2    0 1536  116
20:23:00    0    4    2    0    0    0    0    0    0    0    0    0 1792  142
20:23:10    0    5    2    1    1    0    0    0    0    1    0    0 1559  124
20:23:20    0    4    2    0    0    0    0    0    0    0    1    0 1568   98
20:23:30    0    5    2    1    1    0    0    0    0    1    0    0 1585  127
20:23:40    0    4    2    2    2    0    0    0    0    2    3    0 1463  118
The "wifi" command restored normal AP performance. ANI stat while downloading:
TIME     ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
20:28:12    0    5    2    0    0    0    0    0    0    0    0    0  586  161
20:28:22    0    5    2    0    0    0    0    0    0    0    0    0  691  148
20:28:32    0    5    2    0    0    0    0    0    0    0    0    0  674  158
20:28:42    0    5    2    0    0    0    0    0    0    0    0    0  660  143
20:28:52    0    5    2    1    1    0    0    0    0    1    1    0  915  123
20:29:02    0    5    2    0    0    0    0    0    0    0    0    0  670  156
20:29:12    0    5    2    1    1    0    0    0    0    1    1    0  791  140
20:29:22    0    6    2    1    1    0    0    0    0    1    0    0 1044  155
20:29:32    0    6    2    1    1    0    0    0    0    1    1    0  702  137
The ANI stats look very similar to me. Should I keep posting them, or do they not help anymore?
Now I'm testing with ANI disabled to check whether this is still an ANI-related issue.
comment:144 Changed 3 years ago by dap@…
I have serious performance issues on r44696 without ANI too. I have not found a stable, high-performance setup yet. I have no idea what's going on, but it's definitely not (just) an ANI issue anymore.
comment:145 Changed 3 years ago by dap@…
comment:146 Changed 3 years ago by dap@…
I'm using r44654 from 6 weeks ago, and it seems this issue has been resolved in this version. I did not find out what exactly fixed the problem, but it surely happened somewhere between r42516 and r44654.
I still have some problems, but that is another issue for another ticket. This one is resolved in r44654. Thank you!
comment:147 Changed 3 years ago by nbd
- Resolution set to fixed
- Status changed from reopened to closed
comment:148 follow-up: ↓ 149 Changed 3 years ago by taishi@…
comment:149 in reply to: ↑ 148 Changed 3 years ago by dap@…
comment:150 follow-up: ↓ 152 Changed 3 years ago by taishi@…
comment:151 Changed 3 years ago by taishi@…
r44654 failed after <20 hours with ANI enabled. I have now done a factory reset and disabled ANI; let's see how it goes.
comment:152 in reply to: ↑ 150 ; follow-up: ↓ 153 Changed 3 years ago by dap@…
Replying to taishi@…:
r45884 failed after 48 hours or so.
Damn, you're right. The breakdown is much rarer than before, maybe because something changed in my environment, but yes, it's still there. :(
Last time the following command fixed it instantly, did not have to restart anything:
echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani
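On boxes with more than one radio the phy number differs, so the same write can be applied to every phy that exposes the file. A minimal sketch under that assumption; `disable_ani_all` is a hypothetical helper name, and the default root is the debugfs path quoted above. Since the value lives in debugfs, it has to be written again after a reboot.

```shell
# Write 0 to the "ani" control file of every phy found under a debugfs root.
# $1: ieee80211 debugfs root (defaults to the path quoted in this ticket).
# Phys without a writable ath9k/ani file are skipped silently.
# disable_ani_all is a hypothetical helper name, not an OpenWrt tool.
disable_ani_all() {
    ani_root=${1:-/sys/kernel/debug/ieee80211}
    for ani_file in "$ani_root"/phy*/ath9k/ani; do
        [ -w "$ani_file" ] || continue
        echo 0 > "$ani_file" && echo "ANI disabled: $ani_file"
    done
}
```

Calling `disable_ani_all` with no argument on the router is equivalent to the single echo above when only phy0 exists.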
comment:153 in reply to: ↑ 152 Changed 3 years ago by taishi@…
Replying to dap@…:
Last time the following command fixed it instantly, did not have to restart anything:
echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani
That's pretty detrimental to performance, although it's the only way to keep it up. This issue seems to affect just a small number of people, even with the same hardware; could it be something hardware-related?
comment:154 follow-up: ↓ 162 Changed 2 years ago by dap@…
- Resolution fixed deleted
- Status changed from closed to reopened
It is not fixed, still an issue on r45743.
The additional problems summarized in comment:117 are still present too; disabling ANI does not affect them:
- "AP disaster" issue, symptom: all clients are disconnected and impossible to connect to the AP (the SSID is there).
- "downloader stalling" issue, symptom: I have to reconnect. Other clients may not be affected.
However, some hours of heavy stress are needed to reproduce either of them, and they occur randomly. Are there open tickets that may cover them?
Due to the WRT160NL stability issues I recently bought a WRT1200AC. My WRT160NL is now idle, in case something is worth a try. The downloader stalling issue might be a client bug; I will use the new AP to narrow it down.
comment:155 Changed 2 years ago by anonymous
I am running r46724 on a TP-LINK TL-WR703N and the exact issues described on comment:154 are still present. Is there a list of ath9k devices that are NOT affected by this issue?
comment:156 Changed 2 years ago by taishi@…
I'm not exactly sure whether this affects all of ath9k, specific router models, or just SOME routers (I know people with the same router as mine, not sure if the same rev, where it works flawlessly).
@anonymous on comment:155: I tested a WDR4300 for a customer (can't recall the rev, but I think they're all 1.0 :P) and it worked beyond perfect.
/cheers
comment:157 follow-up: ↓ 158 Changed 2 years ago by anonymoux
I'm asking because I have a TP-LINK TL-WR841ND and a D-Link DIR-615 that both have the exact same problem and they are all ath9k devices. It works fine on stock firmware so I know it's not a hardware or interference issue. This seems like a regression since I remember it working flawlessly a few years back.
comment:158 in reply to: ↑ 157 Changed 2 years ago by dap@…
Replying to anonymoux:
I'm asking because I have a TP-LINK TL-WR841ND and a D-Link DIR-615 that both have the exact same problem and they are all ath9k devices. It works fine on stock firmware so I know it's not a hardware or interference issue. This seems like a regression since I remember it working flawlessly a few years back.
As far as I can remember, my ath9k was rock stable with Backfire; the troubles came with AA. I had plans to trace the issues to a specific revision, starting from Backfire, but ath9k has been broken and fixed so many times since then that I'm afraid it would not help with the current issues.
I have not found a flawlessly working release in the past years either.
comment:159 Changed 2 years ago by dap@…
I reviewed all comments; the affected wifi chipsets are probably:
- AR9102
- AR9103
Reports describing different problems were filtered out. Many of you did not report the hardware revision, but every reported router that has this problem has a version with one of the chipsets above.
I traded my WRT160NL for a TL-WDR4300 yesterday. I'm stressing this new toy with heavy traffic and noise, but it has been stable so far. The wifi chip is an AR9340.
Does anybody have this issue with a different wifi chip?
comment:160 Changed 2 years ago by taishi@…
My 1043ND has an AR9103, but I have reports that the same router performs fine.
I'll be getting a TL-WDR4300 in a few weeks, although I've already installed one of those and it has >60 days of uptime with no hiccups.
comment:161 Changed 2 years ago by anonymous
I seem to be having similar issues on Mikrotik rb493g with R52Hn miniPCI card with ar9220 installed.
comment:162 in reply to: ↑ 154 Changed 2 years ago by anonymous
Replying to dap@…:
It is not fixed, still an issue on r45743.
The additional problems summarized in comment:117 are still present too; disabling ANI does not affect them:
- "AP disaster" issue, symptom: all clients are disconnected and impossible to connect to the AP (the SSID is there).
- "downloader stalling" issue, symptom: I have to reconnect. Other clients may not be affected.
However, some hours of heavy stress are needed to reproduce either of them, and they occur randomly. Are there open tickets that may cover them?
See also ticket #11862.
comment:163 in reply to: ↑ description Changed 2 years ago by anonymous
Replying to dap@…:
I began to monitor some wifi parameters with munin recently, I found the following on the WRT AP when this problem kicks in:
Can you attach or pastebin these plugins? Hopefully you're still monitoring this ticket.
comment:164 Changed 20 months ago by Misiek
Chaos Calmer 15.05.1 / LuCI 15.05-149-g0d8bbd2 Release (git-15.363.78009-956be55)
TP-Link WR841N v8
I still have this problem. Disabling disassoc_low_ack partially helped, but it still happens sometimes (after a few hours). I can connect with a notebook at a distance of 2-3 m, but phones have problems.

My box, TP-Link TL-WR1043ND, possibly suffers from the same problem.
Connected clients remain connected, rx bitrates drop too.
Another thing happens over here which is not on your list: new clients are unable to connect!
Tested two smartphones, a laptop with Broadcom chip using Windows and another laptop using Atheros 9k chip and Linux (Debian Wheezy).
Solved the problem by restarting the wireless. (wifi down; wifi up).
I'm not at home, so I can't provide dumps/logs right now.
But I can provide what you ask of me when I get there.