
Opened 4 years ago

Closed 3 years ago

#17354 closed defect (worksforme)

ath9k packet loss with wds

Reported by: dphi <philipp@…> Owned by: developers
Priority: normal Milestone: Barrier Breaker 14.07
Component: kernel Version: Barrier Breaker 14.07
Keywords: Cc:

Description

I have two ath9k based access points, connected via WDS. One acts as the gateway, the other as a repeater.

Wi-Fi connections to the repeater become unusable after some time, and DHCP fails.

Symptoms besides the slowdown:

  • DHCP stops working, but clients stay associated and can still associate
  • Kernel: "ath: phy0: Failed to stop TX DMA, queues=..."
  • ping loss for Wi-Fi clients connected to the repeater:
    • to the repeater: 47% loss
    • to the gateway (via the repeater): 68% loss

Both access points run OpenWrt 14.07-rc2. Executing "wifi" on the repeater solves the issue for a while.

gateway:
 system type		: Atheros AR9344 rev 2
 machine		: TP-LINK TL-WDR3600/4300/4310
repeater:
 system type		: Atheros AR9341 rev 3
 machine		: TP-LINK TL-WR841N/ND v8

The wireless configuration of the repeater:

$ cat /etc/config/wireless
config wifi-device 'radio0'
	option type 'mac80211'
	option macaddr '10:fe:ed:8a:2a:50'
	option hwmode '11ng'
	list ht_capab 'LDPC'
	list ht_capab 'SHORT-GI-20'
	list ht_capab 'SHORT-GI-40'
	list ht_capab 'TX-STBC'
	list ht_capab 'RX-STBC1'
	list ht_capab 'DSSS_CCK-40'
	option disabled '0'
	option country 'DE'
	option htmode 'HT40'
	option txpower '20'
	option channel '6'

config wifi-iface
	option device 'radio0'
	option ssid 'SSID'
	option mode 'sta'
	option wds '1'
	option bssid '64:70:02:B5:D5:FD'
	option network 'lan'
	option encryption 'psk2'
	option key 'KEY'

config wifi-iface
	option device 'radio0'
	option mode 'ap'
	option ssid 'SSID'
	option wds '1'
	option network 'lan'
	option encryption 'psk2'
	option key 'KEY'

Thanks!

Change History (11)

comment:1 Changed 3 years ago by anonymous

That could be a regression. I saw something similar on a UBNT Bullet M2 in adhoc mode about a year ago (snapshots). I never tracked it down, and it was gone a few builds later.
The symptoms: a good, stable connection with a better-looking SNR than expected. It worked fine for 5 to 60 minutes, then ping times suddenly increased, followed by packet loss.
Every five minutes it got better (almost normal) for 10 pings or so, and then it started all over.
Executing 'wifi' solved the problem until the next occurrence.

In that particular setup it was master+adhoc, but it also occurred on adhoc-only nodes.
I also reset /etc/config/wireless to the defaults, since I had basic_rates and similar options configured on that device. That helped a bit.
I also experimented with ANI: disabling it via /sys/kernel/debug reduced the packet loss somewhat, but the latencies remained.

comment:2 Changed 3 years ago by nbd

please try rc3

comment:3 Changed 3 years ago by nbd

  • Resolution set to no_response
  • Status changed from new to closed

comment:4 Changed 3 years ago by nsgend@…

  • Resolution no_response deleted
  • Status changed from closed to reopened

Having the same problem in 14.07-rc3.
Two TP-Link WDR4300 (ath9k) routers in WDS at 2.4 GHz (ap+client). ICMP/UDP work fairly well, then suddenly there is massive packet loss and high latency. TCP is also consistently flaky, with lots of TCP retransmissions and ACKs not being received by the other side.

comment:5 Changed 3 years ago by nbd

please test BB final

comment:6 Changed 3 years ago by jannie@…

I seem to have a similar issue on 14.07 final r43209. My setup differs in that my main WDS base is a UBNT Bullet2 running a Backfire 10.03 snapshot (r33081) with madwifi. The WDS base and its existing UBNT Bullet2 Backfire 10.03 r33081 madwifi repeater have been, and continue to be, stable.

Two days ago, I added a UBNT NSM2 running 14.07 final r43209 using ath9k. Things work great for a while (seems to be an arbitrary number of hours, even if no non-WDS sessions have been associated.) I then start seeing high latencies and packet loss. At the moment, it seems only traffic FROM the ath9k repeater TO the madwifi base is affected. The rate of packet loss varies tremendously - sometimes it's 1%, other times it's 80%. Traffic from the madwifi base to the ath9k repeater appears to be unaffected.

The madwifi base still works perfectly with the madwifi repeater during this time. <1% packet loss, no latency issues.

In most cases, running 'wifi' on the ath9k repeater is enough to restore the connection to its normal state of virtually 0% packet loss. One case required running it on the WDS base (even though the other Backfire madwifi repeater was unaffected at the time).

I've been unable to find a 'trigger' pattern thus far.

How I'm testing throughput and packet loss:

Problem State
root@ath9krepeater:~# iperf -w 64k -c 192.168.18.10 -u -b 1m -r


Client connecting to 192.168.18.10, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 128 KByte (WARNING: requested 64.0 KByte)


[ 3] local 192.168.18.15 port 33571 connected with 192.168.18.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.8 sec 715 KBytes 543 Kbits/sec
[ 3] Sent 498 datagrams
[ 3] Server Report:
[ 3] 0.0-11.1 sec 161 KBytes 119 Kbits/sec 122.994 ms 387/ 499 (78%)


Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 128 KByte (WARNING: requested 64.0 KByte)


[ 3] local 192.168.18.15 port 5001 connected with 192.168.18.10 port 56979
[ 3] 0.0-10.0 sec 1.19 MBytes 999 Kbits/sec 0.883 ms 1/ 853 (0.12%)
root@Bloudakhuis-NSM2:~#

Normal state, after running 'wifi', immediately following the run above:
root@ath9krepeater:~# iperf -w 64k -c 192.168.18.10 -u -b 1m -r


Client connecting to 192.168.18.10, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 128 KByte (WARNING: requested 64.0 KByte)


[ 3] local 192.168.18.15 port 42761 connected with 192.168.18.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.19 MBytes 1000 Kbits/sec
[ 3] Sent 852 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 1.19 MBytes 1.00 Mbits/sec 0.462 ms 1/ 853 (0.12%)


Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 128 KByte (WARNING: requested 64.0 KByte)


[ 3] local 192.168.18.15 port 5001 connected with 192.168.18.10 port 52210
[ 3] 0.0-10.0 sec 1.19 MBytes 1000 Kbits/sec 0.972 ms 1/ 852 (0.12%)
root@Bloudakhuis-NSM2:~#
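In the iperf runs above, the loss figures appear in the `lost/ total (pct%)` tail of each UDP report line. When you run `-r`, the first report covers repeater-to-base traffic and the second covers base-to-repeater traffic. The following is a minimal Python sketch for extracting those numbers from saved output, for example to compare the two directions or to log repeated runs while searching for a trigger pattern. It assumes the iperf2 report format shown above. The helper name `udp_loss` and the 5% cutoff are hypothetical choices, not part of iperf.

```python
import re

# Assumes the iperf2 UDP report format shown above, e.g.
# "[ 3] 0.0-11.1 sec 161 KBytes 119 Kbits/sec 122.994 ms 387/ 499 (78%)".
# Captures lost datagrams, total datagrams and iperf's rounded percentage.
LOSS_RE = re.compile(r"(\d+)/\s*(\d+)\s*\(([\d.]+)%\)")


def udp_loss(lines, threshold_pct=5.0):
    """Hypothetical helper: return (lost, total, pct, degraded) per report line.

    pct is recomputed from the raw counts (iperf prints a rounded value);
    degraded is True when pct exceeds threshold_pct (an arbitrary cutoff).
    """
    results = []
    for line in lines:
        m = LOSS_RE.search(line)
        if not m:
            continue  # not a UDP loss report line
        lost, total = int(m.group(1)), int(m.group(2))
        pct = 100.0 * lost / total if total else 0.0
        results.append((lost, total, pct, pct > threshold_pct))
    return results


if __name__ == "__main__":
    # Report lines copied from the "Problem State" run above:
    # first the repeater -> base direction, then base -> repeater.
    sample = [
        "[ 3] 0.0-11.1 sec 161 KBytes 119 Kbits/sec 122.994 ms 387/ 499 (78%)",
        "[ 3] 0.0-10.0 sec 1.19 MBytes 999 Kbits/sec 0.883 ms 1/ 853 (0.12%)",
    ]
    for lost, total, pct, bad in udp_loss(sample):
        print(f"{lost}/{total} lost ({pct:.2f}%){' DEGRADED' if bad else ''}")
```

For the sample lines, this flags only the repeater-to-base direction (387/499, about 77.56%), which matches the asymmetry described in this comment. A saved log can be passed in directly as `udp_loss(open("iperf.log"))`, because file objects iterate line by line.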

comment:7 Changed 3 years ago by jannie@…

Since the experience above, I've removed the basic_rate and mcast_rate settings I had, and I have not seen the symptom again.

I'll try to add them back at a later time and see if the problem returns.

comment:8 Changed 3 years ago by hlarsen

I'm seeing the same issue on two TP-Link Archer C7 v2's, both running BB 14.07 in an ap/sta WDS config. After an indeterminate amount of time (a few …), packet loss between the routers starts to fluctuate wildly from ~20% to ~80%, making Wi-Fi unusable. A laptop connected to the main AP while this happens also shows very high packet loss.

Running 'wifi' on the AP fixes the issue for a while, but then the issue creeps back and requires running 'wifi' again. This is a new setup, so I'm still troubleshooting, but after a reboot it seems to take a day or two for the problem to appear, whereas running 'wifi' may only fix it for a few hours. I'll report back.

If anyone has suggestions for further troubleshooting or things to try, I'm all ears. I couldn't get a bridge going with the 5 GHz radios, but I'll try that again in the hope that it works better.

As a side note, I'm seeing a lot of this error on the AP, though throughput is fine and it seems at most tangentially related (ticket 11862):

ath: phy1: Failed to stop TX DMA, queues=0x004!

comment:9 Changed 3 years ago by nbd

please try current trunk

comment:10 Changed 3 years ago by hlarsen

Edit: just guessing here, but my comment is probably useless, as the bridge is now using the 5 GHz radio, and hence ath10k rather than ath9k.

---
Things seem to be a lot better (so far) with CC. I have a WDS bridge going with two Archer C7 v2's on CC r45027 using the 5 GHz radios; I had to install kmod-ath10k to get them to show up. It's been a few days and I have not yet seen the issue that causes the packet loss, but I will update if it happens.

I'm getting a lot of these on the STA, but AFAIK they're not causing any (major) issues:

br-lan: received packet on wlan0 with own address as source address

Following some forum posts, I've tried enabling STP on the LAN interfaces of the AP and STA, but it didn't get rid of the error message. I'll post my setup in the forums asking for comments/improvements, as it seems unrelated to this issue.

Last edited 3 years ago by hlarsen

comment:11 Changed 3 years ago by nbd

  • Resolution set to worksforme
  • Status changed from reopened to closed
