Modify

Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#12919 closed defect (fixed)

Unstable wifi between two TL-WDR4320

Reported by: tusker@… Owned by: developers
Priority: normal Milestone: Barrier Breaker 14.07
Component: packages Version: Trunk
Keywords: Cc:

Description

1) Network setup
AP <- wifi -> client/bridge <- wired -> switch <- laptops/desktops
2) Compiled from source: attitude_adjustment-35153, using gargoyle patch: 969-allow-sta-bridge.patch
3) AP and client/bridge: TL-WDR320 V1, AR9344 with AR9340 + AR9580 dual radio

With debug enabled on both AP and client/bridge, I see the following messages many times:

client/bridge:
[42152.680000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0
[42152.680000] wlan0: BA request denied - HW unavailable for tid 0
[42153.010000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0
[42153.010000] wlan0: BA request denied - HW unavailable for tid 0
[42153.160000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0
[42153.160000] wlan0: BA request denied - HW unavailable for tid 0
[42153.270000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0
[42153.270000] wlan0: BA request denied - HW unavailable for tid 0
[42153.440000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0
[42153.440000] wlan0: BA request denied - HW unavailable for tid 0
[42153.500000] wlan0: tx session timer expired on tid 3
[42153.500000] wlan0: Tx BA session stop requested for 20:dc:e6:fc:d3:1f tid 3
[42153.500000] wlan0: Stopping Tx BA session for 20:dc:e6:fc:d3:1f tid 3
[42153.680000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0
[42153.680000] wlan0: BA request denied - HW unavailable for tid 0
[42153.880000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0

AP:

[37827.150000] wlan0: tx session timer expired on tid 0
[37827.150000] wlan0: Tx BA session stop requested for b4:99:ba:dd:3c:ad tid 0
[37827.150000] wlan0: Stopping Tx BA session for b4:99:ba:dd:3c:ad tid 0
[37832.960000] wlan0: Open BA session requested for f8:3d:ff:59:0f:38 tid 0
[37832.960000] wlan0: activated addBA response timer on tid 0
[37833.170000] wlan0: switched off addBA timer for tid 0
[37833.170000] wlan0: Aggregation is on for tid 0
[37843.210000] wlan0: tx session timer expired on tid 0
[37843.210000] wlan0: Tx BA session stop requested for f8:3d:ff:59:0f:38 tid 0
[37843.210000] wlan0: Stopping Tx BA session for f8:3d:ff:59:0f:38 tid 0
[37872.670000] wlan0: Open BA session requested for f8:3d:ff:59:0f:38 tid 0
[37872.670000] wlan0: activated addBA response timer on tid 0
[37872.980000] wlan0: switched off addBA timer for tid 0
[37872.980000] wlan0: Aggregation is on for tid 0
[37882.710000] wlan0: tx session timer expired on tid 0
[37882.710000] wlan0: Tx BA session stop requested for f8:3d:ff:59:0f:38 tid 0
[37882.710000] wlan0: Stopping Tx BA session for f8:3d:ff:59:0f:38 tid 0
[37888.240000] wlan0: Rx BA session stop requested for b4:99:ba:dd:3c:ad tid 0 recipient reason: 36
[37890.830000] wlan0: Rx A-MPDU request on tid 3 result 0
[37895.950000] wlan0: delba from 20:dc:e6:fc:d2:c2 (initiator) tid 3 reason code 37
[37895.950000] wlan0: Rx BA session stop requested for 20:dc:e6:fc:d2:c2 tid 3 inititator reason: 0
[37920.650000] wlan0: Rx A-MPDU request on tid 3 result 0
[37924.220000] wlan0: Rx A-MPDU request on tid 0 result 0
[37925.780000] wlan0: rx session timer expired on tid 3
[37925.780000] wlan0: Rx BA session stop requested for 20:dc:e6:fc:d2:c2 tid 3 recipient reason: 39
[37959.950000] wlan0: Rx BA session stop requested for b4:99:ba:dd:3c:ad tid 0 recipient reason: 36
[37966.200000] wlan0: Rx A-MPDU request on tid 3 result 0
[37971.330000] wlan0: delba from 20:dc:e6:fc:d2:c2 (initiator) tid 3 reason code 37
[37971.330000] wlan0: Rx BA session stop requested for 20:dc:e6:fc:d2:c2 tid 3 inititator reason: 0

Attachments (5)

ap.wireless (1.1 KB) - added by tusker@… 5 years ago.
client.wireless (821 bytes) - added by tusker@… 5 years ago.
tpone.log (158.4 KB) - added by tusker@… 5 years ago.
Additional log during instability
trace.dat.gz (237.1 KB) - added by tusker@… 5 years ago.
trace.dat.2.gz (155.3 KB) - added by tusker@… 5 years ago.

Download all attachments as: .zip

Change History (41)

Changed 5 years ago by tusker@…

Changed 5 years ago by tusker@…

comment:1 Changed 5 years ago by tusker@…

From a wired client, showing low pings across the link, but with high packet loss

64 bytes from 10.0.0.1: icmp_req=75 ttl=64 time=2.78 ms
64 bytes from 10.0.0.1: icmp_req=76 ttl=64 time=1.09 ms
64 bytes from 10.0.0.1: icmp_req=77 ttl=64 time=1.27 ms
64 bytes from 10.0.0.1: icmp_req=78 ttl=64 time=1.09 ms
64 bytes from 10.0.0.1: icmp_req=79 ttl=64 time=1.18 ms
64 bytes from 10.0.0.1: icmp_req=81 ttl=64 time=1.39 ms
64 bytes from 10.0.0.1: icmp_req=82 ttl=64 time=1.33 ms
64 bytes from 10.0.0.1: icmp_req=83 ttl=64 time=3.81 ms
64 bytes from 10.0.0.1: icmp_req=84 ttl=64 time=3.85 ms
64 bytes from 10.0.0.1: icmp_req=85 ttl=64 time=1.15 ms
64 bytes from 10.0.0.1: icmp_req=86 ttl=64 time=1.33 ms
64 bytes from 10.0.0.1: icmp_req=87 ttl=64 time=1.13 ms
64 bytes from 10.0.0.1: icmp_req=88 ttl=64 time=1.19 ms
64 bytes from 10.0.0.1: icmp_req=90 ttl=64 time=2.83 ms
64 bytes from 10.0.0.1: icmp_req=91 ttl=64 time=3.60 ms
64 bytes from 10.0.0.1: icmp_req=92 ttl=64 time=5.35 ms
C
--- 10.0.0.1 ping statistics ---
92 packets transmitted, 81 received, 11% packet loss, time 45750ms
rtt min/avg/max/mdev = 0.798/1.815/8.540/1.325 ms

comment:2 Changed 5 years ago by tusker@…

client bridge MAC addresses:
wlan0 Link encap:Ethernet HWaddr 20:DC:E6:FC:D2:C2
AP MAC addresses:
wlan0 Link encap:Ethernet HWaddr 20:DC:E6:FC:D3:1F
wlan1 Link encap:Ethernet HWaddr 20:DC:E6:FC:D3:20

When using Skype on the wired client laptop, the connection between the AP and wireless client drops:

client/bridge:

[46042.220000] wlan0: deauthenticated from 20:dc:e6:fc:d3:1f (Reason: 2)
[46042.220000] br-lan: port 3(wlan0) entered disabled state
[46043.000000] wlan0: authenticate with 20:dc:e6:fc:d3:1f
[46043.020000] wlan0: send auth to 20:dc:e6:fc:d3:1f (try 1/3)
[46043.030000] wlan0: authenticated
[46043.040000] wlan0: associate with 20:dc:e6:fc:d3:1f (try 1/3)
[46043.040000] wlan0: RX AssocResp from 20:dc:e6:fc:d3:1f (capab=0x411 status=0 aid=1)
[46043.050000] wlan0: associated
[46043.060000] br-lan: port 3(wlan0) entered forwarding state
[46043.060000] br-lan: port 3(wlan0) entered forwarding state
[46043.370000] wlan0: dropped frame to 20:dc:e6:fc:d3:1f (unauthorized port)
[46043.640000] wlan0: dropped frame to 20:dc:e6:fc:d3:1f (unauthorized port)
[46043.720000] wlan0: dropped frame to 20:dc:e6:fc:d3:1f (unauthorized port)
[46043.790000] wlan0: dropped frame to 20:dc:e6:fc:d3:1f (unauthorized port)
[46043.870000] wlan0: dropped frame to 20:dc:e6:fc:d3:1f (unauthorized port)
[46043.990000] wlan0: dropped frame to 20:dc:e6:fc:d3:1f (unauthorized port)
[46044.020000] wlan0: dropped frame to 20:dc:e6:fc:d3:1f (unauthorized port)
[46044.380000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 0
[46044.380000] wlan0: activated addBA response timer on tid 0
[46044.380000] wlan0: switched off addBA timer for tid 0
[46044.380000] wlan0: Aggregation is on for tid 0
[46044.470000] wlan0: Rx A-MPDU request on tid 0 result 0
[46045.060000] br-lan: port 3(wlan0) entered forwarding state

AP:
Jan 30 11:07:53 tpone kern.debug kernel: [41716.640000] wlan0: Rx BA session stop requested for 20:dc:e6:fc:d2:c2 tid 3 recipient reason: 39
Jan 30 11:07:55 tpone kern.debug kernel: [41718.080000] wlan0: tx session timer expired on tid 0
Jan 30 11:07:55 tpone kern.debug kernel: [41718.080000] wlan0: Tx BA session stop requested for f8:3d:ff:59:0f:38 tid 0
Jan 30 11:07:55 tpone kern.debug kernel: [41718.080000] wlan0: Stopping Tx BA session for f8:3d:ff:59:0f:38 tid 0
Jan 30 11:07:59 tpone kern.debug kernel: [41722.560000] wlan0: Open BA session requested for b4:99:ba:dd:3c:ad tid 0
Jan 30 11:07:59 tpone kern.debug kernel: [41722.560000] wlan0: activated addBA response timer on tid 0
Jan 30 11:07:59 tpone kern.debug kernel: [41722.560000] wlan0: switched off addBA timer for tid 0
Jan 30 11:07:59 tpone kern.debug kernel: [41722.560000] wlan0: Aggregation is on for tid 0
Jan 30 11:08:04 tpone kern.debug kernel: [41727.700000] wlan0: tx session timer expired on tid 0
Jan 30 11:08:04 tpone kern.debug kernel: [41727.700000] wlan0: Tx BA session stop requested for b4:99:ba:dd:3c:ad tid 0
Jan 30 11:08:04 tpone kern.debug kernel: [41727.700000] wlan0: Stopping Tx BA session for b4:99:ba:dd:3c:ad tid 0
Jan 30 11:08:31 tpone daemon.info hostapd: wlan0: STA 20:dc:e6:fc:d2:c2 IEEE 802.11: disconnected due to excessive missing ACKs
Jan 30 11:08:31 tpone kern.debug kernel: [41754.670000] wlan0: Tx BA session stop requested for 20:dc:e6:fc:d2:c2 tid 0
Jan 30 11:08:31 tpone kern.debug kernel: [41754.670000] wlan0: Stopping Tx BA session for 20:dc:e6:fc:d2:c2 tid 0
Jan 30 11:08:31 tpone kern.debug kernel: [41754.670000] wlan0: Could not find station: 20:dc:e6:fc:d2:c2

comment:3 Changed 5 years ago by tusker@…

Further logs:

Jan 30 11:33:16 tptwo kern.debug kernel: [47496.830000] wlan0: Rx BA session stop requested for 20:dc:e6:fc:d3:1f tid 0 recipient reason:
39
Jan 30 11:33:16 tptwo kern.debug kernel: [47496.840000] wlan0: addBA response timer expired on tid 3
Jan 30 11:33:16 tptwo kern.debug kernel: [47496.840000] wlan0: Tx BA session stop requested for 20:dc:e6:fc:d3:1f tid 3
Jan 30 11:33:16 tptwo kern.debug kernel: [47496.840000] wlan0: Stopping Tx BA session for 20:dc:e6:fc:d3:1f tid 3
Jan 30 11:33:19 tptwo kern.debug kernel: [47499.840000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 3
Jan 30 11:33:19 tptwo kern.debug kernel: [47499.840000] wlan0: activated addBA response timer on tid 3
Jan 30 11:33:20 tptwo kern.debug kernel: [47500.840000] wlan0: addBA response timer expired on tid 3
Jan 30 11:33:20 tptwo kern.debug kernel: [47500.840000] wlan0: Tx BA session stop requested for 20:dc:e6:fc:d3:1f tid 3
Jan 30 11:33:20 tptwo kern.debug kernel: [47500.840000] wlan0: Stopping Tx BA session for 20:dc:e6:fc:d3:1f tid 3
Jan 30 11:33:27 tptwo kern.debug kernel: [47507.840000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 3
Jan 30 11:33:27 tptwo kern.debug kernel: [47507.840000] wlan0: BA request denied - waiting a grace period after 4 failed requests on tid
Jan 30 11:33:27 tptwo kern.debug kernel: [47508.090000] wlan0: Open BA session requested for 20:dc:e6:fc:d3:1f tid 3
Jan 30 11:33:27 tptwo kern.debug kernel: [47508.090000] wlan0: BA request denied - waiting a grace period after 4 failed requests on tid
Jan 30 11:33:42 tptwo kern.info kernel: [47522.700000] wlan0: deauthenticated from 20:dc:e6:fc:d3:1f (Reason: 2)
Jan 30 11:33:42 tptwo kern.debug kernel: [47522.710000] wlan0: Tx BA session stop requested for 20:dc:e6:fc:d3:1f tid 0
Jan 30 11:33:42 tptwo kern.info kernel: [47522.710000] br-lan: port 3(wlan0) entered disabled state
Jan 30 11:33:42 tptwo kern.debug kernel: [47522.730000] wlan0: Stopping Tx BA session for 20:dc:e6:fc:d3:1f tid 0
Jan 30 11:33:42 tptwo kern.debug kernel: [47522.730000] wlan0: Could not find station: 20:dc:e6:fc:d3:1f
Jan 30 11:33:42 tptwo kern.info kernel: [47523.490000] wlan0: authenticate with 20:dc:e6:fc:d3:1f
Jan 30 11:33:42 tptwo kern.info kernel: [47523.510000] wlan0: send auth to 20:dc:e6:fc:d3:1f (try 1/3)
Jan 30 11:33:43 tptwo kern.info kernel: [47523.520000] wlan0: authenticated
Jan 30 11:33:43 tptwo kern.info kernel: [47523.530000] wlan0: associate with 20:dc:e6:fc:d3:1f (try 1/3)
Jan 30 11:33:43 tptwo kern.info kernel: [47523.530000] wlan0: RX AssocResp from 20:dc:e6:fc:d3:1f (capab=0x411 status=0 aid=1)
Jan 30 11:33:43 tptwo kern.info kernel: [47523.540000] wlan0: associated
Jan 30 11:33:43 tptwo kern.info kernel: [47523.550000] br-lan: port 3(wlan0) entered forwarding state
Jan 30 11:33:43 tptwo kern.info kernel: [47523.550000] br-lan: port 3(wlan0) entered forwarding state

Changed 5 years ago by tusker@…

Additional log during instability

comment:4 Changed 5 years ago by tusker@…

I reverted the firmware, and it is much more stable on r34379

comment:5 Changed 5 years ago by tusker@…

Trying r35706 from trunk is even less stable, with many invalid packets and the ping between the two routers is very unstable.

On AP (tpone/10.0.0.1):
Feb 22 09:30:18 tpone kern.err kernel: [44296.140000] ath: phy0: Failed to stop TX DMA, queues=0x004!

On Client (tptwo/10.0.0.8):
wlan0 IEEE 802.11bgn ESSID:"mascordone"

Mode:Managed Frequency:2.432 GHz Access Point: 20:DC:E6:FC:D3:1F
Bit Rate=52 Mb/s Tx-Power=20 dBm
RTS thr:off Fragment thr:off
Encryption key:off
Power Management:off
Link Quality=42/70 Signal level=-68 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:623 Invalid misc:54356 Missed beacon:0

PING 10.0.0.1 (10.0.0.1): 56 data bytes
64 bytes from 10.0.0.1: seq=0 ttl=64 time=5.037 ms
64 bytes from 10.0.0.1: seq=1 ttl=64 time=1.559 ms
64 bytes from 10.0.0.1: seq=2 ttl=64 time=1.799 ms
64 bytes from 10.0.0.1: seq=3 ttl=64 time=4.829 ms
64 bytes from 10.0.0.1: seq=4 ttl=64 time=9.761 ms
64 bytes from 10.0.0.1: seq=5 ttl=64 time=1.167 ms
64 bytes from 10.0.0.1: seq=6 ttl=64 time=1.218 ms
64 bytes from 10.0.0.1: seq=7 ttl=64 time=1.325 ms
64 bytes from 10.0.0.1: seq=8 ttl=64 time=1.356 ms
64 bytes from 10.0.0.1: seq=9 ttl=64 time=2.620 ms
64 bytes from 10.0.0.1: seq=10 ttl=64 time=1.198 ms
64 bytes from 10.0.0.1: seq=11 ttl=64 time=18.121 ms
64 bytes from 10.0.0.1: seq=12 ttl=64 time=3.007 ms
64 bytes from 10.0.0.1: seq=13 ttl=64 time=1.199 ms
64 bytes from 10.0.0.1: seq=14 ttl=64 time=1.957 ms

Building latest AA now and will compare.

comment:6 Changed 5 years ago by anonymous

Attitude Adjustment r35731 has the same behaviour as trunk r35706

iwconfig; sleep 5; iwconfig:

Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:2 Invalid misc:428 Missed beacon:0

Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:2 Invalid misc:433 Missed beacon:0

comment:7 Changed 5 years ago by tusker@…

OK, latency measurements and high invalid misc are not good indicators of stability. r34361 performs better than the newer versions, with consistent high speedtest.net speeds.

Trunk r35706 - ~30mbit/s speedtest.net - latency during speedtest - high invalid misc
AA r35731 - ~30mbit/s speedtest.net - latency during speedtest - high invalid misc
AA r35148 - ~30mbit/s speedtest.net - latency during speedtest - high invalid misc
AA r34361 - ~50mbit/s speedtest.net - latency during speedtest - high invalid misc

comment:8 Changed 5 years ago by tusker@…

OK, more details here. Skype will cause a deauth on AA r34361:

[ 1078.840000] wlan0: deauthenticated from 20:dc:e6:fc:d3:1f (Reason: 2)
[ 1078.850000] br-lan: port 2(wlan0) entered disabled state
[ 1079.620000] wlan0: authenticate with 20:dc:e6:fc:d3:1f
[ 1079.640000] wlan0: send auth to 20:dc:e6:fc:d3:1f (try 1/3)
[ 1079.640000] wlan0: authenticated
[ 1079.660000] wlan0: associate with 20:dc:e6:fc:d3:1f (try 1/3)
[ 1079.660000] wlan0: RX AssocResp from 20:dc:e6:fc:d3:1f (capab=0x411 status=0 aid=3)
[ 1079.670000] wlan0: associated

AP 34361, Client Trunk 35741 ~40mbit/s speedtest.net
client router still shows:
Jan 1 00:05:08 tptwo kern.err kernel: [ 308.160000] ath: phy0: Failed to stop TX DMA, queues=0x005!
AP Trunk 35741, Client Trunk 35741 ~30mbit/s speedtest.net
client router still shows:
Jan 1 00:33:55 tptwo kern.err kernel: [ 2035.090000] ath: phy0: Failed to stop TX DMA, queues=0x004!

comment:9 Changed 5 years ago by nbd

please try latest trunk

comment:10 Changed 5 years ago by tusker@…

Built and flashed r35770 (had to do a make clean, and make menuconfig to pick up the new depends on relay):

Jan 1 00:09:10 tptwo kern.err kernel: [ 550.980000] ath: phy0: Failed to stop TX DMA, queues=0x005!

Speedtest results around 20mbit/s, with greater number of excessive retries:

wlan0 IEEE 802.11bgn ESSID:"mascordone"

Mode:Managed Frequency:2.432 GHz Access Point: 20:DC:E6:FC:D3:1F
Bit Rate=57.8 Mb/s Tx-Power=20 dBm
RTS thr:off Fragment thr:off
Encryption key:off
Power Management:off
Link Quality=35/70 Signal level=-75 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:1690 Invalid misc:572 Missed beacon:0

Latency isn't much better:

64 bytes from 10.0.0.1: icmp_req=218 ttl=64 time=1.15 ms
64 bytes from 10.0.0.1: icmp_req=219 ttl=64 time=0.922 ms
64 bytes from 10.0.0.1: icmp_req=220 ttl=64 time=4305 ms
64 bytes from 10.0.0.1: icmp_req=221 ttl=64 time=3309 ms
64 bytes from 10.0.0.1: icmp_req=222 ttl=64 time=2306 ms
64 bytes from 10.0.0.1: icmp_req=223 ttl=64 time=1302 ms
64 bytes from 10.0.0.1: icmp_req=224 ttl=64 time=308 ms
64 bytes from 10.0.0.1: icmp_req=225 ttl=64 time=1.17 ms

client:~# cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset

Baseband Hang: 0

Baseband Watchdog: 2

Fatal HW Error: 0

TX HW error: 5

TX Path Hang: 1

PLL RX Hang: 0

MCI Reset: 0

AP:~# cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset

Baseband Hang: 0

Baseband Watchdog: 0

Fatal HW Error: 0

TX HW error: 0

TX Path Hang: 0

PLL RX Hang: 0

MCI Reset: 0

comment:11 Changed 5 years ago by nbd

I'd like to know if these issues are mode or location specific. Please change the AP into a Client and vice versa and then see which one shows the errors in cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset.
Thanks.

comment:12 Changed 5 years ago by nbd

please try r35786, it should hopefully fix these issues.

comment:13 Changed 5 years ago by tusker@…

Hi,

This seemed pretty good, but after quite a number of www.speedtest.net, it did hang, and latency recovery was really bad:

BARRIER BREAKER (Bleeding Edge, r35801)

Feb 26 11:11:49 tptwo kern.err kernel: [ 900.800000] ath: phy0: Failed to stop TX DMA, queues=0x005!

root@tptwo:~# cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset

Baseband Hang: 0

Baseband Watchdog: 1

Fatal HW Error: 0

TX HW error: 0

TX Path Hang: 3

PLL RX Hang: 0

MCI Reset: 0

64 bytes from 10.0.0.1: seq=99 ttl=64 time=41.636 ms
64 bytes from 10.0.0.1: seq=100 ttl=64 time=57.683 ms
64 bytes from 10.0.0.1: seq=101 ttl=64 time=43.651 ms
64 bytes from 10.0.0.1: seq=102 ttl=64 time=3.035 ms
64 bytes from 10.0.0.1: seq=103 ttl=64 time=38.355 ms
64 bytes from 10.0.0.1: seq=104 ttl=64 time=3.757 ms
64 bytes from 10.0.0.1: seq=105 ttl=64 time=20.216 ms
64 bytes from 10.0.0.1: seq=106 ttl=64 time=10.447 ms
64 bytes from 10.0.0.1: seq=107 ttl=64 time=1.210 ms
64 bytes from 10.0.0.1: seq=109 ttl=64 time=1.197 ms
64 bytes from 10.0.0.1: seq=110 ttl=64 time=283.549 ms
64 bytes from 10.0.0.1: seq=111 ttl=64 time=1283.398 ms
64 bytes from 10.0.0.1: seq=112 ttl=64 time=1145.697 ms
64 bytes from 10.0.0.1: seq=113 ttl=64 time=2578.378 ms
64 bytes from 10.0.0.1: seq=114 ttl=64 time=1748.312 ms
64 bytes from 10.0.0.1: seq=115 ttl=64 time=1165.399 ms
64 bytes from 10.0.0.1: seq=116 ttl=64 time=533.593 ms
64 bytes from 10.0.0.1: seq=117 ttl=64 time=790.742 ms
64 bytes from 10.0.0.1: seq=118 ttl=64 time=2060.247 ms
64 bytes from 10.0.0.1: seq=119 ttl=64 time=2816.736 ms
64 bytes from 10.0.0.1: seq=120 ttl=64 time=2941.467 ms
64 bytes from 10.0.0.1: seq=121 ttl=64 time=2330.599 ms
64 bytes from 10.0.0.1: seq=122 ttl=64 time=1874.874 ms
64 bytes from 10.0.0.1: seq=123 ttl=64 time=1615.784 ms
64 bytes from 10.0.0.1: seq=124 ttl=64 time=3634.371 ms
64 bytes from 10.0.0.1: seq=125 ttl=64 time=3433.445 ms
64 bytes from 10.0.0.1: seq=126 ttl=64 time=2839.330 ms
64 bytes from 10.0.0.1: seq=127 ttl=64 time=2995.072 ms
64 bytes from 10.0.0.1: seq=128 ttl=64 time=2234.474 ms
64 bytes from 10.0.0.1: seq=130 ttl=64 time=3049.924 ms
64 bytes from 10.0.0.1: seq=131 ttl=64 time=3964.508 ms
64 bytes from 10.0.0.1: seq=133 ttl=64 time=3816.103 ms
64 bytes from 10.0.0.1: seq=134 ttl=64 time=3256.435 ms
64 bytes from 10.0.0.1: seq=135 ttl=64 time=2595.042 ms
64 bytes from 10.0.0.1: seq=136 ttl=64 time=1833.414 ms
64 bytes from 10.0.0.1: seq=137 ttl=64 time=1970.485 ms
64 bytes from 10.0.0.1: seq=138 ttl=64 time=1007.859 ms
64 bytes from 10.0.0.1: seq=139 ttl=64 time=19.669 ms
64 bytes from 10.0.0.1: seq=140 ttl=64 time=1.242 ms
64 bytes from 10.0.0.1: seq=141 ttl=64 time=1.411 ms
64 bytes from 10.0.0.1: seq=142 ttl=64 time=1.216 ms

comment:14 Changed 5 years ago by tusker@…

The hang above only happens on the Upload section of speedtest.net.

On AP, if I use iperf as server, I can't reproduce this.

On Client, if I use iperf as server, and AP as iperf client:

root@tptwo (client):~# iperf -u -s

root@tpone (AP):~# iperf -u -c 10.0.0.8 -t 60 -b 100M

comment:15 Changed 5 years ago by nbd

please try the current version

comment:16 Changed 5 years ago by nbd

  • Resolution set to no_response
  • Status changed from new to closed

comment:17 Changed 5 years ago by tusker@…

Hi nbd,

Apologies for not responding to this issue, I was travelling overseas and was not able to retest.

I am currently running with latest trunk, but still see the following on the client:

[ 9431.200000] ath: phy0: Failed to stop TX DMA, queues=0x005!
[ 9502.220000] ath: phy0: Failed to stop TX DMA, queues=0x005!
[14579.520000] ath: phy0: Failed to stop TX DMA, queues=0x005!
[15843.890000] ath: phy0: Failed to stop TX DMA, queues=0x005!
[16756.910000] ath: phy0: Failed to stop TX DMA, queues=0x005!
[58434.060000] ath: phy0: Failed to stop TX DMA, queues=0x005!
[58557.560000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[58629.590000] ath: phy0: Failed to stop TX DMA, queues=0x005!
[59730.090000] ath: phy0: Failed to stop TX DMA, queues=0x004!

root@tpone:~# iperf -u -c 10.0.0.8 -t 60 -b 100M


Client connecting to 10.0.0.8, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 160 KByte (default)


[ 3] local 10.0.0.1 port 59954 connected with 10.0.0.8 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 241 MBytes 33.7 Mbits/sec
[ 3] Sent 171920 datagrams
[ 3] Server Report:
[ 3] 0.0-60.2 sec 216 MBytes 30.1 Mbits/sec 33.224 ms 18015/171921 (10%)

comment:18 Changed 5 years ago by tusker@…

  • Resolution no_response deleted
  • Status changed from closed to reopened

comment:19 Changed 5 years ago by nbd

In addition to the messages, do you also still see latency issues or connectivity problems?

comment:20 Changed 5 years ago by tusker@…

Hi,

a) Latency stays around the 20ms mark during the iperf test (where wifi client is iperf server). Around 50Mbits/sec throughput
b) Latency stays around 30ms mark during the iperf test in the other direction (where wifi AP is iperf server). Around 25Mbit/sec throughput
c) Mar 13 02:28:12 tptwo kern.err kernel: [ 908.420000] ath: phy0: Failed to stop TX DMA, queues=0x004! is still easy to reproduce, but I don't see any real effect on throughput or latency.

The fact that throughput is quite different in the different directions is concerning, but in terms of stability, we are definitely in a good spot!

Thanks so much for the perseverance.

For reference sake, the above tests were carried out on BARRIER BREAKER (Bleeding Edge, r35995)

Cheers,

Damien

comment:21 Changed 5 years ago by nbd

The throughput difference could be caused by the client bridge hackery. Please test latest trunk (committed some more fixes) without client-bridge to see if that works better.

comment:22 Changed 5 years ago by tusker@…

With client bridge disabled, the performance is around about the same as previously tested:

root@tptwo:/etc/config# iperf -u -c 10.0.0.1 -t 60 -b 100M


Client connecting to 10.0.0.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 160 KByte (default)


[ 3] local 10.0.0.8 port 57802 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 152 MBytes 21.2 Mbits/sec
[ 3] Sent 108172 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 152 MBytes 21.2 Mbits/sec 1.528 ms 2/108173 (0.0018%)
root@tptwo:/etc/config# iperf -u -s


Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 160 KByte (default)


[ 3] local 10.0.0.8 port 5001 connected with 10.0.0.1 port 39755
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-60.0 sec 386 MBytes 54.0 Mbits/sec 0.275 ms 641/276283 (0.23%)

comment:23 Changed 5 years ago by tusker@…

Hi,

I have just flashed r36485, and the performance is really bad.

a) Latency with little or no traffic is about 10-40ms, where as previously (ie, r35995) latency without traffic was around 1-1.5ms.
b) Using the same tests as above, client to AP traffic is 553 Kbits/sec, and AP to client traffic is 36.4Mbits/sec. This is a severe drop in performance.

There are no configuration changes made between tests, and nothing is written to either AP or client logs.


root@tpone:~# iperf -u -s


Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 160 KByte (default)


[ 3] local 10.0.0.1 port 5001 connected with 10.0.0.8 port 45986
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-61.1 sec 4.03 MBytes 553 Kbits/sec 61.640 ms 435/ 3307 (13%)
read failed: Connection refused
Croot@tpone:~# iperf -u -c 10.0.0.8 -t 60 -b 100M


Client connecting to 10.0.0.8, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 160 KByte (default)


[ 3] local 10.0.0.1 port 58433 connected with 10.0.0.8 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 261 MBytes 36.4 Mbits/sec
[ 3] Sent 185834 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 260 MBytes 36.4 Mbits/sec 2.401 ms 81/185835 (0.044%)

comment:24 Changed 5 years ago by tusker@…

In addition to the above performance issue (which makes it pretty unusable), the following has shown in the kernel log on the client.

Apr 29 06:02:31 tptwo kern.err kernel: [ 2524.490000] ath: phy0: Failed to stop TX DMA, queues=0x005!
Apr 29 06:09:50 tptwo kern.err kernel: [ 2963.370000] ath: phy0: Failed to stop TX DMA, queues=0x005!
Apr 29 06:10:57 tptwo kern.err kernel: [ 3030.390000] ath: phy0: Failed to stop TX DMA, queues=0x005!
Apr 29 06:11:08 tptwo kern.err kernel: [ 3041.410000] ath: phy0: Failed to stop TX DMA, queues=0x005!
Apr 29 06:15:56 tptwo kern.info kernel: [ 3329.410000] wlan0: deauthenticated from 20:dc:e6:fc:d3:1f (Reason: 2)
Apr 29 06:15:56 tptwo kern.info kernel: [ 3329.420000] br-lan: port 2(wlan0) entered disabled state
Apr 29 06:15:57 tptwo kern.err kernel: [ 3329.980000] ath: phy0: Failed to stop TX DMA, queues=0x004!
W is in FULL_SLEEP mode
Apr 29 06:15:59 tptwo kern.err kernel: [ 3332.220000] ath: phy0: TX while HW is in FULL_SLEEP mode
Apr 29 06:15:59 tptwo kern.err kernel: [ 3332.220000] ath: phy0: TX while HW is in FULL_SLEEP mode
Apr 29 06:15:59 tptwo kern.err kernel: [ 3332.230000] ath: phy0: TX while HW is in FULL_SLEEP mode
Apr 29 06:15:59 tptwo kern.err kernel: [ 3332.230000] ath: phy0: TX while HW is in FULL_SLEEP mode
Apr 29 06:15:59 tptwo kern.err kernel: [ 3332.240000] ath: phy0: TX while HW is in FULL_SLEEP mode

comment:25 Changed 5 years ago by tusker@…

Reverting back to r36452 (ie, last commit before kernel 3.8.9) brings back performance back to "normal", so rather than sub 1mbps, it's 37mbps.

comment:26 Changed 5 years ago by nbd

any update on tests with the patch that I suggested?

comment:27 Changed 5 years ago by nbd

please try latest trunk (without the patch), I added some more fixes.

comment:28 Changed 5 years ago by tusker@…

Hi,

Testing with latest trunk, without the previous patch yields less than positive results:

1) Latency hits 600ms between AP and client
2) Download speed (AP->client) crawls at 5Mbps
3) Upload (client->AP) is even worse at around 90kbps

Cheers,

Damien

comment:29 Changed 5 years ago by tusker@…

Latest trunk + patch, latency stays between 1 and 10ms, speedtest.net download around 33mbps, upload around 2mbps (saturating my upstream).

iperf tests (for some reason I can't compile my own iperf on trunk at the moment, so have to download manually via opkg):

AP -> client: 38.3Mbits/sec, hits 14778 ms
client -> AP: 577 Kbits/sec, hits 1305 ms

Changed 5 years ago by tusker@…

comment:30 Changed 5 years ago by tusker@…

I modified /etc/init.d/network to start the trace prior to wifi down wifi up, and attached the generated trace.dat.gz to this ticket

start() {

stop
[ -e /proc/sys/kernel/core_pattern ] && {

ulimit -c unlimited
echo '/tmp/%e.%p.%s.%t.core' > /proc/sys/kernel/core_pattern

}
service_start /sbin/netifd

setup_switch() { return 0; }

include /lib/network
setup_switch

sleep 5

screen -d -m trace-cmd record -e mac80211

/sbin/wifi down
/sbin/wifi up

}

comment:31 Changed 5 years ago by nbd

I committed some more fixes, please test.

comment:32 Changed 5 years ago by tusker@…

This seems slightly different, where the connection dies fully rather than deteriating. Please see trace.dat.2.gz.

Changed 5 years ago by tusker@…

comment:33 Changed 5 years ago by nbd

Found a way to simplify the aggregation code and make it more reliable. Please test the current version.

comment:34 Changed 5 years ago by tusker@…

Hi,

Rebooting with the latest trunk (r36663) works without having to restart wifi.

I will test during the day to ensure that it remains stable.

Cheers,

Damien

comment:35 Changed 5 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

comment:36 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted

Add Comment

Modify Ticket

Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.