Opened 22 months ago

Last modified 19 months ago

#22228 new defect

WDS STA disconnects and is unable to reconnect unless the AP is rebooted

Reported by: openwrt.org-2012@…
Owned by: developers
Priority: normal
Milestone: Designated Driver (Trunk)
Component: packages
Version: Trunk
Keywords:
Cc:

Description

I have several WNDR3800s, using the 5 GHz radios for WDS and the 2.4 GHz radios for users. All run OpenWrt Designated Driver r49176.

For the last year or so WDS was rock solid, but builds from around 2016 have disconnect problems with WDS. Occasionally a WDS client (a STA) will disconnect and be unable to reconnect unless the WDS server (the AP) is rebooted.

I reported this yesterday in the forum and it was confirmed. See here, starting at post 1403:
https://forum.openwrt.org/viewtopic.php?id=28392&p=57

I have yet to find a way to reliably replicate the issue, but I am willing to test.

Thanks for the great software!!!!

Change History (15)

comment:1 Changed 22 months ago by anonymous

I can confirm those WDS problems with ath9k on 15.04. I also can't force them to appear; the link just suddenly stops working. In my case it's WDS on 2.4 GHz, and clients also connect on 2.4 GHz.

comment:2 follow-up: Changed 22 months ago by pkgadd

I'm observing the same issue with a TP-Link TL-WDR4300 v1 as WDS-AP and an
old TP-Link TL-WR941ND v1 as WDS-STA, to provide a connection to a single
wired client (IPv4 and IPv6) over 2.4 GHz WLAN. At first, the WDS-STA device
(TL-WR941ND v1) connects fine to the WDS-AP (TL-WDR4300 v1), but after a
while (ranging from a few minutes to 3-4 days) the connection stops
responding and both the TL-WR941ND v1 and the wired client behind it time
out. Rebooting the WDS-STA usually doesn't recover the connection, while
rebooting the WDS-AP always helps (at least for a while). The WDS link
goes through several interior walls and a floor, which means it's rather
weak and has to suffer intermittent interference.

Given that the TL-WR941ND v1 is only powered on ~twice in three weeks on
average, for a couple of hours each, I can't be 100% sure when exactly
the problems started, but I'm rather confident about this timeframe.

Last known good version:
unmodified OpenWrt trunk snapshots from late last year (on both devices),
I'm pretty sure that trunk r48185 was running fine, as I've been running
that revision for quite a while, without remembering any of these issues.

First known bad version:
It started rather early this year. Unfortunately I have already deleted
the intermediate snapshot builds, but trunk r49065 is definitely affected.

My current working theory, unconfirmed(!), is a potential regression between
hostapd v2.3 (2015-03-25 plus security fixes) and 2016-01-15. The WDS-AP
(TL-WDR4300 v1) has the full wpad package installed, while the WDS-STA
(TL-WR941ND v1) only uses wpad-mini, due to its limited flash size.

In order to debug this further, I've first built a trunk r49166 snapshot for
the TL-WR941ND v1 with package/network/services/hostapd/* reverted to the
state of r48345. This didn't improve the situation and I triggered the issue
within 1-2 hours.

My second approach was basically doing the same for the TL-WDR4300, namely
injecting a r48345 /usr/sbin/wpad into the known-broken r49065 firmware via
overlayfs. With this change, the WDS connection has been up without problems
for 23 hours so far, but it is too early to be 100% sure about this
observation yet (as it may still trigger within the next 3-4 days). At the
moment both devices are (still successfully) running wpad{,-mini}
2015-03-25-2 (package/network/services/hostapd/* reverted to r48345).

Network topology:

+--------------------------------------------------------------------------+
|                                 TL-WDR4300                               |
+----------------------------------------------+---------------+-----------+
|                    2.4  GHz                  |     5  GHz    |   wired   |
|                     WDS-AP                   |    plain AP   |  switch   |
+----------------------------------------------+---------------+-----------+
         |                |            |               |             |
+-----------------+  +---------+  +---------+     +---------+        |
|  TL-WR941ND v1  |  |  plain  |  |  plain  |     |  plain  |       /
| 2.4 GHz WDS-STA |  |   STA   |  |  STA(s) |     |  STA(s) |      |
|    no AP i/f    |  | 2.4 GHz |  | 2.4 GHz |     |  5 GHz  |      /
+-----------------+  +---------+  +---------+     +---------+     |
         |                                                        |
+-----------------+                                      +-----------------+
|  wired client   |                                      | wired client(s) |
+-----------------+                                      +-----------------+

TP-Link TL-WDR4300 v1, /etc/config/wireless:

config wifi-device 'radio0'
	option type 'mac80211'
	option hwmode '11g'
	option path 'platform/ar934x_wmac'
	option country 'DE'
	option htmode 'HT20'
	option txpower '20'
	option channel '13'

config wifi-iface
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'XXX-2'
	option encryption 'psk2+ccmp'
	option key 'YYY'
	option wds '1'

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11a'
	option path 'pci0000:00/0000:00:00.0'
	option country 'DE'
	option htmode 'HT40'
	option txpower '17'
	option channel '36'

config wifi-iface
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'XXX-5'
	option encryption 'psk2+ccmp'
	option key 'ZZZ'

TP-Link TL-WR941ND v1, /etc/config/wireless:

config wifi-device 'radio0'
	option type 'mac80211'
	option hwmode '11g'
	option path 'platform/ath9k'
	option country 'DE'
	option htmode 'HT20'
	option txpower '20'
	option channel '13'

config wifi-iface
	option device 'radio0'
	option network 'lan'
	option mode 'sta'
	option ssid 'XXX-2'
	option encryption 'psk2+ccmp'
	option key 'YYY'
	option wds '1'
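
For anyone comparing their own setup against the two wireless configs above, here is a minimal Python sketch of a consistency check. It is not part of the reported setup: it assumes a simplified UCI syntax (only `config` and `option` lines), and the helper names `parse_uci`, `wds_ifaces` and `wds_mismatches` are made up for illustration. It only checks that the WDS-AP and WDS-STA sections agree on SSID, encryption and key; it says nothing about the disconnect bug itself.

```python
import shlex


def parse_uci(text):
    """Parse a small subset of UCI: 'config' and 'option' lines only."""
    sections = []
    for raw in text.splitlines():
        tokens = shlex.split(raw, comments=True)
        if not tokens:
            continue
        if tokens[0] == "config" and len(tokens) >= 2:
            sections.append({
                ".type": tokens[1],
                ".name": tokens[2] if len(tokens) > 2 else None,
            })
        elif tokens[0] == "option" and len(tokens) == 3 and sections:
            sections[-1][tokens[1]] = tokens[2]
    return sections


def wds_ifaces(sections, mode):
    """Return wifi-iface sections in the given mode that have wds enabled."""
    return [s for s in sections
            if s[".type"] == "wifi-iface"
            and s.get("mode") == mode
            and s.get("wds") == "1"]


def wds_mismatches(ap, sta):
    """List the options on which the AP and STA sections disagree."""
    return [k for k in ("ssid", "encryption", "key") if ap.get(k) != sta.get(k)]


# Abbreviated copies of the wifi-iface sections shown above.
AP_CONF = """
config wifi-iface
    option device 'radio0'
    option mode 'ap'
    option ssid 'XXX-2'
    option encryption 'psk2+ccmp'
    option key 'YYY'
    option wds '1'

config wifi-iface
    option device 'radio1'
    option mode 'ap'
    option ssid 'XXX-5'
    option encryption 'psk2+ccmp'
    option key 'ZZZ'
"""

STA_CONF = """
config wifi-iface
    option device 'radio0'
    option mode 'sta'
    option ssid 'XXX-2'
    option encryption 'psk2+ccmp'
    option key 'YYY'
    option wds '1'
"""

ap = wds_ifaces(parse_uci(AP_CONF), "ap")[0]
sta = wds_ifaces(parse_uci(STA_CONF), "sta")[0]
print(wds_mismatches(ap, sta))
```

An empty list means the two sides agree on the compared options, as they do for the configs posted here.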

TP-Link TL-WR941ND v1, /etc/config/network,
the WAN port is not connected,
one wired client is connected to a LAN port:

config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config interface 'eth'
	option ifname 'eth0'
	option proto 'none'

config interface 'lan'
	option ifname 'lan1 lan2 lan3 lan4'
	option force_link '1'
	option type 'bridge'
	option proto 'dhcp'

config interface 'wan'
	option ifname 'wan'
	option proto 'dhcp'

config interface 'wan6'
	option ifname 'wan'
	option proto 'dhcpv6'

Last edited 22 months ago by pkgadd

comment:3 Changed 22 months ago by anonymous

Does this problem also exist in the 15.05.1 images?

comment:4 Changed 22 months ago by openwrt.org-2012@…

Possibly related:
/ticket/22239.html

comment:5 in reply to: ↑ 2 Changed 22 months ago by pkgadd

Replying to pkgadd:
After a week without any WDS failures, I'm now rather confident that my issues are indeed caused by the hostapd upgrade from 2015-03-25 to 2016-01-15.

Current status:

  • WDS-AP: TP-Link TL-WDR4300 v1, running OpenWrt trunk r49065, wpad reverted to 2015-03-25-2 (r48345)
  • WDS-STA: TP-Link TL-WR941ND v1, running OpenWrt trunk r49166, wpad-mini reverted to 2015-03-25-2 (r48345)

wpad 2016-01-15-2 from current trunk HEAD (on the WDS-AP) triggers the link failure for me quite reliably, somewhere between 10 minutes and 3-4 days. Short-term recovery is only possible by rebooting the WDS-AP; rebooting the WDS-STA has no effect.
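
To sort an installed wpad version against the two data points in this ticket, a small Python sketch could look like the following. It assumes the version strings are date-based as quoted here (for example `2015-03-25-2`); the function names and the three labels are invented for illustration, and "not newer than the last good report" is of course no guarantee that a build is unaffected.

```python
from datetime import date


def wpad_snapshot_date(version):
    """Extract the snapshot date from a version such as '2015-03-25-2'."""
    year, month, day = version.split("-")[:3]
    return date(int(year), int(month), int(day))


# The two versions named in this ticket.
LAST_REPORTED_GOOD = wpad_snapshot_date("2015-03-25-2")
FIRST_REPORTED_BAD = wpad_snapshot_date("2016-01-15-2")


def classify(version):
    """Place a wpad version relative to the reports in this ticket."""
    snapshot = wpad_snapshot_date(version)
    if snapshot <= LAST_REPORTED_GOOD:
        return "not newer than last good report"
    if snapshot >= FIRST_REPORTED_BAD:
        return "not older than first bad report"
    return "between reports, untested"


for v in ("2015-03-25-2", "2016-01-15-2"):
    print(v, "->", classify(v))
```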

comment:6 Changed 22 months ago by anonymous

Thanks for the tests. This means that 15.05.1 with hostapd from 2015 should work fine.

https://dev.openwrt.org/browser/branches/chaos_calmer/package/network/services/hostapd/Makefile

comment:7 Changed 22 months ago by anonymous

Could somebody please explain how to install an old hostapd once 15.05.1 is installed?

comment:8 Changed 22 months ago by geadas

hello.

Since I upgraded the firmware to 15.05.1, and also with other trunk images, I have been seeing this problem.

The solution was to revert to 15.05 again.

But I saved some logs that may help to solve the problem. These are logs from the WDS client. wlan0 is the WDS client interface; wlan0-1 is a virtual AP.

Logs from when it stopped working:


Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.627498] wlan0: disassociated from ma:ca:dd:re:ss (Reason: 4)
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.632445] br-lan: port 3(wlan0) entered disabled state
Tue Apr 12 06:58:08 2016 daemon.notice netifd: Network device 'wlan0' link is down
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.685037] wlan0: authenticate with ma:ca:dd:re:ss
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.700458] wlan0: send auth to ma:ca:dd:re:ss (try 1/3)
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.709885] wlan0: authenticated
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.712624] ath9k ar933x_wmac wlan0: disabling HT as WMM/QoS is not supported by the AP
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.719758] ath9k ar933x_wmac wlan0: disabling VHT as WMM/QoS is not supported by the AP
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.733824] wlan0: associate with ma:ca:dd:re:ss (try 1/3)
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.741472] wlan0: RX AssocResp from ma:ca:dd:re:ss (capab=0x431 status=0 aid=28)
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.748212] wlan0: associated
Tue Apr 12 06:58:08 2016 daemon.notice netifd: Network device 'wlan0' link is up
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.807530] br-lan: port 3(wlan0) entered forwarding state
Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.811677] br-lan: port 3(wlan0) entered forwarding state
Tue Apr 12 06:58:10 2016 kern.info kernel: [53813.803752] br-lan: port 3(wlan0) entered forwarding state


Logs from when it started working again (I did nothing; it was 1 day and a few hours later):


Wed Apr 13 09:38:31 2016 daemon.notice netifd: Network device 'wlan0' link is down
Wed Apr 13 09:38:31 2016 kern.info kernel: [149834.259154] br-lan: port 3(wlan0) entered disabled state
Wed Apr 13 09:38:31 2016 kern.info kernel: [149834.265461] br-lan: port 2(wlan0-1) entered disabled state
Wed Apr 13 09:38:31 2016 daemon.notice netifd: Network device 'wlan0-1' link is down
Wed Apr 13 09:38:32 2016 daemon.notice netifd: Bridge 'br-lan' link is down
Wed Apr 13 09:38:32 2016 daemon.notice netifd: Interface 'lan' has link connectivity loss
Wed Apr 13 09:39:04 2016 kern.info kernel: [149867.503353] wlan0: authenticate with ma:ca:dd:re:ss
Wed Apr 13 09:39:04 2016 kern.info kernel: [149867.519095] wlan0: send auth to ma:ca:dd:re:ss (try 1/3)
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.525716] wlan0: authenticated
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.528554] ath9k ar933x_wmac wlan0: disabling HT as WMM/QoS is not supported by the AP
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.535723] ath9k ar933x_wmac wlan0: disabling VHT as WMM/QoS is not supported by the AP
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.548655] wlan0: associate with ma:ca:dd:re:ss (try 1/3)
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.577857] wlan0: RX AssocResp from ma:ca:dd:re:ss (capab=0x431 status=0 aid=2)
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.584678] wlan0: associated
Wed Apr 13 09:39:05 2016 daemon.notice netifd: Network device 'wlan0' link is up
Wed Apr 13 09:39:05 2016 daemon.notice netifd: Bridge 'br-lan' link is up
Wed Apr 13 09:39:05 2016 daemon.notice netifd: Interface 'lan' has link connectivity
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.674889] br-lan: port 3(wlan0) entered forwarding state
Wed Apr 13 09:39:05 2016 kern.info kernel: [149867.679196] br-lan: port 3(wlan0) entered forwarding state
Wed Apr 13 09:39:05 2016 daemon.notice netifd: Network device 'wlan0-1' link is up
Wed Apr 13 09:39:05 2016 kern.info kernel: [149868.384811] br-lan: port 2(wlan0-1) entered forwarding state
Wed Apr 13 09:39:05 2016 kern.info kernel: [149868.389253] br-lan: port 2(wlan0-1) entered forwarding state
Wed Apr 13 09:39:07 2016 kern.info kernel: [149869.678609] br-lan: port 3(wlan0) entered forwarding state
Wed Apr 13 09:39:07 2016 kern.info kernel: [149870.388612] br-lan: port 2(wlan0-1) entered forwarding state
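
The two excerpts above can be summarized with a short script. This is only a sketch under a few assumptions: the timestamp format is taken to be exactly as in these lines (and parsed with English day and month names), only two message types are matched, and the function names are made up. Reason code 4 is defined by IEEE 802.11 as "disassociated due to inactivity"; other codes are simply passed through as numbers.

```python
import re
from datetime import datetime

# IEEE 802.11 reason code seen in the logs above.
REASON_CODES = {4: "disassociated due to inactivity"}

DISASSOC_RE = re.compile(r"(\S+): disassociated from \S+ \(Reason: (\d+)\)")
LINK_RE = re.compile(r"Network device '([^']+)' link is (up|down)")


def parse_events(lines):
    """Return (timestamp, interface, kind, detail) tuples for matched lines."""
    events = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:24], "%a %b %d %H:%M:%S %Y")
        except ValueError:
            continue  # not a log line in the expected format
        m = DISASSOC_RE.search(line)
        if m:
            code = int(m.group(2))
            detail = REASON_CODES.get(code, "reason %d" % code)
            events.append((ts, m.group(1), "disassoc", detail))
            continue
        m = LINK_RE.search(line)
        if m:
            events.append((ts, m.group(1), "link_" + m.group(2), ""))
    return events


def downtimes(events, iface):
    """Seconds between each 'link is down' and the next 'link is up'."""
    spans, down_at = [], None
    for ts, dev, kind, _ in events:
        if dev != iface:
            continue
        if kind == "link_down":
            down_at = ts
        elif kind == "link_up" and down_at is not None:
            spans.append((ts - down_at).total_seconds())
            down_at = None
    return spans


# A few lines copied from the excerpts above.
SAMPLE = [
    "Tue Apr 12 06:58:08 2016 kern.info kernel: [53811.627498] wlan0: disassociated from ma:ca:dd:re:ss (Reason: 4)",
    "Tue Apr 12 06:58:08 2016 daemon.notice netifd: Network device 'wlan0' link is down",
    "Tue Apr 12 06:58:08 2016 daemon.notice netifd: Network device 'wlan0' link is up",
    "Wed Apr 13 09:38:31 2016 daemon.notice netifd: Network device 'wlan0' link is down",
    "Wed Apr 13 09:39:05 2016 daemon.notice netifd: Network device 'wlan0' link is up",
]

events = parse_events(SAMPLE)
print(events[0][3])
print(downtimes(events, "wlan0"))
```

For these sample lines the reported link-down periods are 0 and 34 seconds. In the first excerpt the interface is back up within the same second, so the link state in the log alone does not show the stall that was observed.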


It seems to have something to do with the bridge, but I don't know.

The truth is that if I restarted the wifi on the WDS AP, the WDS client started working again.

I hope this helps to solve the problem.

thank you.

comment:9 follow-up: Changed 22 months ago by johnthomas00

Interestingly, this problem seems to have gone away for me by pinging the AP and all STAs every 2 minutes. I found this because I built a Python script to reboot the AP if one of the STAs was down. The script tests whether they are down by pinging them every 2 minutes, but now they do not go down. Obviously this hack is not a solution, but it may help to further identify the issue so it can be fixed.
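
The script itself was not posted, so the following is only a guess at what such a watchdog might look like, not the original. It assumes a Linux-style `ping` that accepts `-c` and `-W`; the function names are invented, and the reboot action is left as a callable the user must supply, since how the AP gets rebooted (ssh, a power switch, something else) is not stated in this ticket.

```python
import subprocess
import time


def ping(host, timeout_s=2):
    """Send one ICMP echo request; True if a reply arrived.

    Assumes a Linux-style ping that accepts -c (count) and -W (timeout).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def down_hosts(hosts, is_up=ping):
    """Return the hosts that did not answer."""
    return [h for h in hosts if not is_up(h)]


def watchdog(ap, stas, reboot_ap, is_up=ping, interval_s=120, rounds=None):
    """Ping the AP and every STA each round; reboot the AP if a STA is down.

    rounds=None loops forever; a number limits the loop (handy for testing).
    """
    done = 0
    while rounds is None or done < rounds:
        is_up(ap)  # the AP is pinged too, as described above
        if down_hosts(stas, is_up):
            reboot_ap(ap)
        done += 1
        time.sleep(interval_s)
```

A call might look like `watchdog("192.168.1.1", ["192.168.1.2", "192.168.1.3"], reboot_ap=my_reboot)`, where the addresses and `my_reboot` are placeholders. The default interval of 120 seconds matches the 2 minutes mentioned above.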

As I read down the list of bugs in this tracker, I wonder if some of the other vague wireless connection issues are related to this bug.

comment:10 in reply to: ↑ 9 Changed 22 months ago by geadas

Replying to johnthomas00:

Interestingly, this problem seems to have gone away for me by pinging the AP and all STAs every 2 minutes. I found this because I built a Python script to reboot the AP if one of the STAs was down. The script tests whether they are down by pinging them every 2 minutes, but now they do not go down. Obviously this hack is not a solution, but it may help to further identify the issue so it can be fixed.

As I read down the list of bugs in this tracker, I wonder if some of the other vague wireless connection issues are related to this bug.

hello.

I tried that solution myself, but 5 days later the problem arose again.

best regards.

comment:11 Changed 21 months ago by diizzyy@…

I'm seeing the same issue here.

A TL-WDR3600 running r49161 acts as AP/WDS "Master" and another WDR3600 as WDS client. I've also tried using a WD MyNet N750, but the same issue occurs: if the client disconnects, you need to restart the WDS "Master" box. As soon as you do that, everything connects again and works fine.

comment:12 follow-up: Changed 21 months ago by diizzyy@…

Update on the issue: I seem to be able to trigger it within 2-3 hours of light data transfer (~3-4 Mbit/s), and most of the time just rebooting the AP doesn't resolve it. From what I can tell, you need to reboot both nodes within the bootup timeframe of each end, otherwise the link won't be established.

comment:13 in reply to: ↑ 12 Changed 21 months ago by pkgadd

Replying to diizzyy@…:

Update on the issue: I seem to be able to trigger it within 2-3 hours of light data transfer (~3-4 Mbit/s), and most of the time just rebooting the AP doesn't resolve it. From what I can tell, you need to reboot both nodes within the bootup timeframe of each end, otherwise the link won't be established.

Heavy traffic (continuous playback of a vdr stream) seems to help avoid the stalling, but it doesn't completely prevent it from happening.
Light traffic (one wired client browsing the web) with occasional short interruptions doesn't really help.
With no active traffic (the wired client behind the WDS-client is switched off), the link is quite affected.

Personally I've never seen a need to reboot the WDS-client (but I reverted the client to wpad-mini 2015-03-25-2 before touching the WDS-AP), rebooting only the WDS-AP always recovered the problem (it's been stable with wpad 2015-03-25-2 for me since I first reverted the WDS-AP to that version).

comment:14 Changed 21 months ago by johnthomas00

I am unable to replicate this problem in Chaos Calmer, which is running: bin/ar71xx/packages/base/wpad_2015-03-25-1_ar71xx.ipk
