Modify

Opened 4 years ago

Closed 4 years ago

#14144 closed defect (not_a_bug)

Interface eth1 up and down continuously.

Reported by: yszhou4tech@… Owned by: developers
Priority: normal Milestone: Attitude Adjustment 12.09.1
Component: packages Version: Trunk
Keywords: Cc:

Description

It's r37920 on TL-WR743N-v2 of platform ar71xx. The issue happened before on a Buffalo WZR-G300-NH, both in a random way. This is similar to /ticket/10269.html.

My guess is that it's most-likely a race condition issue. But I am not yet quite familiar with netifd and ubus things to catch the point.

Here is the "Kernel Log".

[   26.120000] br-lan: port 2(wlan0) entered forwarding state
[   28.120000] br-lan: port 2(wlan0) entered forwarding state
[   35.400000] eth1: link down
[   36.370000] eth1: link up (100Mbps/Full duplex)
[   50.820000] eth1: link down
[   51.370000] eth1: link up (100Mbps/Full duplex)
[   66.240000] eth1: link down
[   66.370000] eth1: link up (100Mbps/Full duplex)
[   81.680000] eth1: link down
[   82.370000] eth1: link up (100Mbps/Full duplex)
[   97.110000] eth1: link down
[   97.370000] eth1: link up (100Mbps/Full duplex)
[  112.550000] eth1: link down
[  113.370000] eth1: link up (100Mbps/Full duplex)
[  127.970000] eth1: link down
[  128.370000] eth1: link up (100Mbps/Full duplex)
[  143.400000] eth1: link down
[  144.370000] eth1: link up (100Mbps/Full duplex)
[  158.880000] eth1: link down
[  159.370000] eth1: link up (100Mbps/Full duplex)
[  174.450000] eth1: link down
[  175.370000] eth1: link up (100Mbps/Full duplex)
[  190.030000] eth1: link down
[  190.370000] eth1: link up (100Mbps/Full duplex)
[  205.570000] eth1: link down
[  206.370000] eth1: link up (100Mbps/Full duplex)
[  221.140000] eth1: link down
...UP AND DOWN ALL THE WAY TO THE BOTTOM OF THE LOG...

Here is the "System Log".

Sep  9 14:49:56 OpenWrt daemon.info pppd[6071]: System time change detected.
Sep  9 14:50:06 OpenWrt daemon.info pppd[6071]: No response to 5 echo-requests
Sep  9 14:50:06 OpenWrt daemon.notice pppd[6071]: Serial link appears to be disconnected.
Sep  9 14:50:06 OpenWrt daemon.info pppd[6071]: Connect time 0.6 minutes.
Sep  9 14:50:06 OpenWrt daemon.info pppd[6071]: Sent 51097 bytes, received 51191 bytes.
Sep  9 14:50:06 OpenWrt daemon.notice netifd: Interface 'wan' has lost the connection
Sep  9 14:50:12 OpenWrt daemon.notice pppd[6071]: Connection terminated.
Sep  9 14:50:12 OpenWrt daemon.notice pppd[6071]: Modem hangup
Sep  9 14:50:47 OpenWrt daemon.warn dnsmasq[2557]: no servers found in /tmp/resolv.conf.auto, will retry
Sep  9 14:50:57 OpenWrt daemon.warn pppd[6071]: Timeout waiting for PADO packets
Sep  9 14:50:57 OpenWrt daemon.err pppd[6071]: Unable to complete PPPoE Discovery
Sep  9 14:50:57 OpenWrt daemon.info pppd[6071]: Exit.
Sep  9 14:50:57 OpenWrt daemon.notice netifd: Interface 'wan' is now down
Sep  9 14:50:57 OpenWrt kern.info kernel: [ 1718.060000] eth1: link down
Sep  9 14:50:57 OpenWrt daemon.info pppd[6169]: Plugin rp-pppoe.so loaded.
Sep  9 14:50:57 OpenWrt daemon.info pppd[6169]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.5
Sep  9 14:50:57 OpenWrt daemon.notice pppd[6169]: pppd 2.4.5 mod by 0e180204181910 started by root, uid 0
Sep  9 14:50:58 OpenWrt kern.info kernel: [ 1718.740000] eth1: link up (100Mbps/Full duplex)
Sep  9 14:51:00 OpenWrt daemon.info dnsmasq-dhcp[2557]: DHCPINFORM(br-lan) 192.168.1.217 60:36:dd:af:8e:53 
Sep  9 14:51:00 OpenWrt daemon.info dnsmasq-dhcp[2557]: DHCPACK(br-lan) 192.168.1.217 60:36:dd:af:8e:53 dell-PC
Sep  9 14:51:17 OpenWrt daemon.warn pppd[6169]: Timeout waiting for PADO packets
Sep  9 14:51:17 OpenWrt daemon.err pppd[6169]: Unable to complete PPPoE Discovery
Sep  9 14:51:17 OpenWrt daemon.info pppd[6169]: Exit.
Sep  9 14:51:17 OpenWrt daemon.notice netifd: Interface 'wan' is now down
Sep  9 14:51:17 OpenWrt kern.info kernel: [ 1738.100000] eth1: link down
Sep  9 14:51:17 OpenWrt daemon.info pppd[6186]: Plugin rp-pppoe.so loaded.
Sep  9 14:51:17 OpenWrt daemon.info pppd[6186]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.5
Sep  9 14:51:17 OpenWrt daemon.notice pppd[6186]: pppd 2.4.5 mod by 0e180204181910 started by root, uid 0
Sep  9 14:51:18 OpenWrt kern.info kernel: [ 1738.740000] eth1: link up (100Mbps/Full duplex)
Sep  9 14:51:32 OpenWrt daemon.warn pppd[6186]: Timeout waiting for PADO packets
Sep  9 14:51:32 OpenWrt daemon.err pppd[6186]: Unable to complete PPPoE Discovery
Sep  9 14:51:32 OpenWrt daemon.info pppd[6186]: Exit.
Sep  9 14:51:33 OpenWrt daemon.notice netifd: Interface 'wan' is now down
Sep  9 14:51:33 OpenWrt kern.info kernel: [ 1753.550000] eth1: link down
Sep  9 14:51:33 OpenWrt kern.info kernel: [ 1753.740000] eth1: link up (100Mbps/Full duplex)
Sep  9 14:51:33 OpenWrt daemon.info pppd[6203]: Plugin rp-pppoe.so loaded.
Sep  9 14:51:33 OpenWrt daemon.info pppd[6203]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.5
Sep  9 14:51:33 OpenWrt daemon.notice pppd[6203]: pppd 2.4.5 mod by 0e180204181910 started by root, uid 0
Sep  9 14:51:48 OpenWrt daemon.warn pppd[6203]: Timeout waiting for PADO packets
Sep  9 14:51:48 OpenWrt daemon.err pppd[6203]: Unable to complete PPPoE Discovery
Sep  9 14:51:48 OpenWrt daemon.info pppd[6203]: Exit.
Sep  9 14:51:48 OpenWrt daemon.notice netifd: Interface 'wan' is now down
Sep  9 14:51:48 OpenWrt kern.info kernel: [ 1769.000000] eth1: link down
...

Attachments (0)

Change History (11)

comment:1 Changed 4 years ago by alexatkinuk

This sound like the problem I found when I upgraded to Attitude Adjustment.

In their infinite wisdom the OpenWRT devs changed the ppp script so that ppp will only attempt 1 retry for authentication and then give up. After ppp gives up the script resets the ethernet adapter before trying again.

Unfortunately some modems (eg mine, the Huawei HG612) do not bring up the network quick enough after a link reset and so its never able to establish the PPPoE link in time, so OpenWRT gets stuck in an infinite loop of never connecting.

The fix is to modify the ppp script with a higher maxfail value such as 5, although personally I switch it to 0 (try forever) by doing the following on the router:
sed -i 's/maxfail 1/maxfail 0 holdoff 2/g' /lib/netifd/proto/ppp.sh

I would also recommend setting LCP echo failure threshold and LCP echo interval in Advanced Settings of your WAN connection. In my case I set them to 15 and 1, which means ppp pings every second but will only assume the link is dead if it fails for 15 seconds. It seems sometimes my connection would just freeze for a few seconds and letting ppp drop just makes it take longer to recover, whereas having it hold on to the connection meant only those few seconds of packet loss.

Last edited 4 years ago by alexatkinuk (previous) (diff)

comment:2 Changed 4 years ago by jow

The intention behind "maxfail 1" is to allow netifd to handle failures in ppp connects. By setting maxfail to 0, pppd will never die and netifd will be unable to handle the interface failure.

If the link is not ready when pppd is launched by netifd then we should probably look into checking the link state before initiating the connection, that would be a proper fix.

comment:3 Changed 4 years ago by yszhou4tech@…

I don't understand why eth1 needs to be turned down when connection fails. We can keep it up and do the connection when needed, right? IIRC, that's how I do a PPPoE dial up on a PC.

comment:4 Changed 4 years ago by alexatkinuk

Exactly how does it "handle" ppp failures other than resetting the ethernet adapter? Because I have my own ppp connection logging/handler in /etc/ppp/ip-up.d and /etc/ppp/ip-down.d to log and update QoS settings with my current sync rates pulled from the modem, log connect time, etc.

It could very well just be my ISPs PPP server takes a long time to respond and nothing to do with ethernet coming up and down. Checking the logs right now I cannot see the three failures before success on my last connection, so it may be my ISP fixed that. The important point is, the setting of maxfail 1 does not work if there is a delay in establishing the connection so its clearly not a good default setting.

Replying to yszhou4tech@…:

I don't understand why eth1 needs to be turned down when connection fails. We can keep it up and do the connection when needed, right? IIRC, that's how I do a PPPoE dial up on a PC.

I assume it was intended to be a "just in case PPPoE failed because the ethernet adapter has frozen".

Last edited 4 years ago by alexatkinuk (previous) (diff)

comment:5 Changed 4 years ago by yousong

Pppd connection failure should be handled by pppd itself. Forcing the maxfail 1 logic hurts the reliability of pppd. Netifd should only come in after pppd failed and quit.

I suggest that we should remove maxfail 1 and let the user who needs it specify the option through pppd_options.

comment:6 Changed 4 years ago by anonymous

I have the same issue, but different scenario.
When I change from repeater mode back to dhcp mode, wen I execute "/etc/init.d/network restart", this happens. Wan just up then down continuously.

comment:7 follow-up: Changed 4 years ago by mihanson

I'm seeing a very similar problem using 12.09 AA release image for a Buffalo WZR-600DHP (using the WZR-HP-AG300H image). I'm trying to pull an address via DHCP from Verizon FiOS, but I get stuck in the

eth1: link up
eth1: link down

loop/race condition. I called Verizon and had them break my DHCP lease and I reset my ONT via removing power/battery for 15 min, but I still get caught in the loop. Like the OP, I'm not familiar enough with the inner workings of OpenWRT to pin point the cause or provide a solution, but if I do figure it out, I'll post my findings.

Mike

comment:8 in reply to: ↑ 7 Changed 4 years ago by mihanson

Replying to mihanson:

I'm seeing a very similar problem using 12.09 AA release image for a Buffalo WZR-600DHP (using the WZR-HP-AG300H image). I'm trying to pull an address via DHCP from Verizon FiOS, but I get stuck in the

eth1: link up
eth1: link down

loop/race condition. I called Verizon and had them break my DHCP lease and I reset my ONT via removing power/battery for 15 min, but I still get caught in the loop. Like the OP, I'm not familiar enough with the inner workings of OpenWRT to pin point the cause or provide a solution, but if I do figure it out, I'll post my findings.

Mike

As a workaround I can disable the WAN interface on boot, then start/enable it after boot and pull a DHCP address from my provider (Verizon).

Network > Interfaces > WAN > Edit > Advanced settings.

comment:9 Changed 4 years ago by yousong

I played with pppd for a while and decided that this should have nothing to do with maxfail 1 or netifd.

Each startup of pppd will try to do PPPoE discovery waiting for PADO packet 3 times, each with a timeout value of 5 seconds. The same goes for PADS packet. This is why Timeout waiting for PADO packets appeared in log 15 seconds later after rp-pppoe.so was loaded. I assumed that this trying and timeout setting should be just enough.

My current guess is that this is related to how the hardware or kernel driver handles ethernet frames whose length is below the minimum requirement of 64 bytes.

The following log shows that pppd sent 3 PADR packets and all of them were timed out without response from PPPoE server. But strange enough is that no PADR packets were ever seen by wireshark from where the PPPoE server was setup.

Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]: Send PPPOE Discovery V1T1 PADI session 0x0 length 12
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]:  dst ff:ff:ff:ff:ff:ff  src TP:LI:NK:WR:74:3N
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]:  [service-name] [host-uniq  00 00 0a 0e]
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]: Recv PPPOE Discovery V1T1 PADO session 0x0 length 24
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]:  dst TP:LI:NK:WR:74:3N  src 0:c:29:49:b5:8b
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]:  [host-uniq  00 00 0a 0e] [service-name] [AC-name MikroTik]
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]: Send PPPOE Discovery V1T1 PADR session 0x0 length 12
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]:  dst 0:c:29:49:b5:8b  src TP:LI:NK:WR:74:3N
Wed Oct  9 09:49:04 2013 daemon.debug pppd[2574]:  [service-name] [host-uniq  00 00 0a 0e]
Wed Oct  9 09:49:06 2013 kern.info kernel: [  859.860000] br-lan: port 2(wlan0) entered forwarding state
Wed Oct  9 09:49:09 2013 daemon.debug pppd[2574]: Send PPPOE Discovery V1T1 PADR session 0x0 length 12
Wed Oct  9 09:49:09 2013 daemon.debug pppd[2574]:  dst 0:c:29:49:b5:8b  src TP:LI:NK:WR:74:3N
Wed Oct  9 09:49:09 2013 daemon.debug pppd[2574]:  [service-name] [host-uniq  00 00 0a 0e]
Wed Oct  9 09:49:14 2013 daemon.debug pppd[2574]: Send PPPOE Discovery V1T1 PADR session 0x0 length 12
Wed Oct  9 09:49:14 2013 daemon.debug pppd[2574]:  dst 0:c:29:49:b5:8b  src TP:LI:NK:WR:74:3N
Wed Oct  9 09:49:14 2013 daemon.debug pppd[2574]:  [service-name] [host-uniq  00 00 0a 0e]
Wed Oct  9 09:49:19 2013 daemon.warn pppd[2574]: Timeout waiting for PADS packets
Wed Oct  9 09:49:19 2013 daemon.err pppd[2574]: Unable to complete PPPoE Discovery
Wed Oct  9 09:49:19 2013 daemon.info pppd[2574]: Exit.
Wed Oct  9 09:49:19 2013 daemon.notice netifd: Interface 'GORG' is now down
Last edited 4 years ago by yousong (previous) (diff)

comment:10 Changed 4 years ago by yousong

This ticket can be closed.

I have tried to repeat the PPPoE connection setup command by hand and noticed that timeout waiting for PADO packet appeared after about 5 connection failures. Then I waited for about another 5 minutes and tried reconnecting, the PPPoE server responded. The timing is far from accurate but generally that is the case.

Passwords for PPPoE accounts are dynamically generated by the local ISP and have an expiration period associated with them. Currently, OpenWrt will try connecting again and again even with the expired password if the connection is not up. So my guess is that the ISP device have blocked my reconnection attempt for a while for the sake of security.

comment:11 Changed 4 years ago by nbd

  • Resolution set to not_a_bug
  • Status changed from new to closed

Add Comment

Modify Ticket

Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.