Opened 3 years ago

Last modified 3 years ago

#17953 new defect

netifd - hotplug ifdown events behave differently in a VM (NIC and vNIC link loss)

Reported by: anonymous
Owned by: developers
Priority: normal
Milestone:
Component: packages
Version: Trunk
Keywords: netifd
Cc:

Description

Two issues I've come across while testing OpenWrt (CC trunk and BB-14.07 HEAD) + mwan3 in a VirtualBox VM v4.3.16 (x86):

For details, please check today's discussion between Adze and kpv at https://forum.openwrt.org/viewtopic.php?id=39052&p=37

In a VM, BEFORE disconnecting wan / eth1:

root@OpenWrt:~# ip route list table 1 default dev eth1
default via 10.0.3.1
root@OpenWrt:~#

whereas immediately after disconnecting wan / eth1 (unchecking of "Cable Connected") this route gets deleted:

root@OpenWrt:~# ip route list table 1 default dev eth1
root@OpenWrt:~#

However, on a physical TP-LINK TL-WDR4300 router, that route isn't flushed.

The "unplugged" NIC in the VM:

root@OpenWrt:~# ip link sh
...
7: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc hfsc state DOWN mode DEFAULT group default qlen 5
    link/ether 08:00:27:c1:de:6d brd ff:ff:ff:ff:ff:ff

root@OpenWrt:~# ifconfig
...
eth1      Link encap:Ethernet  HWaddr 08:00:27:C1:DE:6D
          inet6 addr: fe80::a00:27ff:fec1:de6d/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:611459 errors:0 dropped:0 overruns:0 frame:0
          TX packets:336570 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:5
          RX bytes:808462195 (771.0 MiB)  TX bytes:25230266 (24.0 MiB)

netifd - 2014-09-08-46c569989f984226916fec28dd8ef152a664043e
OpenWrt BB 14.07 r42625

Change History (22)

comment:1 Changed 3 years ago by anonymous

Just wanted to add that all vNICs of the OpenWrt x86 VM are defined as virtio-net (paravirtualized). It's running under VirtualBox 4.3.16.

comment:2 Changed 3 years ago by jeroen.louwes@…

The emphasis should be on the fact that no hotplug ifdown event is triggered after link loss. That routes are removed for interfaces that are no longer valid (down) is, IMHO, correct.

comment:3 Changed 3 years ago by anonymous

Can you display the complete network configuration?

comment:4 Changed 3 years ago by jeroen.louwes@…

Here is my test config. It is very basic:

root@OpenWrt:~# cat /etc/config/network 

config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config interface 'lan'
	option ifname 'eth0.1'
	option type 'bridge'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'

config interface 'wan'
	option ifname 'eth1'
	option proto 'static'
	option ipaddr '192.168.33.14'
	option netmask '255.255.255.240'
	option gateway '192.168.33.1'
	option dns '8.8.8.8'
	option metric '10'

config interface 'wan2'
	option ifname 'eth0.2'
	option proto 'static'
	option ipaddr '192.168.49.50'
	option netmask '255.255.255.0'
	option gateway '192.168.49.1'
	option dns '8.8.4.4'
	option metric '20'

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'
	option blinkrate '2'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '0 1 3 5t'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '2 5t'

comment:5 Changed 3 years ago by dedeckeh@…

For interfaces with proto static, link sensing is disabled by default in netifd; as a consequence, netifd does not bring the interface down when the link is lost, and no ifdown event is generated. You can override this behavior by setting the force_link UCI parameter to 0 for the interface (e.g. uci set network.wan.force_link=0), which enables link sensing for that interface in netifd.

comment:6 follow-up: Changed 3 years ago by anonymous

But in all my tests (note: I opened this ticket) the WAN interfaces on my two test systems (x86 VM and TL-WDR4300) are defined as proto 'dhcp', not 'static' ...

From my test VM running OpenWrt x86 BB r42625

root@OpenWrt:/# cat /etc/config/network

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config interface 'lan'
        option ifname 'eth0'
#       option type 'bridge'
        option proto 'static'
        option ipaddr '192.168.100.1'
        option netmask '255.255.252.0'
        option ip6assign '60'

config interface 'wan'
        option ifname 'eth1'
        option proto 'dhcp'
        option hostname 'openwrt'
        option defaultroute '1'
        option metric '10'

config interface 'wan2'
        option ifname 'eth2'
        option proto 'dhcp'
        option mtu '1508'
        option defaultroute '1'
        option metric '20'

#config interface 'wan6'
#       option ifname '@wan'
#       option proto 'dhcpv6'

config interface 'vpn1'
        option ifname 'tun1'
        option proto 'none'

config globals 'globals'
        option ula_prefix 'fd65:d55b:92fb::/48'

...

Could the different behaviour of netifd upon WAN NIC phy link loss be because the eth1 and eth2 interfaces on the x86 VM are considered "separate" devices, whereas in the case of the physical WDR4300 the two WANs are actually sitting on the same switch (eth0.2 and eth0.3)?

comment:7 follow-up: Changed 3 years ago by anonymous

I tried force_link=0 and indeed I now do see a hotplug event on link loss. So that fixes my problem. I noticed that the DEVICE variable is not set. Only ACTION and INTERFACE are set. Is this as designed?

comment:8 in reply to: ↑ 7 Changed 3 years ago by dedeckeh@…

Replying to anonymous:

I tried force_link=0 and indeed I now do see a hotplug event on link loss. So that fixes my problem. I noticed that the DEVICE variable is not set. Only ACTION and INTERFACE are set. Is this as designed?

This is indeed as designed; both the interface and the device are set only for the ifup and ifupdate actions.
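
Since DEVICE is empty on ifdown, a hotplug script that needs the device name has to obtain it some other way. The following is a minimal sketch, assuming a script under /etc/hotplug.d/iface/ that remembers the device at ifup time. The helper name recall_device and the cache directory STATE_DIR are hypothetical and not part of netifd:

```shell
# Hypothetical helper for a script in /etc/hotplug.d/iface/.
# STATE_DIR is an assumed cache location, not something netifd provides.
STATE_DIR="${STATE_DIR:-/tmp/iface-device-cache}"

# recall_device ACTION INTERFACE DEVICE
# Prints the device for INTERFACE, falling back to the value cached at
# ifup/ifupdate time, or "unknown" if nothing was cached.
recall_device() {
    action="$1"
    interface="$2"
    device="$3"

    mkdir -p "$STATE_DIR" || return 1

    case "$action" in
        ifup|ifupdate)
            # DEVICE is set for these actions, so remember it.
            if [ -n "$device" ]; then
                printf '%s\n' "$device" > "$STATE_DIR/$interface"
            fi
            ;;
    esac

    # On ifdown DEVICE is empty; fall back to the cached name.
    if [ -z "$device" ] && [ -r "$STATE_DIR/$interface" ]; then
        device="$(cat "$STATE_DIR/$interface")"
    fi

    printf '%s\n' "${device:-unknown}"
}
```

Inside the hotplug script this would be called as dev="$(recall_device "$ACTION" "$INTERFACE" "$DEVICE")".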

comment:9 in reply to: ↑ 6 Changed 3 years ago by dedeckeh@…

Replying to anonymous:

But in all my tests (note: I opened this ticket) the WAN interfaces on my two test systems (x86 VM and TL-WDR4300) are defined as proto 'dhcp', not 'static' ...

Could the different behaviour of netifd upon WAN NIC phy link loss be because the eth1 and eth2 interfaces on the x86 VM are considered "separate" devices, whereas in the case of the physical WDR4300 the two WANs are actually sitting on the same switch (eth0.2 and eth0.3)?

Netifd relies on netlink events from the Linux kernel indicating link loss/presence (the LOWER_UP flag in ip link show). I'm wondering whether these events are generated by the Linux kernel on the x86 VM. You can easily check this by calling ubus call network.device status and checking the carrier state of the eth1 and eth2 devices once the WAN NIC phy link is lost. (This should also be visible in ip link show, which displays the NO-CARRIER flag when the link is lost.)
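
The flag check described above can be sketched as a small shell helper. This is only an illustration; the function name link_state_from_flags is hypothetical, and it assumes the flag list is printed between angle brackets on the first line of the ip link show output, as in the transcript in the ticket description:

```shell
# link_state_from_flags LINE
# LINE is the first line of `ip link show dev <name>`.
# Prints "no-carrier", "carrier" or "unknown" based on the flag list.
link_state_from_flags() {
    case "$1" in
        *"<"*NO-CARRIER*">"*) echo "no-carrier" ;;
        *"<"*LOWER_UP*">"*)   echo "carrier" ;;
        *)                    echo "unknown" ;;
    esac
}
```

On a running system it could be used as link_state_from_flags "$(ip link show dev eth1 | head -n 1)".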

comment:10 Changed 3 years ago by anonymous

Indeed it does set NO-CARRIER, as I noted in the initial post:

The "unplugged" NIC in VM:

root@OpenWrt:~# ip link sh
...
7: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc hfsc state DOWN mode DEFAULT group default qlen 5
    link/ether 08:00:27:c1:de:6d brd ff:ff:ff:ff:ff:ff

comment:11 Changed 3 years ago by dedeckeh@…

Can you check the output of ubus call network.device status ?

comment:12 Changed 3 years ago by anonymous

OpenWrt BB 14.07 r42625 (x86) running under VBox 4.3.16

BEFORE disconnecting wan / eth1:

        "eth1": {
                "external": false,
                "present": true,
                "type": "Network device",
                "up": true,
                "carrier": true,
...

immediately after disconnecting wan / eth1 (unchecking of "Cable Connected")

        "eth1": {
                "external": false,
                "present": true,
                "type": "Network device",
                "up": true,
                "carrier": false,
...

so this one also seems correct.
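
To compare just the carrier value before and after unplugging, the relevant field can be pulled out of the status output. A rough sketch, assuming the pretty-printed layout shown above (one key per line); the helper name carrier_of is hypothetical, and it matches text rather than parsing JSON properly:

```shell
# carrier_of DEVICE
# Reads `ubus call network.device status` style output on stdin and prints
# the "carrier" value (true/false) found after the DEVICE key.
# Plain text matching on whitespace-separated fields, not a JSON parser.
carrier_of() {
    awk -v dev="\"$1\":" '
        $1 == dev { in_dev = 1; next }
        in_dev && $1 == "\"carrier\":" { gsub(/,/, "", $2); print $2; exit }
    '
}
```

For example: ubus call network.device status | carrier_of eth1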

comment:13 Changed 3 years ago by anonymous

Can you additionally check the output of ifstatus wan before and after disconnecting the wan?

comment:14 follow-up: Changed 3 years ago by anonymous

Before:

root@OpenWrt:~# ifstatus wan
{
        "up": true,
        "pending": false,
        "available": true,
        "autostart": true,
        "uptime": 1820,
        "l3_device": "eth1",
        "proto": "dhcp",
        "device": "eth1",
        "metric": 10,
        "delegation": true,
        "ipv4-address": [
                {
                        "address": "10.0.3.2",
                        "mask": 24
                }
        ],
...

After:

root@OpenWrt:~# ifstatus wan
{
        "up": false,
        "pending": false,
        "available": true,
        "autostart": true,
        "proto": "dhcp",
        "device": "eth1",
        "data": {

        }
}
root@OpenWrt:~#

comment:15 in reply to: ↑ 14 Changed 3 years ago by dedeckeh@…

Replying to anonymous:

OK, this behavior on the x86 VM is as expected, but the behavior on the WDR4300 differs. As stated in your previous post, this could be related to the eth0.2 and eth0.3 WAN devices being on the same switch (assuming the protocol is not static). Running the same ubus and ifstatus commands there should indicate whether the switch propagates the link-loss event to netifd.

comment:16 follow-up: Changed 3 years ago by anonymous

But if I understood your comments, setting force_link=0 will produce the same behavior on the WDR4300, right?

comment:17 in reply to: ↑ 16 Changed 3 years ago by dedeckeh@…

Replying to anonymous:

But if I understood your comments, setting force_link=0 will produce the same behavior on the WDR4300, right?

If the proto parameter of the wan interfaces is static, you definitely need to set force_link to 0 to detect link loss in netifd (for all other proto values, force_link defaults to 0). But I'm still puzzled about how link loss is handled in the Linux kernel if eth0.2 and eth0.3, as wan interfaces, are on the same switch.
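
The default described in this comment can be written down as a tiny decision function. This only restates the rule as given here (static defaults to 1, everything else to 0, and an explicit setting wins); it is not taken from the netifd source, and the name effective_force_link is hypothetical:

```shell
# effective_force_link PROTO [CONFIGURED]
# Prints the force_link value that applies according to the rule in this
# comment: an explicitly configured value wins, otherwise proto static
# defaults to 1 and every other proto defaults to 0.
effective_force_link() {
    proto="$1"
    configured="$2"

    if [ -n "$configured" ]; then
        echo "$configured"
    elif [ "$proto" = "static" ]; then
        echo 1
    else
        echo 0
    fi
}
```

So effective_force_link static prints 1 (no link sensing), while effective_force_link static 0 and effective_force_link dhcp both print 0.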

comment:18 follow-ups: Changed 3 years ago by anonymous

Some quick testing suggests that there's no change on the WDR4300 upon link-loss.

Maybe it needs a reboot?

dedeckeh, could you please elaborate on how to permanently enable the force_link=0 setting in /etc/config/network?

comment:19 in reply to: ↑ 18 Changed 3 years ago by anonymous

Replying to anonymous:

Some quick testing suggests that there's no change on the WDR4300 upon link-loss.

Well, testing on a WDR4300, I can't seem to find any differences in the output of

ifstatus wan
ip link sh dev eth0.2
ubus call network.device status

before and after pulling out the cable from its WAN port (eth0.2).

comment:20 in reply to: ↑ 18 Changed 3 years ago by dedeckeh@…

Replying to anonymous:

Some quick testing suggests that there's no change on the WDR4300 upon link-loss.

Maybe it needs a reboot?

dedeckeh, could you please elaborate on how to permanently enable the force_link=0 setting in /etc/config/network?

The force_link parameter can be set via uci (uci set network.wan.force_link=0)
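
To make the setting persistent, the uci set has to be followed by uci commit, which writes the change to /etc/config/network. A minimal sketch; the wrapper name persist_force_link is hypothetical, and the interface name is whatever section the option should go into:

```shell
# persist_force_link INTERFACE VALUE
# Stages force_link for the given interface section and commits it, so the
# option ends up in /etc/config/network and survives a reboot.
persist_force_link() {
    uci set "network.$1.force_link=$2" || return 1
    uci commit network
}
```

Usage on the router would be persist_force_link wan 0 && /etc/init.d/network reload. Afterwards the wan section of /etc/config/network should contain a line option force_link '0'.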

comment:21 Changed 3 years ago by jow

Link state changes on switch ports are not propagated at all, since the parent iface (e.g. eth0 in this case) never loses the link. The only solution for now is to have some polling mechanism that watches the swconfig port state (or its corresponding netlink API). An eth0.X should be considered down if all ports except the CPU port have lost their link.
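
The rule above (a VLAN counts as down once every member port except the CPU port has lost its link) could be sketched roughly as follows. This is an illustration under assumptions: it assumes swconfig dev <switch> port <n> get link prints a line containing link:up or link:down, which may vary by switch driver, and the function names port_link_state and vlan_has_link are hypothetical:

```shell
# port_link_state SWITCH PORT
# Assumed to print something like "port:2 link:up speed:1000baseT full-duplex".
port_link_state() {
    swconfig dev "$1" port "$2" get link 2>/dev/null
}

# vlan_has_link SWITCH CPU_PORT "PORTS"
# PORTS is the switch_vlan ports list, e.g. "2 5t".
# Returns 0 if at least one non-CPU member port reports link:up, 1 otherwise.
vlan_has_link() {
    switch="$1"
    cpu_port="$2"
    ports="$3"

    for p in $ports; do
        p="${p%t}"                          # strip the "tagged" suffix
        [ "$p" = "$cpu_port" ] && continue  # the CPU port never loses link
        case "$(port_link_state "$switch" "$p")" in
            *link:up*) return 0 ;;
        esac
    done
    return 1
}
```

A polling loop would then call, for example, vlan_has_link switch0 5 "2 5t" every few seconds and react when the result changes.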

comment:22 Changed 3 years ago by anonymous

I noticed that mwan3 prints the device as "unknown" in the ifdown event from netifd ("... ifdown interface wan (unknown)"), which is presumably due to netifd not setting the DEVICE variable:

Thu Sep 25 20:59:06 2014 daemon.notice netifd: Network device 'eth1' link is down
Thu Sep 25 20:59:06 2014 daemon.notice netifd: Interface 'wan' has link connectivity loss
Thu Sep 25 20:59:06 2014 daemon.notice netifd: wan (2080): Received SIGTERM
Thu Sep 25 20:59:06 2014 user.notice mwan3: ifdown interface wan (unknown)
Thu Sep 25 20:59:42 2014 user.notice mwan3track: Interface wan (eth1) is offline
Thu Sep 25 20:59:43 2014 user.notice mwan3: ifdown interface wan (eth1)

Is this something that can be "fixed" in netifd?
