Modify

Opened 3 years ago

Last modified 2 years ago

#17744 reopened defect

aiccu doesn't start on boot due to DNS and time issues

Reported by: lorenzo@… Owned by: developers
Priority: normal Milestone:
Component: packages Version: Trunk
Keywords: Cc:

Description

I'm using aiccu to set up an IPv6-in-IPv4 tunnel on barrier breaker 14.07-rc3. Unfortunately it doesn't work when the box boots, for the two reasons below.

The fact that it doesn't start on boot is particularly bad due to sixxs', let's say, aggressive stance in banning clients that auto-reconnect, which means that aiccu.sh never retries.

Reason 1 is because aiccu.sh is started as soon as the WAN interface comes up, but in order for aiccu to start it needs to do a DNS lookup for tic.sixxs.net, and the DNS servers aren't configured yet. (DHCP has returned them, but dnsmasq hasn't been reconfigured yet).

Tue Sep 2 14:29:06 2014 daemon.notice netifd: wan (1264): Lease of 100.96.57.160 obtained, lease time 14400
Tue Sep 2 14:29:06 2014 daemon.notice netifd: Interface 'wan6' is setting up now
Tue Sep 2 14:29:07 2014 daemon.info named[955]: error (unexpected RCODE REFUSED) resolving 'tic.sixxs.net/A/IN': 127.0.0.1#1053
Tue Sep 2 14:29:07 2014 local7.err syslog: Couldn't resolve host tic.sixxs.net, service 3874
Tue Sep 2 14:29:07 2014 local7.err syslog: Couldn't connect to the TIC server tic.sixxs.net
Tue Sep 2 14:29:07 2014 local7.err syslog: Couldn't retrieve first tunnel for the above reason, aborting

This is with dnsmasq, or local named forwarding to local dnsmasq on a different port.

===

Reason 2 is that the first time the box boots its time is not synced, and sixxs doesn't like that:

Tue Sep 2 15:03:22 2014 local7.debug syslog: sock_getline() : "200 SixXS TIC Service on nlhaa01.sixxs.net ready (http://www.sixxs.net)"
Tue Sep 2 15:03:22 2014 local7.debug syslog: sock_printf() : "client TIC/draft-00 AICCU/2007.01.15-console-linux Linux/3.10.49"
Tue Sep 2 15:03:22 2014 local7.debug syslog: sock_getline() : "200 Client Identity accepted"
Tue Sep 2 15:03:22 2014 local7.debug syslog: sock_printf() : "get unixtime"
Tue Sep 2 15:03:22 2014 local7.debug syslog: sock_getline() : "200 1409695531"
Tue Sep 2 15:03:22 2014 local7.err syslog: The clock is off by 129 seconds, use NTP to sync it!
Tue Sep 2 15:03:22 2014 local7.debug syslog: sock_printf() : "QUIT Aborting: Clock is off by 129 seconds
"

A solution to both of these is to put a sleep into aiccu.sh, but that's pretty hacky. A better solution would be to add a "interface completely configured" dependency that would include DNS as well, and a "NTP synced" dependency.

Here's what I have in /etc/config/network:

config interface 'wan6'

option ifname '@wan'
option _orig_ifname '@wan'
option _orig_bridge 'false'
option proto 'aiccu'
option username '*'
option password '
*'
option verbose 'true'
option defaultroute 'true'
option ip6prefix '*'
option autostart 'true'

Attachments (0)

Change History (11)

comment:1 Changed 3 years ago by LipkeGu

in /lib/netifd/proto/aiccu.sh add

/usr/sbin/ntpdate -b <ip of an ntp-server>
sleep 10
....
	[ "$deamonize" == 1 ] && echo "daemonize true" >> "$CFGFILE"
	echo "pidfile $PIDFILE"   >> "$CFGFILE"

# here -->	 
		
	 aiccu start "$CFGFILE"

	[ "$?" -ne 0 ] && {
		proto_notify_error "$cfg" "AICCU_FAILED_SEE_LOG"
		proto_block_restart "$cfg"
		return
	}
...

the problem is netifd sets up the protos to0 early (BEFORE DNS)! this "ntpdate-line should fix that "rudimentary" ... BUT YOU HAVE TO USE AN IP ADRESS !:)

https://www.sixxs.net/forum/?msg=setup-12255745-12256213

Last edited 3 years ago by LipkeGu (previous) (diff)

comment:2 Changed 3 years ago by anonymous

Yes, that does work. Another solution is to run "ntpdate && break; sleep 1" in a loop while DNS comes up.

Neither is a real fix though.

comment:3 Changed 3 years ago by LipkeGu

aiccu doesn't start on boot due to DNS and time issues

simply do a single shot of nslookup and note the ip that is pointing to "tic.sixxs.net":
and use that IP in your config!

root@WNDR3700: nslookup tic.sixxs.net

comment:4 Changed 3 years ago by lorenzo@…

That's also not a real fix, because it will break when SIXXS changes the IP address of their server.

comment:5 Changed 3 years ago by poranje

or another work-around that is not a real fix ...

# work-around for https://dev.openwrt.org/ticket/17744
        local try=0
        local max=10
        while [ $((++try)) -le $max ]; do
                nslookup tic.sixxs.net >/dev/null 2>&1 && break
                sleep 6
        done
        [ $try -gt $max ] || ntpd -qn -p  pool.ntp.org >/dev/null 2>&1
# end of work-around
Last edited 3 years ago by poranje (previous) (diff)

comment:6 Changed 3 years ago by duvi

Aiccu doesn't restart for me either after the ipv4 wan connection reconnects.
Sometimes it doesn't connect on boot because of the same reason.
"Couldn't resolve host tic.sixxs.net"

Sat Sep 20 11:32:12 2014 daemon.info pppd[1656]: LCP terminated by peer (Connect time expired)
Sat Sep 20 11:32:12 2014 daemon.info pppd[1656]: Connect time 10080.0 minutes.
Sat Sep 20 11:32:12 2014 daemon.info pppd[1656]: Sent 3413231816 bytes, received 808504468 bytes.
Sat Sep 20 11:32:12 2014 daemon.err miniupnpd[2487]: Failed to get IP for interface pppoe-wan
Sat Sep 20 11:32:12 2014 daemon.warn miniupnpd[2487]: SendNATPMPPublicAddressChangeNotification: cannot get public IP address, stopping
Sat Sep 20 11:32:12 2014 daemon.notice netifd: Network device 'pppoe-wan' link is down
Sat Sep 20 11:32:12 2014 daemon.notice netifd: Interface 'wan' has lost the connection
Sat Sep 20 11:32:12 2014 daemon.notice netifd: Interface 'wan6' has lost the connection
Sat Sep 20 11:32:12 2014 daemon.notice netifd: Network device 'aiccu-wan6' link is down
Sat Sep 20 11:32:12 2014 user.info syslog: Usage:
        modprobe module
Sat Sep 20 11:32:12 2014 user.info syslog: Usage:
        modprobe module
Sat Sep 20 11:32:12 2014 local7.err syslog: Already running instance HUP'ed, exiting
Sat Sep 20 11:32:12 2014 daemon.notice netifd: Interface 'wan6' is now down
Sat Sep 20 11:32:12 2014 daemon.notice netifd: Interface 'wan6' is setting up now
Sat Sep 20 11:32:12 2014 user.info syslog: Usage:
        modprobe module
Sat Sep 20 11:32:12 2014 user.info syslog: Usage:
        modprobe module
Sat Sep 20 11:32:12 2014 local7.err syslog: Couldn't resolve host tic.sixxs.net, service 3874
Sat Sep 20 11:32:12 2014 local7.err syslog: Couldn't connect to the TIC server tic.sixxs.net
Sat Sep 20 11:32:12 2014 local7.err syslog: Couldn't retrieve first tunnel for the above reason, aborting
Sat Sep 20 11:32:12 2014 daemon.notice netifd: Interface 'wan6' is now down
Sat Sep 20 11:32:15 2014 daemon.notice pppd[1656]: Connection terminated.
Sat Sep 20 11:32:15 2014 daemon.notice pppd[1656]: Modem hangup
Sat Sep 20 11:32:16 2014 daemon.warn dnsmasq[1988]: no servers found in /tmp/resolv.conf.auto, will retry
Sat Sep 20 11:32:45 2014 daemon.info pppd[1656]: PPP session is 1065
Sat Sep 20 11:32:45 2014 daemon.warn pppd[1656]: Connected to 00:25:90:e5:5e:a9 via interface eth1.2
Sat Sep 20 11:32:45 2014 daemon.info pppd[1656]: Using interface pppoe-wan
Sat Sep 20 11:32:45 2014 daemon.notice pppd[1656]: Connect: pppoe-wan <--> eth1.2
Sat Sep 20 11:32:48 2014 daemon.notice pppd[1656]: PAP authentication succeeded
Sat Sep 20 11:32:48 2014 daemon.notice pppd[1656]: peer from calling number 00:25:90:E5:5E:A9 authorized
Sat Sep 20 11:32:48 2014 daemon.notice pppd[1656]: local  IP address x.x.x.x
Sat Sep 20 11:32:48 2014 daemon.notice pppd[1656]: remote IP address x.x.x.x
Sat Sep 20 11:32:48 2014 daemon.notice pppd[1656]: primary   DNS address x.x.x.x
Sat Sep 20 11:32:48 2014 daemon.notice pppd[1656]: secondary DNS address x.x.x.x
Sat Sep 20 11:32:48 2014 daemon.notice netifd: Network device 'pppoe-wan' link is up
Sat Sep 20 11:32:48 2014 daemon.notice netifd: Interface 'wan6' is setting up now
Sat Sep 20 11:32:48 2014 daemon.notice netifd: Interface 'wan' is now up
Sat Sep 20 11:32:48 2014 user.info syslog: Usage:
        modprobe module
Sat Sep 20 11:32:48 2014 user.info syslog: Usage:
        modprobe module
Sat Sep 20 11:32:48 2014 local7.err syslog: Couldn't resolve host tic.sixxs.net, service 3874
Sat Sep 20 11:32:48 2014 local7.err syslog: Couldn't connect to the TIC server tic.sixxs.net
Sat Sep 20 11:32:48 2014 local7.err syslog: Couldn't retrieve first tunnel for the above reason, aborting
Sat Sep 20 11:32:48 2014 user.notice firewall: Reloading firewall due to ifup of wan (pppoe-wan)
Sat Sep 20 11:32:50 2014 daemon.info dnsmasq[1988]: reading /tmp/resolv.conf.auto
Sat Sep 20 11:32:50 2014 daemon.info dnsmasq[1988]: using local addresses only for domain lan
Sat Sep 20 11:32:50 2014 daemon.info dnsmasq[1988]: using nameserver x.x.x.x#53
Sat Sep 20 11:32:50 2014 daemon.info dnsmasq[1988]: using nameserver x.x.x.x#53
Sat Sep 20 11:32:50 2014 daemon.notice miniupnpd[2487]: shutting down MiniUPnPd
Sat Sep 20 11:32:50 2014 daemon.notice miniupnpd[18333]: HTTP listening on port 5000
Sat Sep 20 11:32:50 2014 daemon.notice miniupnpd[18333]: HTTP IPv6 address given to control points : [2a01:368:xxxx::1]
Sat Sep 20 11:32:50 2014 daemon.notice miniupnpd[18333]: Listening for NAT-PMP/PCP traffic on port 5351

comment:7 Changed 3 years ago by luizluca@…

For the ntp problem, I have a patch that might solve the problem. I submitted it long time ago but without answer

http://patchwork.openwrt.org/patch/2886/

The updated version is at:

https://github.com/luizluca/openwrt/compare/trunk/ntp-hotplug

This adds hotplug event when ntp sync. So, aiccu can be trigged to connect once ntp is synced.

If we sum this with someway to check that dnsmasq is alrealdy answering, we can bring aiccu up only when everything is ok.

I really prefer to just try to reconnect when these non-high-frequency-repeatable-events occurs (ifup wan, ntp sync and dnsmasq is up). We can even setup some type of limit in order to never retry to reconnect for the same event (or within a "long" period of time as 10m)

comment:8 Changed 3 years ago by cyrus

  • Resolution set to fixed
  • Status changed from new to closed

comment:9 Changed 3 years ago by adrian@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

The proposed fix resolves the NTP issue but the DNS condition in the narrative still exists. I can see the tunnel trying to come up before the DNS is ready. tic.sixxs.net is not getting resolved. Manually stopping and restarting the interface resolves the issue.

comment:10 Changed 3 years ago by LipkeGu

It seems the Init-script order have to be changed so that dns is ealier up & running before aiccu is starting...

comment:11 Changed 2 years ago by gareththered

The init-script order on my 14.07 box has START=60 for dnsmasq and START=98 for aiccu and I haven't encountered the DNS issue. However, the time sync issue exists. I added:

busybox ntpd -qn -p pool.ntp.org >/dev/null

to start() within /etc/init.d/aiccu in order to pause the startup until NTP had done a one-off timesync. It seems to work. ntpd is part of the busybox whereas ntpclient is not.

Add Comment

Modify Ticket

Action
as reopened .
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.