Modify

Opened 3 years ago

Closed 3 years ago

Last modified 18 months ago

#18946 closed defect (not_a_bug)

DHCPOFFER from dnsmasq-dhcp get lost

Reported by: corny Owned by: developers
Priority: normal Milestone: Barrier Breaker 14.07
Component: packages Version: Barrier Breaker 14.07
Keywords: Cc:

Description

DHCPOFFERs to some wifi-Clients seem to get lost before they reach the network interface on a TL-WDR3600. This is very strage because it works with most clients and the offers also appear in the logging output. Disabling the firewall does not change anything.

I have seen this behaviour with Samsung and Apple devices. Any idea?

Thu Feb  5 21:12:42 2015 daemon.info hostapd: wlan0: STA f4:09:d8:5b:7b:62 IEEE 802.11: authenticated
Thu Feb  5 21:12:42 2015 daemon.info hostapd: wlan0: STA f4:09:d8:5b:7b:62 IEEE 802.11: associated (aid 7)
Thu Feb  5 21:12:45 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:12:45 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:12:46 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:12:46 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:12:54 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:12:54 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:10 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:10 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:13 2015 daemon.info hostapd: wlan0: STA f4:09:d8:5b:7b:62 IEEE 802.11: disassociated
Thu Feb  5 21:13:14 2015 daemon.info hostapd: wlan0: STA f4:09:d8:5b:7b:62 IEEE 802.11: authenticated
Thu Feb  5 21:13:14 2015 daemon.info hostapd: wlan0: STA f4:09:d8:5b:7b:62 IEEE 802.11: associated (aid 7)
Thu Feb  5 21:13:18 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:18 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:19 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:19 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:28 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:28 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:44 2015 daemon.info dnsmasq-dhcp[4396]: DHCPDISCOVER(br-guest) f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:44 2015 daemon.info dnsmasq-dhcp[4396]: DHCPOFFER(br-guest) 192.168.10.220 f4:09:d8:5b:7b:62 
Thu Feb  5 21:13:45 2015 daemon.info hostapd: wlan0: STA f4:09:d8:5b:7b:62 IEEE 802.11: disassociated
Thu Feb  5 21:13:46 2015 daemon.info hostapd: wlan0: STA f4:09:d8:5b:7b:62 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
...

Attachments (3)

br-guest.cap (72.7 KB) - added by corny 3 years ago.
capture of DHCP packages on br-guest
strace.dnsmasq.log (551.4 KB) - added by _dmke 3 years ago.
strace attached to dnsmasq for a few minutes
logread.while-stracing.log (15.1 KB) - added by _dmke 3 years ago.
output of logread -f of the same period strace was attached to dnsmasq

Download all attachments as: .zip

Change History (16)

Changed 3 years ago by corny

capture of DHCP packages on br-guest

comment:1 Changed 3 years ago by anonymous

I have probably the same issue. I have a TL-WDR3600 router with OpenWrt Chaos Calmer r42926. The 5 Ghz band is always good: if a device (android / iphone/ computer) wants an IP address, it will get one from the router. On 2.4 Ghz this is different. Sometimes it works and an ip address is offered, but sometimes it does not work. I checked the traffic on my computer. It asks for an IP address, but does not get an answer from the router. In the log of the router i recall to see the same output you post. This is from memory, because i did not stored the output.

Usually the 2.4 band works fine for about 2-3 weeks, then it fails. I found out that stopping and starting the 2.4 Ghz band radio helps. I thought i could solve the problem in a dirty way, so i wrote a cron job which restarted the 2.4 band every week. This was fine, until after about 100 day's. Then the 2.4 band started to fail within the week. I rebooted the router recently and am hoping for another 100 days of good Wifi connection. I thought also about finding a better solution than my dirty one. I found this page on my search for this solotion.

Since i recently rebooted i have no log output to verify that i really have the same issue. "Hopefully" it will fail again and then i can post the log.

I see "guest" in your log. I (also?) have a guest and a normal WiFi in both 2.4 and 5 Ghz.

My dirty solution is stored in a file /bin/wifi-restart.sh :

#!/bin/sh
#
#
#restart radio or WiFi netwerk
#
#wireless.@wifi-iface[0] is WiFi-normal-2.4Ghz
#wireless.@wifi-iface[1] is WiFi-guest-2.4Ghz
#wireless.@wifi-iface[2] is WiFi-normal-5Ghz
#wireless.@wifi-iface[3] is WiFi-guest-5Ghz
#
#
uci set wireless.@wifi-iface[0].disabled=1 #WiFi-normal-2.4Ghz
wifi #commit change
sleep 10
uci set wireless.@wifi-iface[0].disabled=0 #WiFi-normal-2.4Ghz on
wifi #commit change
sleep 10
uci set wireless.@wifi-iface[1].disabled=1 #WiFi-guest-2.4Ghz off
wifi #commit change
sleep 10
uci set wireless.@wifi-iface[1].disabled=0 #WiFi-guest-2.4Ghz on
wifi #commit change

The cron job is:

#restart WiFi-normal-2.4Ghz and WiFi-guest-2.4Ghz at 5:30 every Friday:
30 5 * * 5 /bin/wifi-restart.sh



comment:2 Changed 3 years ago by anonymous

Me again after some more search....

See /ticket/18130.html
I see "Fix committed in r43011"

Maybe the solution is already in Trunck....

comment:3 Changed 3 years ago by corny

Thanks for telling your experiences. I am only running a guest network at 2.4 and 5 GHz. The problems also occured after a fresh reboot and it looks as if only few specific mac addresses are affected. I will take a look at the referenced tickets

comment:4 Changed 3 years ago by anonymous

I was looking for a solution, because with me rebooting did not help either.
It was good for about 100 days before. Now it broke after one or two day's.

I have switched to r44381
If this does not help me, i will let you know
If Wifi is stable for more than 3 weeks, i hope to think about this ticket again. Then i will post my experience.

comment:5 Changed 3 years ago by anonymous

I was looking for a solution, because with me rebooting did not help either.
It was good for about 100 days before. Now it broke after one or two day's.

I have switched to r44381
If this does not help me, i will let you know
If Wifi is stable for more than 3 weeks, i hope to think about this ticket again. Then i will post my experience.

comment:6 Changed 3 years ago by corny

I upgraded from Kernel 3.10.49 to 3.10.69, but the problem still exists.
It occurs just after a fresh sysupgrade with exactly the same clients.
I am going to test the recent trunk.

Last edited 3 years ago by corny (previous) (diff)

comment:7 Changed 3 years ago by angelmanjk@…

having the same problem with a wndr3800.
only after i disable wifi complete its stable (i have an extra wifi-ap).
if i enable wifi (5 GHz or 2.4 GHz or both) after some unspecific time a lot of dhcp discover/offer messages appear but never a dhcp-ack

comment:8 Changed 3 years ago by _dmke

I've captured this strace for dnsmasq, when a DHCP DISCOVER message comes in and an OFFER is written out (opkg install strace, and strace -p <pidof dnsmasq>):

_newselect(11, [4 5 6 7 8 9 10], [], [], NULL) = 1 (in [4])
time(NULL)                              = 1424705700
stat64("/tmp/resolv.conf.auto", {st_mode=S_IFREG|0644, st_size=41, ...}) = 0
recvmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, msg_iov(1)=[{"\1\1\6\0M\367\223\27\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\276\5j"..., 1472}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 300
recvmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, msg_iov(1)=[{"\1\1\6\0M\367\223\27\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\276\5j"..., 1472}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 300
ioctl(4, SIOCGIFNAME, {ifr_index=5, ifr_name="br-guest"}) = 0
ioctl(4, SIOCGIFADDR, {ifr_name="br-guest", ifr_addr={AF_INET, inet_addr("192.168.8.1")}}) = 0
sendto(5, "\0\0\0\24\0\26\3\5\0\0JK\0\0\0\0\2\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(5, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\0\0\0D\0\24\0\2\0\0JK\0\0\5s\2\10\200\376\0\0\0\1\0\10\0\1\177\0\0\1"..., 512}], msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 392
recvmsg(5, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\0\0\0D\0\24\0\2\0\0JK\0\0\5s\2\10\200\376\0\0\0\1\0\10\0\1\177\0\0\1"..., 512}], msg_controllen=0, msg_flags=0}, 0) = 392
recvmsg(5, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\0\0\0\24\0\3\0\2\0\0JK\0\0\5s\0\0\0\0", 512}], msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(5, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\0\0\0\24\0\3\0\2\0\0JK\0\0\5s\0\0\0\0", 512}], msg_controllen=0, msg_flags=0}, 0) = 20
time([1424705700])                      = 1424705700
open("/etc/TZ", O_RDONLY)               = 13
read(13, "UTC\n", 68)                   = 4
close(13)                               = 0
write(12, "<30>Feb 23 15:35:00 dnsmasq-dhcp"..., 81) = 81
time([1424705700])                      = 1424705700
open("/etc/TZ", O_RDONLY)               = 13
read(13, "UTC\n", 68)                   = 4
close(13)                               = 0
write(12, "<30>Feb 23 15:35:00 dnsmasq-dhcp"..., 92) = 92
alarm(2244)                             = 2248
ioctl(4, SIOCSARP, 0x7fa77618)          = 0
sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("192.168.10.76")}, msg_iov(1)=[{"\2\1\6\0M\367\223\27\0\2\0\0\0\0\0\0\300\250\nL\300\250\10\1\0\0\0\0\200\276\5j"..., 300}], msg_controllen=0, msg_flags=0}, 0) = 300

Near the end, you can see the write's to syslog (write(12, "<30>Feb 23 15:35:00 dnsmasq-dhcp"):

Mon Feb 23 15:35:00 2015 daemon.info dnsmasq-dhcp[1395]: DHCPDISCOVER(br-guest) 80:be:05:6a:69:ed 
Mon Feb 23 15:35:00 2015 daemon.info dnsmasq-dhcp[1395]: DHCPOFFER(br-guest) 192.168.10.76 80:be:05:6a:69:ed 

and at the very bottom, you'll see 300 bytes is written to fd 4:

sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("192.168.10.76")}, msg_iov(1)=[{"\2\1\6\0M\367\223\27\0\2\0\0\0\0\0\0\300\250\nL\300\250\10\1\0\0\0\0\200\276\5j"..., 300}], msg_controllen=0, msg_flags=0}, 0) = 300

I've traced that sendmsg syscall to void dhcp_packet(time_t,int) (src/dhcp.c:445, http://www.thekelleys.org.uk/dnsmasq/dnsmasq-2.71.tar.gz) to confirm that dnsmasq tries really, really hard the send the message (cf. int retry_send() in src/util.c:546 for reference).

Capturing those 300 bytes with tcpdump was not possible, though—as if the kernel never received that package.

I'm now a little puzzled. Do you have any further ideas?

—Dominik

Changed 3 years ago by _dmke

strace attached to dnsmasq for a few minutes

Changed 3 years ago by _dmke

output of logread -f of the same period strace was attached to dnsmasq

comment:9 Changed 3 years ago by anonymous

are your mac adresses the same on the interfaces?

comment:10 Changed 3 years ago by corny

We finally solved the problem. The router got a 192.168.10.0/24 address by DHCP from the uplink network. Our guest network was 192.168.8.0/22 that conflicts with the uplink network. Dnsmasq uses the clients MAC address to find a unused IP address. There was a probability of 0.25 that the offered IP address also belongs to the uplink network. In these cases the DNS-Offer did not reach the client.

The router was installed far away at our customer. We did not know his network and checked everything except the routers own DHCP lease.

comment:11 follow-up: Changed 3 years ago by anonymous

solved too. it was the internal switch where aging and learning was active. switching it off resolved my problems.

comment:12 Changed 3 years ago by jow

  • Resolution set to not_a_bug
  • Status changed from new to closed

comment:13 in reply to: ↑ 11 Changed 18 months ago by erniusias@…

Replying to anonymous:I get the same problem; Can you teach me how to disable learning an aging of switch.

solved too. it was the internal switch where aging and learning was active. switching it off resolved my problems.

Add Comment

Modify Ticket

Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.