[ar71xx] iptables sometimes just doesn't masquerade (NAT)

I have a strange problem. After some time (a couple of days, a week, sometimes earlier) iptables just stops masquerading selected connections. It has happened to me on different hardware, different OpenWrt revisions, different sites and configurations. I can't trace the root cause of the problem.

It happens mostly with SIP REGISTER packets sent from source port udp/5060 to destination port udp/5060.

The capture from the PPPoE interface is attached below (but this has also happened to me on a plain Ethernet configuration without PPPoE). You can see there that the AudioCodes REGISTER packets are not masqueraded, while Snom's REGISTER packets are. Both devices use udp/5060 as the source port. (Before rebooting, I changed the IP of the LAN interface and restarted the network to see if that changed anything; captures attached.)

Most of the time everything is back to normal after a reboot, and masquerading works as expected again. Sometimes, however, a reboot doesn't help or change anything; then only reflashing the router helps...

The captures are from r31182 on a TL-MR3220. But as said earlier, it also happens to me on older OpenWrt revisions and on different hardware, e.g. TL-WR1043ND and TL-WR842ND.

Any hints?

In capture '002' the file description is wrong: the AudioCodes device had already been switched to the 172 subnet. My bad.

It happened again after about 6-7 days of uptime. Same thing: AudioCodes connections were not masqueraded while Snom's connections were. A reboot helped. How can I debug this error? Does anyone have any hints? I can supply logs, configs, captures, etc.
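One way to check whether a flow is stuck without NAT is to look at its conntrack entry: for a masqueraded connection the reply-direction destination is the WAN address, while an entry created without NAT sends replies straight back to the LAN host. The helper below is a sketch that flags such entries. It assumes the key=value line format printed by `conntrack -L` or `/proc/net/nf_conntrack`; the function name `find_unnatted` and the `192.168.1.` LAN prefix are hypothetical.

```sh
#!/bin/sh
# find_unnatted PREFIX  (hypothetical helper)
# Reads conntrack entries on stdin, one per line, as key=value tokens.
# Prints entries whose original source address starts with PREFIX and whose
# reply destination is that same address, i.e. LAN flows that were NOT
# source-NATed. PREFIX (e.g. "192.168.1.") is an assumption about your LAN.
find_unnatted() {
    awk -v prefix="$1" '
    {
        nsrc = 0; ndst = 0; osrc = ""; rdst = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) {
                nsrc++
                if (nsrc == 1) osrc = substr($i, 5)  # original-direction source
            } else if ($i ~ /^dst=/) {
                ndst++
                if (ndst == 2) rdst = substr($i, 5)  # reply-direction destination
            }
        }
        # With SNAT/MASQUERADE applied, replies target the WAN address instead.
        if (index(osrc, prefix) == 1 && osrc == rdst) print
    }'
}
```

For example, `grep 'dport=5060' /proc/net/nf_conntrack | find_unnatted 192.168.1.` would list SIP flows from the LAN that are currently not being masqueraded.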

I think I found the root of the problem: "NAT Implementation Problems - Linux kernel". You can read about it here:

It looks like clearing the NAT tables after the WAN link is (re)established should help. Would the OpenWrt devs try to work around this problem somehow, or should I just forget about it and use SNAT instead of MASQUERADE?
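A minimal sketch of the "clear after WAN (re)establishment" idea, assuming that "clearing the NAT tables" means flushing the conntrack table (the MASQUERADE/SNAT mapping for a flow is stored in its conntrack entry), that conntrack-tools is installed, and that the WAN logical interface is named `wan`. OpenWrt runs scripts in `/etc/hotplug.d/iface/` with `ACTION` and `INTERFACE` set; the file name and the `should_flush` helper below are hypothetical.

```sh
#!/bin/sh
# Hypothetical file: /etc/hotplug.d/iface/99-flush-conntrack
# Assumes the `conntrack` binary (conntrack-tools) is installed and the WAN
# logical interface is called "wan".

# should_flush ACTION INTERFACE: succeed only when the WAN interface comes up.
should_flush() {
    [ "$1" = "ifup" ] && [ "$2" = "wan" ]
}

if should_flush "$ACTION" "$INTERFACE"; then
    # Drop all tracked connections, so any entry created without NAT is gone
    # and the next packet of each flow passes through the nat table again.
    conntrack -F
    logger -t flush-conntrack "flushed conntrack table after wan ifup"
fi
```

Flushing the whole table also drops established LAN connections; a narrower variant would delete only the SIP flows, e.g. `conntrack -D -p udp --orig-port-src 5060`.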

iptables -t nat -A zone_wan_nat -j SNAT --to-source <wan_ip>

instead of

iptables -t nat -A zone_wan_nat -j MASQUERADE

doesn't work either.

The problem is indeed with conntrack, but it might be linked to the firewall structure too. I will post logs later, after I do some more tests.

See my comment in #10225 for an explanation of this bug and a workaround for it.

This should be working with current versions. Reopen if problems still occur.

