Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#683 closed defect (wontfix)

IPv6 forwarding broken in 2.4.30

Reported by: adi@…
Owned by: florian
Priority: normal
Milestone: 0.9/rc6
Component: kernel
Version:
Keywords:
Cc:

Description

There is a weird bug in Linux-2.4.30; at least two different openwrt-whiterussian routers show the same behaviour:

ping6 from and to the subnet works, and even wget, mplayer and so on work, but if you try to upload a large packet (say 1500 bytes), the packet won't reach its target, thus stalling the TCP connection (probably UDP as well, but I've not tested this).

This behaviour can be reproduced by initiating an SSH connection from the subnet to an outside target or by browsing an IPv6-enabled website with a decent browser (the request header is large enough to trigger this bug). On the other hand, mplayer's HTTP request header is small enough to set up a TCP connection and stream a video/mp3 from the net (you might want to try it with <http://cluster.inf-ra.uni-jena.de/~adi/CC-Zwei-00.mp3>).

This could probably be a PMTUD issue, but I'm not sure about it.

Perhaps 2.4.32 fixes this problem; Linux-2.6 should be fine.

Attachments (2)

good.pcap (2.9 KB) - added by adi@… 10 years ago.
pcap file containing the working case
bad.pcap (2.9 KB) - added by adi@… 10 years ago.
pcap file containing the error case (screwed remote tunnel endpoint)

Change History (16)

comment:1 Changed 11 years ago by florian

  • Owner changed from developers to florian
  • Status changed from new to assigned

Confirmed

comment:2 Changed 11 years ago by adi@…

I've upgraded the whiterussian-rc5 with a Linux 2.4.33 kernel (http://adi.loris.tv/bin/, if anyone is interested), but the problem is still present (no cure at all, exactly the same behaviour).

HTH

comment:3 Changed 11 years ago by florian

Actually, I have tested, and I cannot reproduce the bug, but I do have packet loss when using tunneled IPv6 connections to either the Internet or another subnet. I definitely think there are issues with the IPv6 stack under Linux-2.4.x.

comment:4 Changed 11 years ago by adi@…

Linux 2.4.33 does not have the defaultroute bug mentioned in #187.

I can snoop the interfaces with tcpdump and try to investigate this further. Unfortunately, buildroot-ng-2.6 does not boot (and neither does buildroot-ng-2.4, though this may be my fault), so I cannot try a Linux-2.6 kernel instead.

I'll create an artificial subnet (behind the WAN interface) and see if routing works between networks with identical MTU.

Perhaps it's even a good idea to try without bridging. Hopefully, this bug will turn out to be a configuration issue.

comment:5 Changed 11 years ago by anonymous

OK, problem solved: this bug also applies to simple subnet routing. You'll have to flush any IPv6 address (especially the link-local address) on the physical device (i.e. eth0) and assign it to the bridge instead (br0). Afterwards, everything works fine.

You may close the ticket, update the docs or invent a little script dealing with this issue.
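
The flush-and-reassign step described above could be scripted roughly as follows. This is only a sketch: it assumes the iproute2 `ip` tool is available on the router, the helper name `bridge_ipv6_commands` is made up for illustration, and the address in the usage line is a documentation-prefix placeholder, not one from this ticket. The function only builds the command strings; it does not run them.

```python
def bridge_ipv6_commands(addresses, phys_dev="eth0", bridge_dev="br0"):
    """Build shell commands that move IPv6 addresses from phys_dev to bridge_dev.

    Assumption: iproute2 `ip` is installed. Nothing is executed here;
    the caller decides whether and how to run the returned commands.
    """
    # Drop every IPv6 address (including link-local) from the physical device.
    commands = [f"ip -6 addr flush dev {phys_dev}"]
    # Re-add each address (in addr/prefix form) on the bridge instead.
    for addr in addresses:
        commands.append(f"ip -6 addr add {addr} dev {bridge_dev}")
    return commands


if __name__ == "__main__":
    # 2001:db8::/32 is the documentation prefix, used here as a placeholder.
    for cmd in bridge_ipv6_commands(["2001:db8:1::1/64"]):
        print(cmd)
```

Keeping the helper side-effect free makes it easy to review the commands before running them on a router reachable only over the network.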

HTH

comment:6 Changed 11 years ago by adi@…

(forgot to set username in last comment)

comment:7 Changed 11 years ago by adi@…

Unfortunately, the above is only true for local subnet routing, so the issue remains.

comment:8 Changed 11 years ago by florian

The 2.4 kernel has a known bug: you need to have 2000::/3 as the default route attached to the interface that links you to the IPv6 world. Can you try with it if you did not have it set?

Otherwise, I don't think there are many IPv6 bugs; I have been using IPv6 with OpenWrt for almost a year now without problems.
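
As a quick illustration of what a 2000::/3 route covers, the sketch below uses Python's standard `ipaddress` module to test whether an address falls inside that prefix. The helper name `covered_by_2000_3` is hypothetical, and `fe80::1` is just a generic link-local example.

```python
import ipaddress

# 2000::/3 is the prefix suggested above as a stand-in for the default route.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")


def covered_by_2000_3(address):
    """Return True if the given IPv6 address would match a 2000::/3 route."""
    return ipaddress.ip_address(address) in GLOBAL_UNICAST


if __name__ == "__main__":
    # Tunnel-side address that appears in the traceroute6 output of this ticket.
    print(covered_by_2000_3("2001:6f8:137e::1"))
    # A link-local address, which lies outside 2000::/3.
    print(covered_by_2000_3("fe80::1"))
```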

comment:9 Changed 11 years ago by adi@…

Routing is clearly not the problem (I've mentioned the 2000::/3 thing above; ping6, telnet6 and so on work, but sending large chunks of data does not). And JFTR: 2000::/3 doesn't change anything about this behaviour.

traceroute6 works (more or less):

adi@chopin:~$ traceroute6 adi.thur.de
traceroute to ltw.loris.tv (2001:6f8:137e::1) from 2001:6f8:984:0:2c0:9fff:fe18:700f, 30 hops max, 16 byte packets
 1  wgate6-rl0.loris.tv (2001:6f8:984::1)  1.053 ms  0.585 ms  0.445 ms
 2  * * *
 3  ltw.loris.tv (2001:6f8:137e::1)  39.12 ms  39.576 ms  39.842 ms

But it gives a suspicious message if run with verbose:

adi@chopin:~$ traceroute6 -v adi.thur.de
traceroute to ltw.loris.tv (2001:6f8:137e::1) from 2001:6f8:984:0:2c0:9fff:fe18:700f, 30 hops max, 16 byte packets
 1  wgate6-rl0.loris.tv (2001:6f8:984::1)  0.701 ms  0.556 ms  0.464 ms
 2 
32 bytes from 2c0:9fff:fe18:700f:101:7:4038:dff8 to 2001:6f8:137e::1: icmp type 135 (OUT-OF-RANGE) code 0
 * * *
 3  * ltw.loris.tv (2001:6f8:137e::1)  42.382 ms  40.01 ms

And the most obvious test, first run on the working host (behind OpenBSD router):

adi@drcomp:~$ ping6 -c 1 -s 1000 adi.thur.de
PING adi.thur.de(ltw.loris.tv) 1000 data bytes
1008 bytes from ltw.loris.tv: icmp_seq=1 ttl=62 time=166 ms

--- adi.thur.de ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 166.707/166.707/166.707/0.000 ms

The same command on the failing host (behind openwrt):

adi@chopin:~$ ping6 -c 1 -s 100 adi.thur.de
PING adi.thur.de(ltw.loris.tv) 100 data bytes
108 bytes from ltw.loris.tv: icmp_seq=1 ttl=62 time=41.7 ms

--- adi.thur.de ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 41.714/41.714/41.714/0.000 ms

It works fine with 100 data bytes, but fails with 200:

adi@chopin:~$ ping6 -c 1 -s 200 adi.thur.de
PING adi.thur.de(ltw.loris.tv) 200 data bytes

--- adi.thur.de ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

tcpdump on the openwrt shows that the icmp6-echo-request is really sent to the sixxs tunnel provider, so the request is discarded uplink (but why? It shouldn't.)

I can ping6 with 194 bytes, but with 195, no echo-reply is received. This is also true for the opposite direction:

adi@drcomp:~$ ping6 -c 1 -s 194 adiv6
PING adiv6(adiv6.loris.tv) 194 data bytes
202 bytes from adiv6.loris.tv: icmp_seq=1 ttl=61 time=118 ms

--- adiv6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 118.283/118.283/118.283/0.000 ms

(working ping6 with 194 bytes to adiv6 (chopin))

adi@drcomp:~$ ping6 -c 1 -s 195 adiv6
PING adiv6(adiv6.loris.tv) 195 data bytes

--- adiv6 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

(failing ping6 with 195 bytes to chopin)

tcpdump shows the incoming echo-request on the sixxs device. It is received by chopin and the corresponding echo-reply is sent back (confirmed by tcpdump), but fails to reach drcomp (that's exactly the problem: sending large packets).
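
The manual search for the 194/195 byte boundary could be automated with a binary search over the payload size. The following is a sketch, not a tested tool: the names `largest_passing_size` and `ping6_probe` are made up here, the search assumes that every size up to some threshold passes and every larger size fails, and the default upper bound of 1452 assumes a 1500-byte MTU minus the 40-byte IPv6 header and the 8-byte ICMPv6 echo header.

```python
import subprocess


def ping6_probe(host, size, timeout=5):
    """Return True if one ping6 with `size` data bytes gets a reply.

    Uses the same `ping6 -c 1 -s SIZE HOST` invocation as in this ticket.
    Assumption: ping6 exits with status 0 only when a reply arrived.
    """
    try:
        result = subprocess.run(
            ["ping6", "-c", "1", "-s", str(size), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def largest_passing_size(probe, low=0, high=1452):
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Returns None if even `low` fails. Assumes probe is monotonic.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

A call such as `largest_passing_size(lambda n: ping6_probe("adi.thur.de", n))` would then need about a dozen pings instead of a manual trial-and-error series.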

comment:10 Changed 11 years ago by adi@…

JFTR: Using AYIYA-tunnels, everything works fine. Though this does not fix the 6in4-problem, it is at least a workaround suitable for providers which support AYIYA (i.e. SixXS)

comment:11 Changed 11 years ago by florian

Actually, checking and rechecking on my 6in4 tunnels, I have packet loss, but that's all. If you tell me 2.4.33 does not do the trick, then we will have to wait for an upstream fix.

comment:12 Changed 11 years ago by florian

  • Resolution set to wontfix
  • Status changed from assigned to closed

Since 2.4.33 does not fix the bug, we will wait until an upstream fix comes up.

comment:13 Changed 10 years ago by adi@…

News update on this issue: AYIYA tunnels don't work anymore, at least not my tunnels ;)

Though I now have the reason why packets of 195+8 bytes and larger are not working: the IPv4 address of the remote tunnel endpoint gets screwed.

I've attached two pcap files to this ticket: one (good.pcap) shows the right tunnel endpoint 85.14.220.160 for packets of up to 194(+8) bytes, the other (bad.pcap) shows a totally wrong destination address in the IPv4 header.
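
For reference, the "+8" is the ICMPv6 echo header added to the ping6 data bytes. The sketch below adds up the resulting sizes for each layer; the helper name `sizes_6in4` is hypothetical, and the outer size assumes plain 6in4 encapsulation with a 20-byte IPv4 header without options (an AYIYA tunnel would add different overhead).

```python
ICMPV6_ECHO_HEADER = 8  # type, code, checksum, identifier, sequence number
IPV6_HEADER = 40        # fixed IPv6 header, no extension headers
IPV4_HEADER = 20        # outer header; assumes 6in4 and no IPv4 options


def sizes_6in4(payload):
    """Return the size in bytes at each layer for a ping6 with `payload` data bytes."""
    icmpv6 = payload + ICMPV6_ECHO_HEADER
    ipv6 = icmpv6 + IPV6_HEADER
    ipv4 = ipv6 + IPV4_HEADER
    return {"icmpv6": icmpv6, "ipv6": ipv6, "ipv4": ipv4}


if __name__ == "__main__":
    for payload in (194, 195):
        print(payload, sizes_6in4(payload))
```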

Even worse, this wrong address changes from time to time (let's say every two minutes). Obviously, not reaching the tunnel endpoint breaks the encapsulated traffic. ;)
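
A check like the one done by eye on the two pcap files could look roughly like this. It is a sketch only: the function names are made up, the input is assumed to be the raw bytes of one captured packet starting at the IPv4 header (link-layer header already stripped), and reading the pcap file itself is left out. For 6in4 traffic the protocol field would be 41 (IPv6 encapsulation).

```python
import socket


def outer_ipv4_info(packet):
    """Return (protocol, source, destination) from raw IPv4 header bytes."""
    if len(packet) < 20:
        raise ValueError("packet shorter than a minimal IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    protocol = packet[9]                     # 41 means encapsulated IPv6 (6in4)
    src = socket.inet_ntoa(packet[12:16])    # source address field
    dst = socket.inet_ntoa(packet[16:20])    # destination address field
    return protocol, src, dst


def endpoint_matches(packet, expected_endpoint):
    """Return True if the outer IPv4 destination equals the expected tunnel endpoint."""
    _protocol, _src, dst = outer_ipv4_info(packet)
    return dst == expected_endpoint
```

Running `endpoint_matches(pkt, "85.14.220.160")` over every outgoing tunnel packet would flag exactly the packets whose destination has been rewritten.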

Note that this only holds true for traffic being sent from the inside network via the router to the outside. Everything is fine with ping6ing the router itself or with large packets from the outside to the inside.

For now, I don't have any idea where things go wrong.

comment:14 Changed 10 years ago by adi@…

Hello.

So the good news is: everything seems to be fine in Linux-2.6. I've managed to build kamikaze r11142 running 2.6.13.16 for my Buffalo WBR-B11 (removed the VLAN stuff from lib/network/config.sh, since this device has eth0 connected to LAN, eth1 connected to WAN and a b43-driven WLAN).

I'll still have to run more tests, but at least AYIYA tunnels (aiccu) are working.
