Opened 3 years ago

Last modified 3 years ago

#17584 accepted defect

fw3 dies with segfault or sigabrt when enabling port forwards in conjunction with pppoe and IPv6

Reported by: pdffs
Owned by: jow
Priority: response-needed
Milestone: Barrier Breaker 14.07
Component: packages
Version: Barrier Breaker 14.07
Keywords: firewall fw3
Cc:

Description

With a wan connection configured to use PPPoE, and IPv6 enabled on the interface, as soon as fw3 hits a redirect from wan to lan, it dies with either segfault or sigabrt.

The result is no firewall rules at all on boot, and after boot, either inability to load new rules (and it fails silently if using LuCI), or potentially loss of all existing rules if they get flushed. As it stands, you can have either IPv6, or port forwards, but not both.

I can't for the life of me work out how to make firewall3 build with debug symbols and skip stripping, so I'm afraid I have no idea where it's dying, but it's 100% reproducible for me.

Attachments (3)

ipv6.log (4.4 KB) - added by pdffs 3 years ago.
valgrind with ipv6 enabled
noipv6.log (17.4 KB) - added by pdffs 3 years ago.
valgrind without ipv6 enabled
firewall.conf (8.1 KB) - added by JoeBar 3 years ago.
My /etc/config/firewall

Change History (22)

comment:1 Changed 3 years ago by jow

  • Owner changed from developers to jow
  • Status changed from new to accepted

You should see one or more core dump files in /tmp/. Copy those to your buildroot and execute the following command in the top-level directory where you usually execute make:

./scripts/remote-gdb /path/to/core.file build_dir/target-*/firewall-*/firewall3

In the resulting gdb prompt enter:

bt full

to obtain a stack trace.

There is no need to rebuild the firewall package in any specific way, and it is not required to run an unstripped version on your router. Just make sure that the version producing the crashes on your router is the same one that is built in your tree. The best approach is to run "make package/firewall/{clean,compile}", then scp the resulting .ipk to your router and install it with opkg.

Please post the stack trace here once you have managed to obtain it.

comment:2 Changed 3 years ago by pdffs

Sorry for the delay; apparently I wasn't getting notifications from Trac by default. Here's the stack trace:

Core was generated by `fw3 reload'.
Program terminated with signal 6, Aborted.
#0  0x00007f3beb5fd4fd in __GI_raise (sig=sig@entry=6) at libpthread/nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) set pagination off
(gdb) thread apply all bt full

Thread 1 (LWP 6707):
#0  0x00007f3beb5fd4fd in __GI_raise (sig=sig@entry=6) at libpthread/nptl/sysdeps/unix/sysv/linux/raise.c:64
        __res = 0
        pid = 6707
        selftid = 6707
#1  0x00007f3beb5f9164 in __GI_abort () at libc/stdlib/abort.c:89
        sigs = {__val = {32}}
#2  0x00007f3beb5f8a11 in __malloc_consolidate (av=av@entry=0x7f3beb816fc0 <__malloc_state>) at libc/stdlib/malloc-standard/free.c:225
        fb = <optimised out>
        maxfb = <optimised out>
        p = <optimised out>
        nextp = 0x13d3ed0
        unsorted_bin = 0x7f3beb817030 <__malloc_state+112>
        first_unsorted = <optimised out>
        nextchunk = <optimised out>
        size = <optimised out>
        nextsize = <optimised out>
        prevsize = <optimised out>
        nextinuse = <optimised out>
        bck = <optimised out>
        fwd = <optimised out>
#3  0x00007f3beb5f7d67 in malloc (bytes=bytes@entry=760) at libc/stdlib/malloc-standard/malloc.c:908
        __infunc_pthread_cleanup_buffer = {__routine = 0x7f3beb5fd9bf <pthread_mutex_unlock>, __arg = 0x7f3beb8123e0 <__malloc_lock>, __canceltype = 0, __prev = 0x13dc888}
        av = 0x7f3beb816fc0 <__malloc_state>
        nb = 768
        idx = 38
        bin = <optimised out>
        fb = <optimised out>
        victim = <optimised out>
        size = <optimised out>
        victim_index = <optimised out>
        remainder = <optimised out>
        remainder_size = <optimised out>
        block = <optimised out>
        bit = <optimised out>
        map = <optimised out>
        fwd = <optimised out>
        bck = <optimised out>
        retval = <optimised out>
#4  0x00007f3beb5f8509 in calloc (n_elements=n_elements@entry=1, elem_size=elem_size@entry=760) at libc/stdlib/malloc-standard/calloc.c:39
        __infunc_pthread_cleanup_buffer = {__routine = 0x7f3beb5fd9bf <pthread_mutex_unlock>, __arg = 0x7f3beb8123e0 <__malloc_lock>, __canceltype = 20790144, __prev = 0x13cf970}
        __infunc_need_locking = 1
        p = <optimised out>
        clearsize = <optimised out>
        nclears = <optimised out>
        size = 760
        d = <optimised out>
        mem = <optimised out>
#5  0x000000000040956b in fw3_load_redirects (state=state@entry=0x13cc010, p=0x13cd1f0) at /tank/incoming/tmp/openwrt/14.07-x86_64/build_dir/target-x86_64_uClibc-0.9.33.2/firewall-2014-07-19/redirects.c:229
        s = 0x13cfd80
        e = 0x13cfd80
        redir = <optimised out>
        valid = <optimised out>
#6  0x0000000000404746 in build_state (runtime=runtime@entry=false) at /tank/incoming/tmp/openwrt/14.07-x86_64/build_dir/target-x86_64_uClibc-0.9.33.2/firewall-2014-07-19/main.c:107
        state = 0x13cc010
        p = 0x13cd1f0
        sf = <optimised out>
        b = {head = 0x13cc1b0, grow = 0x7f3bec659078 <blob_buffer_grow>, buflen = 256, buf = 0x13cc1b0}
#7  0x0000000000403f12 in main (argc=2, argv=0x7fffb88a2268) at /tank/incoming/tmp/openwrt/14.07-x86_64/build_dir/target-x86_64_uClibc-0.9.33.2/firewall-2014-07-19/main.c:523
        ch = <optimised out>
        rv = 1
        family = FW3_FAMILY_ANY
        defs = 0x0

comment:3 Changed 3 years ago by pdffs

Possibly relevant is that this is a dual-stack connection, so there is both an IPv4, and an IPv6 address on the interface when the problem is evident.

comment:4 Changed 3 years ago by jow

Looks like a heap corruption, dunno where it comes from. Will need to check fw3 on x86_64 under valgrind to see if I can find something.
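The trace in comment:2 aborts inside `__malloc_consolidate`, called from an unrelated `calloc`. That is consistent with the usual pattern of heap corruption: the allocator notices damaged bookkeeping long after the bad write happened. A minimal sketch of the idea in C, using a guard region placed after a buffer and checked later. All names here (`guarded_alloc`, `guard_intact`, `GUARD_LEN`, `GUARD_BYTE`) are hypothetical and exist in neither fw3 nor uClibc; this only illustrates late detection and is not how uClibc implements its checks.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8     /* hypothetical guard size in bytes */
#define GUARD_BYTE 0xA5  /* hypothetical fill pattern */

/* Allocate n usable (zeroed) bytes followed by GUARD_LEN pattern bytes.
 * Returns NULL on overflow or allocation failure. */
static void *guarded_alloc(size_t n)
{
    unsigned char *p;

    if (n > SIZE_MAX - GUARD_LEN)
        return NULL;

    p = malloc(n + GUARD_LEN);
    if (!p)
        return NULL;

    memset(p, 0, n);
    memset(p + n, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Return 1 if the guard after the n usable bytes is untouched, 0 otherwise.
 * The check can run at any later point, far away from the faulty write. */
static int guard_intact(const void *buf, size_t n)
{
    const unsigned char *g = (const unsigned char *)buf + n;
    size_t i;

    for (i = 0; i < GUARD_LEN; i++)
        if (g[i] != GUARD_BYTE)
            return 0;
    return 1;
}
```

A write one byte past the usable region goes unnoticed at the time it happens; only a later `guard_intact()` call reports it, which is why the function at the top of the backtrace is often not the culprit.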

comment:5 Changed 3 years ago by pdffs

I'll run it up under valgrind and post results, any particular valgrind features I should enable?

comment:6 Changed 3 years ago by jow

No, a simple memcheck run should already flag the most serious issues.

Changed 3 years ago by pdffs

  • Attachment ipv6.log added

valgrind with ipv6 enabled

Changed 3 years ago by pdffs

  • Attachment noipv6.log added

valgrind without ipv6 enabled

comment:7 Changed 3 years ago by pdffs

Hrm, I'm not sure my valgrind output for the failure scenario (ipv6.log) is very helpful without debug symbols for... something. I've also included a valgrind log for a successful run.

comment:8 Changed 3 years ago by pdffs

So, I was looking at the other code, because this consistently occurs in redirects, even if I change the order of loading for the various entities (rules/redirects/snats/etc), so I assume the corruption is occurring inside the uci_foreach_element loop.

If I adapt fw3_load_redirects to use the same pattern as fw3_load_rules (i.e. blob_for_each_attr/alloc/append, then list_for_each_entry_safe to modify the list members), it works without exploding until it hits forwards, which uses the same logic as the current redirects code.

Should I continue down this path and update the other fw3_load_* functions, or am I just masking the actual problem somehow?
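For readers unfamiliar with the two-phase pattern described in this comment (append every entry first, then walk the finished list with a deletion-safe loop), here is a minimal, self-contained sketch in C. It uses a plain singly linked list instead of the `list_for_each_entry_safe` macro mentioned above, and every name in it (`struct entry`, `entry_append`, `entry_prune`, and so on) is hypothetical rather than taken from fw3.

```c
#include <stdbool.h>
#include <stdlib.h>

struct entry {
    int port;            /* hypothetical payload */
    bool valid;          /* result of some later validation step */
    struct entry *next;
};

/* Phase 1: allocate an entry and append it to the tail of the list. */
static struct entry *entry_append(struct entry **head, int port, bool valid)
{
    struct entry *e = calloc(1, sizeof(*e));

    if (!e)
        return NULL;

    e->port = port;
    e->valid = valid;

    while (*head)
        head = &(*head)->next;
    *head = e;
    return e;
}

/* Phase 2: remove invalid entries. The entry is unlinked before it is
 * freed, so the walk never reads memory that was already released. */
static int entry_prune(struct entry **head)
{
    struct entry **link = head;
    int removed = 0;

    while (*link) {
        struct entry *e = *link;

        if (!e->valid) {
            *link = e->next;
            free(e);
            removed++;
        } else {
            link = &e->next;
        }
    }
    return removed;
}

static int entry_count(const struct entry *head)
{
    int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}

static void entry_free_all(struct entry **head)
{
    while (*head) {
        struct entry *e = *head;

        *head = e->next;
        free(e);
    }
}
```

Separating the phases means nothing is freed while the loader is still appending. As the comment itself asks, though, restructuring like this may only hide an underlying corruption rather than fix it.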

comment:9 Changed 3 years ago by JoeBar

The problem seems to be architecture specific. The same rules that break fw3 on x86_64 run fine on ar71xx.

Running Trunk r42419...

comment:10 Changed 3 years ago by pdffs

Well, that removes any urgency for the BB release cycle, sadly, since there's no official x86_64 target. I had hoped to provide one for the userbase though, since a number of people have expressed interest.

If it's arch-specific, I imagine it's a word size issue or some such somewhere, but I can't see it right now.

comment:11 Changed 3 years ago by JoeBar

Yesterday I checked the issue on x86. No problems there, so it seems to be x86_64 only, or maybe a 64-bit problem...

comment:12 Changed 3 years ago by pdffs

If I turn on -Wconversion, there are a lot of warnings. The bit setting macros in utils.h look a little dangerous, and there are a bunch of long/int mismatches and signedness conversions: lots of strtol results being assigned to ints, ints where size_t/ssize_t should be used, etc.

I'd love to fix this, but I'm afraid my C isn't up to snuff where I'd feel comfortable doing a good job (i.e. that I wouldn't break more things than I would fix in the attempt).
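As an illustration of the kind of cleanup these warnings point at, here is a minimal sketch in C of a range-checked strtol-to-int conversion and a fixed-width bit setter. Both helpers (`parse_int_checked`, `set_bit32`) are hypothetical and are not the code in fw3's utils.h. They only show the general technique and make no claim about what caused this particular crash.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a base-10 integer into *out.
 * Fails on NULL or empty input, trailing characters, or any value that
 * does not fit in an int, including on LP64 targets where long is
 * wider than int. */
static bool parse_int_checked(const char *s, int *out)
{
    char *end;
    long v;

    if (!s || !out)
        return false;

    errno = 0;
    v = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;

    *out = (int)v;
    return true;
}

/* Set bit n in a 32-bit field. The shift is done at an explicit unsigned
 * 32-bit width and out-of-range bit numbers are rejected, which avoids
 * undefined behavior from shifting a signed int too far. */
static bool set_bit32(uint32_t *field, unsigned int n)
{
    if (!field || n >= 32)
        return false;

    *field |= UINT32_C(1) << n;
    return true;
}
```

For example, `parse_int_checked("8080", &port)` succeeds and stores 8080, while `parse_int_checked("99999999999", &port)` fails instead of silently truncating the value.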

comment:13 Changed 3 years ago by jow

  • Priority changed from high to response-needed

Please test whether r42604 fixes your issue.

comment:14 Changed 3 years ago by pdffs

Initial test, by backporting r42604 to BB, gives me the same stacktrace. I can try building from trunk and testing in a VM, but the only system I know I can repro on is doing actual work for me right now.

@JoeBar, can you share your networking/firewall configs that you're using to reproduce?

Changed 3 years ago by JoeBar

  • Attachment firewall.conf added

My /etc/config/firewall

comment:15 Changed 3 years ago by JoeBar

I just compiled and installed r42608...
Enabling one of the DNAT rules in my firewall config leads to fw3 aborting. If I disable all DNAT rules, as in the attached firewall config, and put the raw DNAT iptables rules into firewall.user, everything works as expected...

comment:16 Changed 3 years ago by jow

Hm, I cannot reproduce this issue here anymore. Can you provide me the ifstatus of all involved interfaces?

comment:18 Changed 3 years ago by JoeBar

I just compiled and installed r42610...

It works now.

Thanks a lot...

comment:19 Changed 3 years ago by pdffs

This bug appears to be resolved now, thanks muchly for the effort! I am concerned that there may be other bugs lurking for 64-bit though, due to the conversion issues noted above?
