Opened 21 months ago

Last modified 21 months ago

#22387 new defect

bgmac (Broadcom 47xx) "oversized packet" kernel panic

Reported by: cds@… Owned by: developers
Priority: normal Milestone:
Component: kernel Version: Chaos Calmer 15.05
Keywords: Cc:

Description

Hi! I've recently flashed Chaos Calmer onto a Netgear N300 (WNR3500L v1) using the official OpenWRT image from https://downloads.openwrt.org/chaos_calmer/15.05.1/brcm47xx/mips74k/openwrt-15.05.1-brcm47xx-mips74k-netgear-wnr3500l-v1-north-america-squashfs.chk and I have found a bit of a problem.

Under specific high-bandwidth loads, I get a kernel panic. The relevant part of the panic, IMO:

<3>[ 2069.830000] bgmac bcma0:2: Found oversized packet at slot 108, DMA issue!
<0>[ 2069.890000] skbuff: skb_over_panic: text:801b8c08 len:1741 put:1741 head:82b24740 data:82b24740 tail:0x82b24e0d end:0x82b24da0 dev:<NULL>

Examining the relevant kernel code, I found this: http://lxr.free-electrons.com/source/drivers/net/ethernet/broadcom/bgmac.c#L458

It appears that the maths that determines whether a packet is oversized is a little suspect, but I can't see the bug immediately. In two separate trials, the len: and put: values in the panic were identical.
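The addresses in the skb_over_panic line can be checked by hand. The following is a minimal sketch in C, not driver code; the struct and helper names are hypothetical and exist only for this illustration. It takes the head, tail, and end values printed in the panic and computes how many bytes were claimed, how many the buffer could hold, and by how much the tail overran the end.

```c
#include <stdint.h>

/* Hypothetical helper type: the three addresses printed by one
 * skb_over_panic line (data equals head in the reported panics). */
struct skb_panic_fields {
    uint32_t head; /* start of the buffer */
    uint32_t tail; /* where the data would end after skb_put() */
    uint32_t end;  /* last usable address of the buffer */
};

/* Bytes claimed between head and tail. */
static uint32_t skb_used(const struct skb_panic_fields *f)
{
    return f->tail - f->head;
}

/* Bytes available between head and end. */
static uint32_t skb_capacity(const struct skb_panic_fields *f)
{
    return f->end - f->head;
}

/* Bytes by which tail runs past end (positive means overrun). */
static int32_t skb_overrun(const struct skb_panic_fields *f)
{
    return (int32_t)(f->tail - f->end);
}
```

Fed the values from the panic above (head 0x82b24740, tail 0x82b24e0d, end 0x82b24da0), these helpers give 1741 bytes used, matching len: and put:, against a capacity of 1632 bytes, an overrun of 109 bytes.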

To reproduce, we have been using youtube-dl to stream ~8MiB/s of video through the router's WAN port. This reliably crashes the router after a few seconds of load.

I wondered whether lowering the MTU on the WAN port would mitigate this bug. It does not: I lowered the MTU from 1500 to 1442, and the behavior did not improve.

I'm marking this as kernel component because the bug is very obviously somewhere in the kernel.

Thanks!
~ C.

Attachments (2)

crashlog (16.0 KB) - added by cds@… 21 months ago.
crashlog
crashlog.2 (16.0 KB) - added by cds@… 21 months ago.
crashlog

Change History (5)

Changed 21 months ago by cds@…

crashlog

comment:1 Changed 21 months ago by cds@…

Streaming video through YouTube's HTML player produces a similar crash:

<4>[ 1447.950000] sched: RT throttling activated
<0>[ 1448.010000] skbuff: skb_over_panic: text:801b8c08 len:1741 put:1741 head:8298a3a0 data:8298a3a0 tail:0x8298aa6d end:0x8298aa00 dev:<NULL>

No oversized-packet warning, but I get the feeling that we're off-by-one somewhere in bgmac.

Changed 21 months ago by cds@…

crashlog

comment:2 Changed 21 months ago by cds@…

I tried booting with the following kernel patch:

--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -467,19 +467,24 @@ static int bgmac_dma_rx_read(struct bgma
                                break;
                        }

+                       /* Omit CRC. */
+                       len -= ETH_FCS_LEN;
+
+                       /* Add in the alignment offsets. */
+                       len += BGMAC_RX_FRAME_OFFSET + BGMAC_RX_BUF_OFFSET;
+
+                       /* Check for packets that are too large for the skb. */
                        if (len > BGMAC_RX_ALLOC_SIZE) {
-                               bgmac_err(bgmac, "Found oversized packet at slot %d, DMA issue!\n",
-                                         ring->start);
+                               bgmac_err(bgmac, "Found oversized (%d) packet at slot %d, DMA issue!\n",
+                                         len, ring->start);
                                put_page(virt_to_head_page(buf));
                                break;
                        }
 
-                       /* Omit CRC. */
-                       len -= ETH_FCS_LEN;
-
+                       /* Build the skb with the len, which has been verified
+                        * above, and pull up to the alignment offset. */
                        skb = build_skb(buf, BGMAC_RX_ALLOC_SIZE);
-                       skb_put(skb, BGMAC_RX_FRAME_OFFSET +
-                               BGMAC_RX_BUF_OFFSET + len);
+                       skb_put(skb, len);
                        skb_pull(skb, BGMAC_RX_FRAME_OFFSET +
                                 BGMAC_RX_BUF_OFFSET);

This corrects the arithmetic in bgmac near the site of the panic bug. Unfortunately, it does *nothing* to improve the symptoms. Something else is going on here and I'm not sure what.
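To make the reordering in the patch concrete, here is a small standalone sketch in C of the two checks. It is not the driver's code: the struct, the function names, and any numeric values passed in are hypothetical, and the real values of BGMAC_RX_FRAME_OFFSET, BGMAC_RX_BUF_OFFSET, and BGMAC_RX_ALLOC_SIZE are deliberately not assumed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's constants; callers supply
 * the values, nothing here is taken from bgmac.h. */
struct rx_layout {
    size_t fcs_len;      /* trailing CRC bytes to drop */
    size_t frame_offset; /* stand-in for BGMAC_RX_FRAME_OFFSET */
    size_t buf_offset;   /* stand-in for BGMAC_RX_BUF_OFFSET */
    size_t alloc_size;   /* stand-in for BGMAC_RX_ALLOC_SIZE */
};

/* Total bytes that skb_put() is asked to claim for a frame whose
 * hardware-reported length is hw_len. Returns 0 when hw_len is
 * shorter than the CRC, to avoid unsigned underflow. */
static size_t rx_put_len(const struct rx_layout *l, size_t hw_len)
{
    if (hw_len < l->fcs_len)
        return 0;
    return hw_len - l->fcs_len + l->frame_offset + l->buf_offset;
}

/* Check as ordered before the patch: raw length against alloc size. */
static bool rx_fits_before_patch(const struct rx_layout *l, size_t hw_len)
{
    return hw_len <= l->alloc_size;
}

/* Check as ordered by the patch: the full skb_put() length,
 * offsets included, against alloc size. */
static bool rx_fits_after_patch(const struct rx_layout *l, size_t hw_len)
{
    return rx_put_len(l, hw_len) <= l->alloc_size;
}
```

With made-up numbers (a 4-byte CRC, offsets of 30 and 34, an allocation of 1700), a reported length of 1690 passes the pre-patch check but yields a put length of 1750, which the patched check rejects. The patched check only compares against the allocation size, so it cannot by itself explain a panic in which the usable space reported by the skb is smaller than that.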

comment:3 Changed 21 months ago by cds@…

I found a combination of factors that could be provoking the problem: YouTube, Comcast, OpenWRT, and Linux together mean full IPv6 from the datacenter to the desktop. While exciting, this also shows up as some strange misbehavior on other machines in the house that sit behind a different router, and I'm starting to think that the corner case is IPv6-specific.
