Modify

Opened 10 years ago

Closed 10 years ago

#2569 closed defect (fixed)

r9337: eth0: tx dma ring full

Reported by: nabcore Owned by: developers
Priority: high Milestone: Kamikaze 8.09 RC1
Component: kernel Version:
Keywords: ar7 Cc:

Description

Switched on my main PC to find that the Ethernet connection to my DG834G v2 was not working. Upon logging into the DG834Gv2 via serial, I found the following messages in dmesg:

[141608.450000] printk: 1448206 messages suppressed.
[141608.450000] eth0: tx dma ring full
[141613.450000] printk: 1426184 messages suppressed.
[141613.450000] eth0: tx dma ring full
[141618.450000] printk: 1448283 messages suppressed.
[141618.450000] eth0: tx dma ring full
[141623.450000] printk: 1448208 messages suppressed.
[141623.450000] eth0: tx dma ring full
[141628.450000] printk: 1447225 messages suppressed.
[141628.450000] eth0: tx dma ring full

I had been downing a torrent earier on in the day.

Attachments (2)

01-cpmac-reset-race-fixes.patch (4.5 KB) - added by oliver@… 10 years ago.
(1/2) patch to fix race between device reset and start_xmit
02-cpmac-rx-ring-use-eoq.patch (5.9 KB) - added by oliver@… 10 years ago.
(2/2) patch to reimplement rx ring with EOQ markers to avoid reset storms

Download all attachments as: .zip

Change History (22)

comment:1 Changed 10 years ago by nabcore

That should be "downloading a torrent earlier in the day"

comment:2 Changed 10 years ago by nabcore

Still seeing this on r9426

comment:3 Changed 10 years ago by nabcore

On average, this bug seems to be hanging my router every 3 days or so.

comment:4 Changed 10 years ago by openwrt@…

There's a workaround that cures the hang; if you down the interface and bring it back up, it recovers.

comment:5 Changed 10 years ago by oliver@…

Also seen on a D-Link DSL-502T running trunk r9583. Seems to happen under 200-300kB/s tx load:

Nov 25 09:38:01 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:01 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:01 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:01 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:09 OpenWrt user.warn kernel: printk: 92 messages suppressed.
Nov 25 09:38:09 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:14 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:19 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:24 OpenWrt user.warn kernel: printk: 8 messages suppressed.
Nov 25 09:38:24 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:29 OpenWrt user.warn kernel: printk: 11 messages suppressed.
Nov 25 09:38:29 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:40 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:47 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:38:49 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:40:01 OpenWrt cron.notice crond[607]: USER root pid 743 cmd /bin/savelog.sh
Nov 25 09:41:56 OpenWrt user.err kernel: eth0: hw error, resetting...
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:41:56 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: printk: 215 messages suppressed.
Nov 25 09:43:38 OpenWrt user.err kernel: eth0: hw error, resetting...
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:38 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:43 OpenWrt user.warn kernel: printk: 1319138 messages suppressed.
Nov 25 09:43:43 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:48 OpenWrt user.warn kernel: printk: 1418344 messages suppressed.
Nov 25 09:43:48 OpenWrt user.warn kernel: eth0: tx dma ring full
Nov 25 09:43:53 OpenWrt user.warn kernel: printk: 1419392 messages suppressed.

As you can see there are singleton "tx dma ring full" and "hw error, resetting.." for a while, but the interface is still operational at that point. These messages also show up under lower load. Then at 09:43:43 the transmit side gets totally wedged and I have to powercycle to recover.

The receive path seems OK when this happens (e.g. I see inbound DHCP requests logged) but no traffic makes it out of the interface.

comment:6 Changed 10 years ago by oliver@…

I threw some debugging into the driver, here's the descriptor state where it's got stuck:

<4>eth0: tx dma ring full (queue 0)
<4>eth0: queue 0 is stopped
<7>eth0: desc[b4a36000]: 00000000 14872804 000005ea e00005ea 945bcb20 00000000 14a36000 14872804

It does not change over time (same skb pointer etc in the descriptor on subsequent messages). Seems like the descriptor is all set up correctly for xmit, but the hardware isn't actually completing the send.

I'm wondering if it's a race condition somewhere in that reset code .. I always seem to see a reset shortly before the interface gets stuck.

comment:7 Changed 10 years ago by oliver@…

Simple way to reproduce the problem: "ping -f" the ethernet side from an external machine. The driver gets stuck reliably within 2-3 minutes on my D-Link.

More debug output from a modified driver:

<3>eth0: hw error: interrupt status=0x00040000 MAC status=0x00002000
<7>eth0: desc[b48ec000]: 00000000 14b4c602 00000062 d0000062 00000000 00000000 148ec000 14b4c602
<7>eth0: cpmac_hw_stop()
<6>Completed IRQ half of reset
<7>eth0: desc[b48ec000]: 00000000 14b4c602 00000062 d0000062 00000000 00000000 148ec000 14b4c602
<4>printk: 8 messages suppressed.
<4>eth0: tx dma ring full (queue 0, stopped)
<7>eth0: desc[b48ec000]: 00000000 14b4c802 00000062 e0000062 94898e60 00000000 148ec000 14b4c802
<4>eth0: cpmac_hw_error() entry
<7>eth0: cpmac_hw_start()
<4>eth0: cpmac_hw_error() exit
<4>printk: 1375240 messages suppressed.
<4>eth0: tx dma ring full (queue 0, stopped)
<7>eth0: desc[b48ec000]: 00000000 1486a86a 000004a6 e00004a6 94b18ef8 00000000 148ec000 1486a86a

My changes log descriptor 0 after the "hw error", "tx dma ring full" and "completed IRQ half of reset" messages. The "completed IRQ half of reset" is logged at the end of the IRQ error-reset handler code. I also log if schedule_work(&reset_work) returns false, but that never appears to happen here.

Hypothesis (take with a large grain of salt since I'm not familiar with either the generic net layer or the hardware here ..)

  • net queue is stopped because TX is busy
  • interrupt causes reset_work/cpmac_hw_error() to be scheduled
  • cpmac_hw_error() starts
  • cpmac_hw_error() calls cpmac_clear_tx()
  • cpmac_clear_tx() clears descriptor 0
  • cpmac_clear_tx() calls netif_wake_subqueue()
  • netif_wake_subqueue schedules a softirq to transmit more data
  • interrupt happens before cpmac_hw_start() can run
  • softirq runs at end of interrupt and calls cpmac_start_xmit with more data
  • cpmac_start_xmit fills descriptor 0 and sets TXPTR to start xmit
  • interrupt completes
  • cpmac_hw_start() runs
  • hardware gets reset, leaving descriptor 0 filled but no pending xmit
  • tx side wedges

Anyone with actual knowledge of the hardware want to comment on this? ;-) When I get some more time I'll see if I can add more debugging to test this case.

comment:8 Changed 10 years ago by oliver@…

Got it:

<3>eth0: cpmac_start_xmit called when reset_hazard set
<3>eth0: reset_hazard=1 hardware_stopped=1 when_started=4294964205 when_stopped=4294964222 last_tx_start=4294964251 last_tx_int=4294964221
<4>Call Trace:
<4>[<9410840c>] dump_stack+0x8/0x34
<4>[<94234120>] cpmac_start_xmit+0xd0/0x36c
<4>[<94270594>] __qdisc_run+0x64/0x218
<4>[<9425f80c>] net_tx_action+0x230/0x24c
<4>[<941292bc>] __do_softirq+0x74/0xf8
<4>[<94129398>] do_softirq+0x58/0x8c
<4>[<94101b60>] ret_from_irq+0x0/0x4
<4>[<942336e8>] cpmac_hw_start+0xb8/0x228
<4>[<942347a8>] cpmac_hw_error+0x1b0/0x268
<4>[<94135bf4>] run_workqueue+0x14c/0x20c
<4>[<9413692c>] worker_thread+0xac/0x100
<4>[<9413ad88>] kthread+0x58/0xb0
<4>[<94103ce0>] kernel_thread_helper+0x10/0x18
<4>

If I'm reading the assembly right, cpmac_hw_start+0xb8 is the delay loop in ar7_device_reset() -> ar7_device_enable()

comment:9 Changed 10 years ago by oliver@…

Slightly related: netdev transmit watchdog never fires because it doesn't understand multiqueue and only checks if the device-wide queue is running or not.

comment:10 Changed 10 years ago by oliver@…

I patched the driver so that start_xmit would return BUSY when a reset was in progress, which fixed the stability but still had terrible performance under rx load due to continuous resets going on.

The resets seem to be a symptom of a larger problem. The MACSTATUS which generates it is error code 2 on rx channel 0. As far as I can tell through experimentation, error code 2 appears to mean "you set RXPTR to a descriptor that didn't have CPMAC_OWN set", which can happen when the rx ring fills under load. At that point the rx logic doesn't seem to recover on its own -- if I disable status interrupts, then the rx logic just hangs when it overruns like that. The ring can get into a strange state where instead of a single continuous range of OWN-ed blocks, they are interleaved with !OWN-blocks. Things break horribly at that point. I spent some time trying to diagnose exactly what is going on there, but didn't get too far - the rx code likes to blindly stomp on RXPTR even with concurrent receives running, though, which seems like a good formula for trouble.

I have a patch that changes the rx ring to have hw_next=0 on the descriptor immediately before rx_head. When we receive data, I move that "stop point" forward so that the newly processed descriptors become available for rx. If the ring fills and the "stop point" is reached before we can free up more space, the hardware sets the EOQ bit on that descriptor after completing the receive, and stops receiving. I detect that later in cpmac_poll(), count it as a rx overrun, and restart the receiver. With those changes, the receive path seems stable even in the face of lots of rx overruns (e.g. I can run with a rx ring size of 4 under ping -f with no ill effects beyond a lot of reported overruns; notably, no status interrupts or resets happened).

I'll clean up the patch this weekend and post it here.

Changed 10 years ago by oliver@…

(1/2) patch to fix race between device reset and start_xmit

Changed 10 years ago by oliver@…

(2/2) patch to reimplement rx ring with EOQ markers to avoid reset storms

comment:11 Changed 10 years ago by oliver@…

  • patch 1/2 is just the reset fixes
  • patch 2/2 is the rx ring rewrite

2/2 still needs more testing, it is better than the unpatched version but will still hang under extreme load ("netperf -t UDP_STREAM ... -- -m 1000")

comment:12 Changed 10 years ago by nabcore

Oliver, you're doing some great work here, keep it up. I'm just compiling now with your patches and I'll give you some feedback on how it goes. Thanks again

comment:13 Changed 10 years ago by nabcore

As an aside/after thought, are there any clues to be gained about the hardware's behavior by examining the cpmac source Netgear has released? ftp://downloads.netgear.com/files/GPL/DG834v2_V3.01.31_UK.tar.bz2

and then look in:

DG834v2_V3.01.31_UK/linux-2.4.17_mvl21/drivers/net/avalanche_cpmac/

having followed the patching instructions in the README

comment:14 Changed 10 years ago by oliver@…

I looked at the dlink equivalent a while back and it suffered badly from the "hey, let's build a HAL!" syndrome.

Looking at it again .. they use a simple linked list of rx buffers/descriptors and tack new ones onto the end as they become free, splicing them in by just setting the hw_next value on the tail of the existing (active) rx list. In the rx interrupt it tests for (EOQ and hw_next != 0) after processing frames and restarts rx in that case. So it's basically doing the same as my patch 2/2 does (although not doing it across a preallocated ring, instead it juggles buffers back and forth across the HAL/OS interface the whole time)

comment:15 Changed 10 years ago by nabcore

Hello, I have been running the two patches in this ticket against r9644 for a day or so now. They apply fine to trunk and it compiles. I have noticed some tx dma full messages:

root@Bunny:~# logread | grep kernel |tail
Dec  3 20:32:59 Bunny user.info kernel: [14523.540000] device ppp0 entered promiscuous mode
Dec  3 20:33:01 Bunny user.warn kernel: [14525.130000] eth0: tx dma ring full
Dec  3 20:33:02 Bunny user.info kernel: [14526.070000] device ppp0 left promiscuous mode
Dec  3 20:34:57 Bunny user.info kernel: [14640.670000] device ppp0 entered promiscuous mode
Dec  3 20:34:58 Bunny user.warn kernel: [14641.690000] eth0: tx dma ring full
Dec  3 20:34:58 Bunny user.warn kernel: [14641.730000] eth0: tx dma ring full
Dec  3 20:34:58 Bunny user.warn kernel: [14642.040000] eth0: tx dma ring full
Dec  3 20:34:58 Bunny user.info kernel: [14642.580000] device ppp0 left promiscuous mode
Dec  3 20:35:09 Bunny user.info kernel: [14653.360000] device ppp0 entered promiscuous mode
Dec  3 20:35:13 Bunny user.info kernel: [14656.860000] device ppp0 left promiscuous mode

but the router has not hung yet. I've not done any proper testing yet, say with netperf etc.

comment:16 Changed 10 years ago by nabcore

I think the method of submitting patches has changed a bit; http://forum.openwrt.org/viewtopic.php?pid=59925#p59925 , hence the patches in this ticket may get into trunk faster with this newer method.

comment:17 Changed 10 years ago by nbd

  • Resolution set to fixed
  • Status changed from new to closed

added in r9664, r9665

comment:18 Changed 10 years ago by matteo

  • Resolution fixed deleted
  • Status changed from closed to reopened

Changes lost in r10421, have to re-apply them

comment:19 Changed 10 years ago by matteo

Can't reapply them:

drivers/net/cpmac.c: In function 'cpmac_start_xmit':
drivers/net/cpmac.c:460: error: 'struct cpmac_priv' has no member named 'reset_pending'
drivers/net/cpmac.c:466: warning: comparison of distinct pointer types lacks a cast
drivers/net/cpmac.c: In function 'cpmac_hw_error':
drivers/net/cpmac.c:660: error: 'struct cpmac_priv' has no member named 'reset_pending'
drivers/net/cpmac.c: In function 'cpmac_check_status':
drivers/net/cpmac.c:694: error: 'struct cpmac_priv' has no member named 'reset_pending'
drivers/net/cpmac.c: In function 'cpmac_tx_timeout':
drivers/net/cpmac.c:745: error: 'struct cpmac_priv' has no member named 'reset_pending'
drivers/net/cpmac.c:749: error: 'struct cpmac_priv' has no member named 'reset_pending'
drivers/net/cpmac.c: In function 'cpmac_open':
drivers/net/cpmac.c:941: error: 'struct cpmac_priv' has no member named 'reset_pending'
drivers/net/cpmac.c: In function 'cpmac_probe':
drivers/net/cpmac.c:1017: warning: unused variable 'i'
drivers/net/cpmac.c:1152:2: warning: #warning FIXME: unhardcode gpio&reset bits
make[7]: *** [drivers/net/cpmac.o] Error 1

comment:20 Changed 10 years ago by matteo

  • Resolution set to fixed
  • Status changed from reopened to closed

Re-fixed in r10585

Add Comment

Modify Ticket

Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.