
Opened 8 years ago

Closed 2 years ago

#6754 closed defect (fixed)

Can crash wan port (eth1) with ping flood (Atheros AG71xx)

Reported by: monte <peabody.bb4wtmzci8qj@…> Owned by: juhosg
Priority: normal Milestone: Barrier Breaker 14.07
Component: kernel Version: Trunk
Keywords: Cc:

Description

KAMIKAZE (bleeding edge, r19863) - 2/27/2010
system type : Atheros AR7240 rev 2
machine : D-Link DIR-600 rev. A1
(Fry's FR-54RTR)

I've seen this behavior since I got this machine.
I've been working around it by setting the MTU to 1400;
the machine is much more stable that way.

I originally mentioned it in a forum post, but figured
it needed a proper bug.
https://forum.openwrt.org/viewtopic.php?pid=103396

I issue the following ping command from my Mac on the LAN side
to a computer on the WAN side, and eth1 crashes within a few seconds:
sudo ping -f -s 1472 192.168.1.2

WARNING: at net/sched/sch_generic.c:261 0x801fb328()
NETDEV WATCHDOG: eth1 (ag71xx): transmit queue 0 timed out
Modules linked in: nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppo
Call Trace:[<80068374>] 0x80068374
[<80068374>] 0x80068374
[<8007cd08>] 0x8007cd08
[<801fb328>] 0x801fb328
[<8007cd88>] 0x8007cd88
[<801e6a70>] 0x801e6a70
[<801fb328>] 0x801fb328
eth1: link down
eth1: link up (100Mbps/Full duplex)
eth1: tx timeout
eth1: link down
eth1: link up (100Mbps/Full duplex)
eth1: tx timeout
eth1: link down
eth1: link up (100Mbps/Full duplex)
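For reference, the -s 1472 in the ping command above is the largest ICMP payload that fits a standard 1500-byte Ethernet MTU without fragmentation (assuming IPv4 with no IP options). The same arithmetic gives the limit under the 1400-byte MTU workaround:

```latex
\begin{aligned}
\underbrace{1472}_{\text{ICMP payload}} + \underbrace{8}_{\text{ICMP header}} + \underbrace{20}_{\text{IPv4 header}} &= 1500 \text{ bytes (MTU)} \\
1400 - 8 - 20 &= 1372 \text{ bytes (largest unfragmented payload at MTU 1400)}
\end{aligned}
```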

Attachments (0)

Change History (41)

comment:1 Changed 8 years ago by monte <peabody.bb4wtmzci8qj@…>

Adding the error again with wiki formatting.

 WARNING: at net/sched/sch_generic.c:261 0x801fb328()
 NETDEV WATCHDOG: eth1 (ag71xx): transmit queue 0 timed out
 Modules linked in: nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc 
 nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw 
 xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppo
 Call Trace:[<80068374>] 0x80068374
 [<80068374>] 0x80068374
 [<8007cd08>] 0x8007cd08
 [<801fb328>] 0x801fb328
 [<8007cd88>] 0x8007cd88
 [<801e6a70>] 0x801e6a70
 [<801fb328>] 0x801fb328
 eth1: link down
 eth1: link up (100Mbps/Full duplex)
 eth1: tx timeout
 eth1: link down
 eth1: link up (100Mbps/Full duplex)
 eth1: tx timeout
 eth1: link down
 eth1: link up (100Mbps/Full duplex)

comment:2 Changed 8 years ago by monte <peabody.bb4wtmzci8qj@…>

I don't know if this is related or not, but eth0 shows up in logread as a
gigabit interface, but doesn't negotiate to higher than 100Mbps. (I believe
the hardware is only a 10/100 interface.)

 eth0: Atheros AG71xx at 0xba000000, irq 5
 eth0: link up (1000Mbps/Full duplex)
 eth0: link down
 eth0: link up (1000Mbps/Full duplex)
 device eth0 entered promiscuous mode
 br-lan: port 1(eth0) entering forwarding state

My Mac shows it's 100 Mbit.

 en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
         inet6 fe80::216:cbff:fe89:xxxx%en0 prefixlen 64 scopeid 0x4
         inet 192.168.5.10 netmask 0xffffff00 broadcast 192.168.5.255
         ether 00:16:cb:89:1b:d7
         media: autoselect (100baseTX <full-duplex,flow-control>) status: active
         supported media: autoselect 10baseT/UTP <half-duplex>
                 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback>
                 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex>
                 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback>
                 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex>
                 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control> none

comment:3 Changed 8 years ago by monte <peabody.bb4wtmzci8qj@…>

I was not able to crash the router by running the ping flood from the router itself:

hping3 --flood 192.168.1.2 -1 -d 1472

comment:4 Changed 8 years ago by thepeople

  • Owner changed from developers to juhosg
  • Status changed from new to assigned

comment:5 Changed 8 years ago by juhosg

  • Status changed from assigned to accepted

comment:6 Changed 8 years ago by fercerpav@…

Here the same problem is reproducible within about a minute, but I couldn't trigger it with CONFIG_AG71XX_DEBUG=y (on a WR741ND).

comment:7 Changed 8 years ago by anonymous

As a temporary fix, I edited ag71xx_main.c like this:

static void ag71xx_tx_timeout(struct net_device *dev)
{
	/*
	 * Temporary workaround: the original body is compiled out, so a TX
	 * timeout no longer logs anything or schedules restart_work.
	 */
#if 0
	struct ag71xx *ag = netdev_priv(dev);

	if (netif_msg_tx_err(ag))
		printk(KERN_DEBUG "%s: tx timeout\n", ag->dev->name);

	schedule_work(&ag->restart_work);
#endif
}

It seems to work, but I'm not sure how safe it is.

comment:8 Changed 8 years ago by dbogatev@…

This problem can be fixed by setting the MTU to 1400 on the WAN port:

vi /etc/config/network

In the "config interface wan" section, add:

	option mtu	1400

I found this solution at http://wiki.openwrt.org/toh/tp-link/tl-wr941nd and it worked perfectly for me.

comment:9 Changed 8 years ago by ray@…

I've also experienced this on an RB750. I can trigger it every time using the latest SVN (r21791), just by running "ls -lR /".

Note that it only happens on eth1 (WAN) - I'm unable to reproduce it on the LAN ports.

The MTU 1400 workaround seems to work OK here too, but it's clearly not the correct "fix".

comment:10 Changed 8 years ago by Pieter "Fate" Hollants <pieter@…>

The same bug happens on the TL-WA901ND, which, being an access point, has just a single Ethernet port but is otherwise quite similar to the TL-WR741ND (support in trunk coming soon with a patch of mine). In my case, the bug is quite reliably reproducible by running iperf benchmarks from the access point to a connected laptop:

Server side: ifconfig eth0 192.168.1.100 up; iperf -s
Client side: iperf -c192.168.1.100 -i1 -t900 -l256K

Sometimes the client needs -P2 or a reboot of the AP to trigger the bug. I spent a hell of a time trying to figure out exactly what is happening, and even had to modify the ag71xx driver to write to a separate debugfs "file" to get useful debugging output, since printk() threw off the timings and the UART garbled the output right after the timeout condition.

From what I can tell, at some point the TX engine simply hangs and some descriptors never get their empty flag set, so eventually the queue fills up and nothing happens until tx_timeout(). In my case, the scheduled restart function can't even bring the interface back up properly, because ag71xx doesn't reset on open() (a patch for this will follow shortly in a separate ticket).

I added debugging statements to watch the contents of the normal and the DMA registers under normal conditions and just before the TX timeout, but found nothing conclusive. I noticed that AG71XX_REG_TX_STATUS reports the number of transmitted packets under some conditions; however, adding the following to ag71xx_tx_packets():

dma_sent = (ag71xx_rr(ag, AG71XX_REG_TX_STATUS) >> TX_STATUS_PKTCNT_SHIFT) & 0xfff;
if (dma_sent != sent) {
	/* debug warning msg */
}

produced warnings TOO often (TX_STATUS_PKTCNT_SHIFT = 16), so it seems even this register can't be used to detect TX hangs any earlier than tx_timeout().

I also did extensive comparisons (in the form of hand-written notes) of ag71xx with the ag7240 driver, which originated from Atheros, was picked up by DD-WRT, and seems to be a slightly modified ag7100, on which ag71xx is based. However, the differences do not seem to explain why the ag7240 binary driver in the TP-Link firmware does NOT crash _and_ yields higher performance :(

comment:11 Changed 8 years ago by Pieter "Fate" Hollants <pieter@…>

Some more information on the situation before and after the TX lockup. I added (and removed) some debug statements in ag71xx_main.c. This is before the TX lockup:

Jan  1 00:00:56 OpenWrt user.debug kernel: eth1: tx queue full
Jan  1 00:00:56 OpenWrt user.debug kernel: eth1: dma_tx_ctrl=00000001, dma_tx_desc=01e491c0, dma_tx_status=c0080003
Jan  1 00:00:56 OpenWrt user.debug kernel: eth1: dma_rx_ctrl=00000001, dma_rx_desc=01e4da40, dma_rx_status=00050001
Jan  1 00:00:56 OpenWrt user.debug kernel: eth1: 10 packets sent out
Jan  1 00:00:56 OpenWrt user.debug kernel: eth1: raw intr=00000001 TXPS 

dma_tx_desc changes between subsequent "tx queue full" events, of course; the other two registers stay the same.

And this is after the TX lockup:

Jan  1 00:01:08 OpenWrt user.warn kernel: ------------[ cut here ]------------
Jan  1 00:01:08 OpenWrt user.warn kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x164/0x260()
Jan  1 00:01:08 OpenWrt user.info kernel: NETDEV WATCHDOG: eth1 (ag71xx): transmit queue 0 timed out
Jan  1 00:01:08 OpenWrt user.warn kernel: Modules linked in: nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nfo
Jan  1 00:01:08 OpenWrt user.warn kernel: Call Trace:
Jan  1 00:01:08 OpenWrt user.warn kernel: [<800682b4>] dump_stack+0x8/0x34
Jan  1 00:01:08 OpenWrt user.warn kernel: [<8007ce14>] warn_slowpath_common+0x70/0xb0
Jan  1 00:01:08 OpenWrt user.warn kernel: [<8007ce94>] warn_slowpath_fmt+0x24/0x30
Jan  1 00:01:08 OpenWrt user.warn kernel: [<801e1364>] dev_watchdog+0x164/0x260
Jan  1 00:01:08 OpenWrt user.warn kernel: [<800874f8>] run_timer_softirq+0x14c/0x1d8
Jan  1 00:01:08 OpenWrt user.warn kernel: [<80082634>] __do_softirq+0xb0/0x148
Jan  1 00:01:08 OpenWrt user.debug kernel: ag71xx_restart_work_func()
Jan  1 00:01:08 OpenWrt user.debug kernel: eth1: dma_tx_ctrl=00000001, dma_tx_desc=01e49320, dma_tx_status=00000002
Jan  1 00:01:08 OpenWrt user.debug kernel: eth1: dma_rx_ctrl=00000001, dma_rx_desc=01e4db80, dma_rx_status=00000000
Jan  1 00:01:08 OpenWrt user.debug kernel: eth1: mac_cfg1=0000003f, mac_cfg2=00007215, ipg=40605060, hdx=00a1f037, mfl=00000600
Jan  1 00:01:08 OpenWrt user.debug kernel: eth1: mac_ifctl=00000000, mac_addr1=aa3c1fcd, mac_addr2=23000000
Jan  1 00:01:08 OpenWrt user.debug kernel: eth1: fifo_cfg0=001f1f00, fifo_cfg1=01ff0000, fifo_cfg2=000003ff
Jan  1 00:01:08 OpenWrt user.debug kernel: eth1: fifo_cfg3=008001ff, fifo_cfg4=0000ffff, fifo_cfg5=000fefef
Jan  1 00:01:08 OpenWrt user.info kernel: eth1: link down

comment:12 Changed 8 years ago by Jonathan Bennett <JBScience87@…>

Just experienced this bug on eth0, *not* the WAN port. Using the TP-Link TL-WR841N
and trunk r21987, I turned off DHCP on the router, plugged the switch side into my
home network, and connected over wireless. It was up for about a day. I started
doing some testing with iperf and it crashed right away. I also have MTU 1400 set
on all ports in my /etc/config/network. Strange bug.

Jul  1 22:20:31 OpenWrt user.warn kernel: ------------[ cut here ]------------
Jul  1 22:20:31 OpenWrt user.warn kernel: WARNING: at net/sched/sch_generic.c:261 0x801dcd14()
Jul  1 22:20:31 OpenWrt user.info kernel: NETDEV WATCHDOG: eth0 (ag71xx): transmit queue 0 timed out
Jul  1 22:20:31 OpenWrt user.warn kernel: Modules linked in: nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ipt_REJECT
Jul  1 22:20:31 OpenWrt user.warn kernel: Call Trace:[<800682cc>] 0x800682cc
Jul  1 22:20:31 OpenWrt user.warn kernel: [<800682cc>] 0x800682cc
Jul  1 22:20:31 OpenWrt user.warn kernel: [<8007ca80>] 0x8007ca80
Jul  1 22:20:31 OpenWrt user.warn kernel: [<801dcd14>] 0x801dcd14
Jul  1 22:20:31 OpenWrt user.warn kernel: [<8007cb00>] 0x8007cb00
Jul  1 22:20:31 OpenWrt user.warn kernel: [<801c8b64>] 0x801c8b64
Jul  1 22:20:31 OpenWrt user.warn kernel: [<801dcd14>] 0x801dcd14
Jul  1 22:21:21 OpenWrt user.debug kernel: eth0: tx timeout
Jul  1 22:21:21 OpenWrt user.info kernel: eth0: link down
Jul  1 22:21:21 OpenWrt user.info kernel: eth0: link up (1000Mbps/Full duplex)

comment:13 Changed 8 years ago by nbd

  • Resolution set to fixed
  • Status changed from accepted to closed

Hangs should be much less severe and recoverable with the change from r22055;
marking as fixed, since the rest is just the effect of a hardware issue.

comment:14 Changed 8 years ago by Jonathan Bennett <jbscience87@…>

This is true, but note that for it to be usable, one still needs "mtu 1400" in the config. The interface takes several seconds to restart, and without setting the MTU, it's down more than it's up under heavy load.

comment:15 Changed 8 years ago by anonymous

Indeed, this is not really usable.
If I get the time, I'll try to port the workaround from the Atheros drivers to the OpenWrt driver. Basically, they use a timer that periodically walks the TX descriptor list and clears descriptors manually under certain conditions, which are quite simple in their ag7100 driver and more complicated in their ag7240 driver.

comment:16 Changed 8 years ago by Jonathan Bennett <jbscience87@…>

What is actually causing the problem? Is it a hardware flaw, or a bug in the OpenWrt driver for this hardware? Is it worth looking at how TP-Link does it in their firmware? (http://www.tp-link.com/support/gpl.asp)

comment:17 Changed 8 years ago by Pieter "Fate" Hollants

It is a hardware flaw, for which Atheros does a workaround in their driver. Please see the comments above for more information.

comment:18 Changed 8 years ago by acoul

I can confirm this issue on recent trunk. Ping flooding the remote WAN IP kills the router through memory exhaustion. This is the case for ar231x & brcm47xx (16 MB RAM) devices without any running programs or daemons, only the madwifi driver and static routing. This was tested on both jffs2/lzma & squashfs images.

root@neo-south-wisp@ozonet:~# free
              total         used         free       shared      buffers
  Mem:        13636        11408         2228            0          588
Swap:            0            0            0
Total:        13636        11408         2228
root@neo-south-wisp@ozonet:~# free
              total         used         free       shared      buffers
  Mem:        13636        12136         1500            0          588
Swap:            0            0            0
Total:        13636        12136         1500
root@neo-south-wisp@ozonet:~# free
              total         used         free       shared      buffers
  Mem:        13636        12508         1128            0          588
Swap:            0            0            0
Total:        13636        12508         1128
root@neo-south-wisp@ozonet:~# free
              total         used         free       shared      buffers
  Mem:        13636        12888          748            0          588
Swap:            0            0            0
Total:        13636        12888          748
root@neo-south-wisp@ozonet:~# free



              total         used         free       shared      buffers
  Mem:        13636        12624         1012            0          288
Swap:            0            0            0
Total:        13636        12624         1012
root@neo-south-wisp@ozonet:~#
root@neo-south-wisp@ozonet:~#
root@neo-south-wisp@ozonet:~#
root@neo-south-wisp@ozonet:~#
root@neo-south-wisp@ozonet:~#
root@neo-south-wisp@ozonet:~# free
              total         used         free       shared      buffers
  Mem:        13636        12712          924            0          288
Swap:            0            0            0
Total:        13636        12712          924
root@neo-south-wisp@ozonet:~# dmesg

mapped:288 shmem:5 pagetables:26 bounce:0
Normal free:192kB min:508kB low:632kB high:760kB active_anon:1576kB inactive_anon:1036kB active_file:1576kB inactive_file:2464kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16256kB mlocked:0kB dirty:0kB writeback:0kB mapped:1152kB shmem:20kB slab_reclaimable:440kB slab_unreclaimable:4908kB kernel_stack:176kB pagetables:104kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 192kB
1027 total pagecache pages
4096 pages RAM
687 pages reserved
849 pages shared
2831 pages non-shared
free: page allocation failure. order:0, mode:0x20
Call Trace:[<80048a5c>] 0x80048a5c
[<80048a5c>] 0x80048a5c
[<8009b140>] 0x8009b140
[<80b40728>] 0x80b40728
[<800ba6f8>] 0x800ba6f8
[<801821a8>] 0x801821a8
[<800bab5c>] 0x800bab5c
[<801ab5a8>] 0x801ab5a8
[<801aba9c>] 0x801aba9c
[<8019f32c>] 0x8019f32c
[<80bf2708>] 0x80bf2708
[<80bf2658>] 0x80bf2658
[<80064ffc>] 0x80064ffc
[<80065820>] 0x80065820
[<8008ed10>] 0x8008ed10
[<80065940>] 0x80065940
[<80065aa8>] 0x80065aa8
[<80041844>] 0x80041844
[<800969a0>] 0x800969a0
[<80145488>] 0x80145488
[<8014548c>] 0x8014548c
[<800e468c>] 0x800e468c
[<80041844>] 0x80041844
[<80144e0c>] 0x80144e0c
[<800eab40>] 0x800eab40
[<8010f77c>] 0x8010f77c
[<8010f5ac>] 0x8010f5ac
[<8010b240>] 0x8010b240
[<8010bb38>] 0x8010bb38
[<8010cbf0>] 0x8010cbf0
[<801b2794>] 0x801b2794
[<8008ebe0>] 0x8008ebe0
[<8009d3e4>] 0x8009d3e4
[<80065aa8>] 0x80065aa8
[<8009d458>] 0x8009d458
[<80095ea8>] 0x80095ea8
[<800ab344>] 0x800ab344
[<800ac5e8>] 0x800ac5e8
[<800acaf8>] 0x800acaf8
[<8008ebe0>] 0x8008ebe0
[<80053fd8>] 0x80053fd8
[<80065aa8>] 0x80065aa8
[<80041844>] 0x80041844
[<80041844>] 0x80041844
[<800550c4>] 0x800550c4
[<80041820>] 0x80041820
[<800b0d9c>] 0x800b0d9c
[<800550c4>] 0x800550c4
[<800f9508>] 0x800f9508
[<80045454>] 0x80045454
[<800fa0d4>] 0x800fa0d4
[<800c2074>] 0x800c2074
[<800c24c4>] 0x800c24c4
[<800c3954>] 0x800c3954
[<800c7cb8>] 0x800c7cb8
[<8004f908>] 0x8004f908
[<80043610>] 0x80043610

Mem-Info:
Normal per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
active_anon:394 inactive_anon:259 isolated_anon:0
active_file:394 inactive_file:616 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:48 slab_reclaimable:110 slab_unreclaimable:1227
mapped:288 shmem:5 pagetables:26 bounce:0
Normal free:192kB min:508kB low:632kB high:760kB active_anon:1576kB inactive_anon:1036kB active_file:1576kB inactive_file:2464kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16256kB mlocked:0kB dirty:0kB writeback:0kB mapped:1152kB shmem:20kB slab_reclaimable:440kB slab_unreclaimable:4908kB kernel_stack:176kB pagetables:104kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 192kB
1027 total pagecache pages
4096 pages RAM
687 pages reserved
849 pages shared
2831 pages non-shared
free: page allocation failure. order:0, mode:0x20
Call Trace:[<80048a5c>] 0x80048a5c
[<80048a5c>] 0x80048a5c
[<8009b140>] 0x8009b140
[<80b40728>] 0x80b40728

comment:19 follow-up: Changed 8 years ago by ray@…

@acoul - that appears to be a different bug, and should presumably be reported as such.

The AR7130 specific bug mentioned here causes a network interface hang only - the rest of the system remains running.

comment:20 in reply to: ↑ 19 Changed 8 years ago by acoul

Replying to ray@…:

@acoul - that appears to be a different bug, and should presumably be reported as such.

The AR7130 specific bug mentioned here causes a network interface hang only - the rest of the system remains running.

You are correct; I noticed that just after posting this bug report, and there is no ... UNDO button ;-)

comment:21 Changed 8 years ago by Pieter "Fate" Hollants <pieter@…>

At least for me, the TX lockups seem to have disappeared since nbd's FIFO init value change in r22303. Could all of you please try with this or a newer revision?

comment:22 Changed 8 years ago by fercerpav@…

Felix, we all know you work hard and creatively, and I guess everybody appreciates that a lot.

But this commit of yours in r22303 is as cryptic as it gets. Please let the rest of us learn too, and explain where and how you got those values and why you decided they're worth using.

Big thanks in advance.
Paul Fertser.

comment:23 Changed 8 years ago by Pieter "Fate" Hollants <pieter@…>

nbd answered this for me just yesterday:

The change simply supplies default initialization values for the FIFO registers on ar7240, unless the specific model brings its own. The values were taken from Atheros' ag7240 driver (e.g. the function ag7240_hw_setup), which is used, e.g., by TP-Link itself. Felix got them from Atheros directly, but you can also get them from the DD-WRT SVN. As for the meaning of the values, you can decode them at least partially using the defines in ag7240.h.

comment:24 Changed 8 years ago by Jonathan Bennett <jbscience87@…>

I'm still getting the lockups on my tl-wr841n, running r22321. If MTU is left at 1500, they happen every time I run iperf. With MTU at 1400, it is fairly rare, but I have observed it at least once, IIRC.

comment:25 Changed 8 years ago by anonymous

Just to report: I had the same problem on my TP-Link 941ND. See tickets #7656 and #7614.

Pitou

comment:26 Changed 7 years ago by jason.harris@…

Just to check if I have the same thing here:

I have an AR7240-based system that has a Tx lockup problem.
The symptom is that the DMA thinks it has a Tx under-run, but the tx descriptor
register is pointing to a full descriptor. The whole ring fills up with
Tx packets and the Tx DMA makes no progress.

Is this the same symptom as the HW bug that is dealt with by the check_for_dma_status()
code in the Atheros SDK?

mac0 tx descriptors (total 255) . empty * full m more
ctrl 00000001 desc 013c2120 status 00000002

0: * a13c2000 pkt 03a733a2 sz 060 > 013c2020
1: * a13c2020 pkt 03959182 sz 060 > 013c2040
2: * a13c2040 pkt 03ab0fe2 sz 060 > 013c2060
3: .m a13c2060 pkt 039eb222 sz 000 > 013c2080 < current
4: .m a13c2080 pkt 03988342 sz 000 > 013c20a0
5: .m a13c20a0 pkt 03a96d22 sz 000 > 013c20c0
6: .m a13c20c0 pkt 03a6f0a2 sz 000 > 013c20e0
7: .m a13c20e0 pkt 03a83742 sz 000 > 013c2100
8: .m a13c2100 pkt 03a6e842 sz 000 > 013c2120
9: * a13c2120 pkt 03ac2ca2 sz 060 > 013c2140 < dirty

10: * a13c2140 pkt 0396f9a2 sz 060 > 013c2160
11: * a13c2160 pkt 039f05e2 sz 060 > 013c2180
12: * a13c2180 pkt 03996562 sz 060 > 013c21a0
13: * a13c21a0 pkt 03a31462 sz 060 > 013c21c0
14: * a13c21c0 pkt 0393a362 sz 060 > 013c21e0
15: * a13c21e0 pkt 03ab0782 sz 060 > 013c2200

etc....

251: * a13c3f60 pkt 0391df22 sz 060 > 013c3f80
252: * a13c3f80 pkt 03a93ae2 sz 060 > 013c3fa0
253: * a13c3fa0 pkt 039513e2 sz 060 > 013c3fc0
254: * a13c3fc0 pkt 0395cc22 sz 060 > 013c2000

comment:27 Changed 7 years ago by anonymous

  • Resolution fixed deleted
  • Status changed from closed to reopened

The same problem still happens with eth1 on Planex MZK-W04NU and Linksys WRT160NL here. Using backfire r24240. Setting values for MTU does not help.

comment:28 Changed 7 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

MZK-W04NU and WRT160NL are based on AR913x, whereas this bug report talks about AR7240.
The issues discussed here were mostly AR7240 specific.
Please rather than just writing that 'the same problem still happens', make a new ticket with a proper bug description and feel free to cc me on that new ticket.

comment:29 Changed 6 years ago by nikolay@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

tl-wr741nd preflashed with Backfire (10.03.1-RC5, r27608)


WARNING: at net/sched/sch_generic.c:261 0x80201f68()
NETDEV WATCHDOG: eth1 (ag71xx): transmit queue 0 timed out
Modules linked in: ohci_hcd ath_pci ath_hal(P) nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ehci_hcd pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc ath9k ath9k_common ath9k_hw ath mac80211 usbcore nls_base crc_ccitt cfg80211 compat_firmware_class compat arc4 aes_generic deflate ecb cbc leds_gpio button_hotplug gpio_buttons input_polldev input_core
Call Trace:[<80069200>] 0x80069200
[<80069200>] 0x80069200
[<8007ddfc>] 0x8007ddfc
[<80201f68>] 0x80201f68
[<8007de7c>] 0x8007de7c
[<801ed590>] 0x801ed590
[<80201f68>] 0x80201f68
[<802691fc>] 0x802691fc
[<80267b74>] 0x80267b74
[<80269224>] 0x80269224
[<80201e04>] 0x80201e04
[<80088738>] 0x80088738
[<80071834>] 0x80071834
[<80083868>] 0x80083868
[<80083948>] 0x80083948
[<8006082c>] 0x8006082c
[<80060a00>] 0x80060a00
[<8006c15c>] 0x8006c15c
[<8006d604>] 0x8006d604
[<80060a20>] 0x80060a20
[<802dca54>] 0x802dca54
[<802dc3a8>] 0x802dc3a8

---[ end trace c43019188ec7fc4f ]---
eth1: tx timeout
eth1: link up (100Mbps/Half duplex)
eth1: link up (100Mbps/Full duplex)
eth1: link up (100Mbps/Half duplex)
eth1: link up (100Mbps/Full duplex)

comment:30 Changed 6 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

should be fixed in r27974

comment:31 Changed 6 years ago by anonymous

  • Resolution fixed deleted
  • Status changed from closed to reopened

The same problem still happens with eth1 on tp-link TL-WR741ND V:1.3. Using backfire r29592. Setting values for MTU does not help.

eth0: link down
br-lan: port 1(eth0) entering disabled state
eth0: link up (1000Mbps/Full duplex)
br-lan: port 1(eth0) entering forwarding state
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 0x80202644()
NETDEV WATCHDOG: eth0 (ag71xx): transmit queue 0 timed out
Modules linked in: ohci_hcd nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ehci_hcd pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc ath9k ath9k_common ath9k_hw ath mac80211 usbcore nls_base crc_ccitt cfg80211 compat arc4 aes_generic deflate ecb cbc leds_gpio button_hotplug gpio_buttons input_polldev input_core
Call Trace:[<80069378>] 0x80069378
[<80069378>] 0x80069378
[<8007e138>] 0x8007e138
[<80202644>] 0x80202644
[<8007e1b8>] 0x8007e1b8
[<801edc6c>] 0x801edc6c
[<80202644>] 0x80202644
[<80d7190c>] 0x80d7190c
[<80d65850>] 0x80d65850
[<802658d8>] 0x802658d8
[<80d6566c>] 0x80d6566c
[<802024e0>] 0x802024e0
[<80088a74>] 0x80088a74
[<80071b70>] 0x80071b70
[<80083ba4>] 0x80083ba4
[<80083c84>] 0x80083c84
[<8006082c>] 0x8006082c
[<80060a00>] 0x80060a00
[<8006d940>] 0x8006d940
[<80060a20>] 0x80060a20
[<802dea54>] 0x802dea54
[<802de3a8>] 0x802de3a8

---[ end trace 1936a2ecf2a2773e ]---
eth0: tx timeout
eth0: link down

comment:32 Changed 5 years ago by anonymous

I had a similar issue recently. I am using a Gentoo laptop and started to notice a flood of link down/up messages after I upgraded to kernel 3.5.1. I tried 3.5.2 yesterday and eth1 went dead almost every 30 minutes. However, after downgrading to 3.5.0, although I still see the link up/down messages from time to time, the connection has been stable for about a day now. I hope this helps you debug the problem. I connect to my router (Buffalo WHR-HP-G300N with OpenWrt Attitude Adjustment r33206) over Wi-Fi; the wireless chip is an Intel 6300 AGN.

comment:33 Changed 5 years ago by brnt

FWIW, I'm seeing the same issue on an RT3052-based board, so it doesn't appear to be specific to Atheros chips/drivers.

comment:34 Changed 4 years ago by anonymous

I have this problem with a TP-Link 841N. I installed OpenWrt on it with this week's firmware. Going to try the MTU trick.

comment:35 Changed 4 years ago by anonymous

And... the 1400 MTU trick on the LAN didn't work for me. The LAN stopped working within a very short time. TP-Link 841N v5.

comment:36 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted

comment:37 Changed 3 years ago by anonymous

same bug on D-link DIR-615 E4 (BB)

comment:38 Changed 3 years ago by anonymous

same bug on D-link DIR-615 E4 (BB)

comment:39 Changed 3 years ago by weeds

I am using the latest Barrier Breaker code, compiled for the TL-WR841N v8, and I still got the message below, which left us unable to access the Internet. This happened with the TL-WR841N v8's WAN port plugged into a 10M hub. It just happens while downloading something. I don't think we should push the fix back to the next milestone.

[ 1015.010000] ------------[ cut here ]------------
[ 1015.010000] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e8/0x26c()
[ 1015.020000] NETDEV WATCHDOG: eth0 (ag71xx): transmit queue 0 timed out
[ 1015.020000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_owner xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY ts_kmp ts_fsm ts_bm slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables crc_ccitt compat sch_teql sch_tbf sch_sfq sch_red sch_prio sch_htb sch_gred sch_dsmark sch_codel em_text em_nbyte em_meta em_cmp cls_basic act_police act_ipt act_connmark act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ifb ipv6 arc4 crypto_blkcipher gpio_button_hotplug
[ 1015.130000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.49 #2
[ 1015.130000] Stack : 00000000 00000000 00000000 00000000 80372eba 00000032 80313498 0000004d
[ 1015.130000] 802c4f58 8031321b 00000000 80372664 80313498 0000004d 8039383c 00000001
[ 1015.130000] 00000004 80079040 00000003 80076ac0 802ed7fc 0000004d 802c6818 8030dc74
[ 1015.130000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1015.130000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 8030dc00
[ 1015.130000] ...
[ 1015.170000] Call Trace:
[ 1015.170000] [<8006e294>] show_stack+0x48/0x70
[ 1015.180000] [<80076bbc>] warn_slowpath_common+0x78/0xa8
[ 1015.180000] [<80076c18>] warn_slowpath_fmt+0x2c/0x38
[ 1015.190000] [<8021a240>] dev_watchdog+0x1e8/0x26c
[ 1015.190000] [<800828d8>] call_timer_fn.isra.38+0x24/0x84
[ 1015.200000] [<80082ab4>] run_timer_softirq+0x17c/0x1bc
[ 1015.200000] [<8007d974>] do_softirq+0xd0/0x1bc
[ 1015.210000] [<8007db00>] do_softirq+0x48/0x68
[ 1015.210000] [<8007dd34>] irq_exit+0x54/0x70
[ 1015.220000] [<80060830>] ret_from_irq+0x0/0x4
[ 1015.220000] [<8006b8e0>] r4k_wait_irqoff+0x18/0x1c
[ 1015.230000] [<8009efe8>] cpu_startup_entry+0xa4/0x104
[ 1015.230000] [<80329910>] start_kernel+0x38c/0x3a4
[ 1015.240000]
[ 1015.240000] ---[ end trace 9a4953f252a32df0 ]---
[ 1015.240000] eth0: tx timeout
[ 1015.250000] eth0: link down
[ 1015.260000] eth0: link up (10Mbps/Half duplex)
[ 1021.020000] eth0: tx timeout
[ 1021.020000] eth0: link down
[ 1021.270000] eth0: link up (10Mbps/Half duplex)
[ 1027.020000] eth0: tx timeout
[ 1027.020000] eth0: link down
[ 1027.270000] eth0: link up (10Mbps/Half duplex)
[ 1033.020000] eth0: tx timeout
[ 1033.020000] eth0: link down
[ 1033.270000] eth0: link up (10Mbps/Half duplex)
[ 1039.020000] eth0: tx timeout
[ 1039.020000] eth0: link down
[ 1039.270000] eth0: link up (10Mbps/Half duplex)
[ 1045.020000] eth0: tx timeout
[ 1045.020000] eth0: link down
[ 1045.270000] eth0: link up (10Mbps/Half duplex)
[ 1051.020000] eth0: tx timeout
[ 1051.020000] eth0: link down
[ 1051.270000] eth0: link up (10Mbps/Half duplex)
[ 1057.020000] eth0: tx timeout
[ 1057.020000] eth0: link down
[ 1057.270000] eth0: link up (10Mbps/Half duplex)
[ 1063.020000] eth0: tx timeout
[ 1063.020000] eth0: link down
[ 1063.270000] eth0: link up (10Mbps/Half duplex)
[ 1069.020000] eth0: tx timeout
[ 1069.020000] eth0: link down
[ 1069.270000] eth0: link up (10Mbps/Half duplex)
[ 1075.020000] eth0: tx timeout
[ 1075.020000] eth0: link down
[ 1075.270000] eth0: link up (10Mbps/Half duplex)
[ 1081.020000] eth0: tx timeout
[ 1081.020000] eth0: link down
[ 1081.270000] eth0: link up (10Mbps/Half duplex)
[ 1087.020000] eth0: tx timeout
[ 1087.020000] eth0: link down
[ 1087.270000] eth0: link up (10Mbps/Half duplex)
[ 1093.020000] eth0: tx timeout
[ 1093.020000] eth0: link down
[ 1093.270000] eth0: link up (10Mbps/Half duplex)
[ 1099.020000] eth0: tx timeout
[ 1099.020000] eth0: link down
[ 1099.270000] eth0: link up (10Mbps/Half duplex)
[ 1105.020000] eth0: tx timeout
[ 1105.020000] eth0: link down
[ 1105.270000] eth0: link up (10Mbps/Half duplex)
[ 1111.020000] eth0: tx timeout
[ 1111.020000] eth0: link down
[ 1111.270000] eth0: link up (10Mbps/Half duplex)
[ 1117.020000] eth0: tx timeout
[ 1117.020000] eth0: link down
[ 1117.270000] eth0: link up (10Mbps/Half duplex)
[ 1123.020000] eth0: tx timeout
[ 1123.020000] eth0: link down
[ 1123.270000] eth0: link up (10Mbps/Half duplex)
[ 1129.020000] eth0: tx timeout
[ 1129.020000] eth0: link down
[ 1129.270000] eth0: link up (10Mbps/Half duplex)
[ 1135.020000] eth0: tx timeout
[ 1135.020000] eth0: link down
[ 1135.270000] eth0: link up (10Mbps/Half duplex)
[ 1141.020000] eth0: tx timeout
[ 1141.020000] eth0: link down
[ 1141.270000] eth0: link up (10Mbps/Half duplex)
[ 1147.020000] eth0: tx timeout
[ 1147.020000] eth0: link down
[ 1147.270000] eth0: link up (10Mbps/Half duplex)
[ 1153.020000] eth0: tx timeout
[ 1153.020000] eth0: link down
[ 1153.270000] eth0: link up (10Mbps/Half duplex)
[ 1159.020000] eth0: tx timeout
[ 1159.020000] eth0: link down
[ 1159.270000] eth0: link up (10Mbps/Half duplex)
[ 1165.020000] eth0: tx timeout
[ 1165.020000] eth0: link down
[ 1165.270000] eth0: link up (10Mbps/Half duplex)
[ 1171.020000] eth0: tx timeout
[ 1171.020000] eth0: link down
[ 1171.270000] eth0: link up (10Mbps/Half duplex)
[ 1177.020000] eth0: tx timeout
[ 1177.020000] eth0: link down
[ 1177.270000] eth0: link up (10Mbps/Half duplex)
[ 1183.020000] eth0: tx timeout
[ 1183.020000] eth0: link down
[ 1183.270000] eth0: link up (10Mbps/Half duplex)
[ 1189.020000] eth0: tx timeout
[ 1189.020000] eth0: link down
[ 1189.270000] eth0: link up (10Mbps/Half duplex)
[ 1195.020000] eth0: tx timeout
[ 1195.020000] eth0: link down
[ 1195.270000] eth0: link up (10Mbps/Half duplex)
[ 1201.020000] eth0: tx timeout
[ 1201.020000] eth0: link down

comment:40 Changed 3 years ago by poiuty

TL-WR841N, Barrier Breaker 14.07

[954587.010000] ------------[ cut here ]------------
[954587.010000] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e8/0x26c()
[954587.020000] NETDEV WATCHDOG: eth0 (ag71xx): transmit queue 0 timed out
[954587.020000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher gpio_button_hotplug
[954587.080000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.49 #3
[954587.090000] Stack : 00000000 00000000 00000000 00000000 803bce76 00000032 8033f578 0000005d
[954587.090000] 802f5b44 8033f9a3 00000000 803b3a00 8033f578 0000005d 803bc7c8 00000001
[954587.090000] 00000004 80290d44 00000003 801f39b0 8030e8c8 0000005d 802f71d4 8032fc74
[954587.090000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[954587.090000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 8032fc00
[954587.090000] ...
[954587.130000] Call Trace:
[954587.130000] [<80235a38>] show_stack+0x48/0x70
[954587.130000] [<802a06ec>] warn_slowpath_common+0x78/0xa8
[954587.140000] [<802a0748>] warn_slowpath_fmt+0x2c/0x38
[954587.140000] [<80109cf0>] dev_watchdog+0x1e8/0x26c
[954587.150000] [<800e9698>] call_timer_fn.isra.38+0x24/0x84
[954587.150000] [<8021e4b0>] run_timer_softirq+0x17c/0x1bc
[954587.160000] [<8008f698>] __do_softirq+0xd0/0x1bc
[954587.160000] [<8011f8bc>] do_softirq+0x48/0x68
[954587.170000] [<8017ee00>] irq_exit+0x54/0x70
[954587.170000] [<80060830>] ret_from_irq+0x0/0x4
[954587.180000] [<8020401c>] r4k_wait_irqoff+0x18/0x1c
[954587.180000] [<800f96c4>] cpu_startup_entry+0xa4/0x104
[954587.190000] [<8034c918>] start_kernel+0x394/0x3ac
[954587.190000]
[954587.200000] ---[ end trace f1c97ba515a67b12 ]---
[954587.200000] eth0: tx timeout
[954587.210000] eth0: link down
[954588.840000] eth0: link up (100Mbps/Full duplex)
[954594.020000] eth0: tx timeout
[954594.020000] eth0: link down
[954594.840000] eth0: link up (100Mbps/Full duplex)
[954600.020000] eth0: tx timeout
[954600.020000] eth0: link down
[954600.840000] eth0: link up (100Mbps/Full duplex)
[954606.020000] eth0: tx timeout
[954606.020000] eth0: link down
[954606.840000] eth0: link up (100Mbps/Full duplex)
[954612.020000] eth0: tx timeout
[954612.020000] eth0: link down
[954612.840000] eth0: link up (100Mbps/Full duplex)

Last edited 3 years ago by poiuty
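Both logs show the same pattern after the NETDEV WATCHDOG warning. eth0 repeatedly goes through "tx timeout", then "link down", then "link up". After the first event, the cycle repeats every 6 seconds. The helper below can make that cadence easy to check in a long dmesg capture. It is a minimal, hypothetical sketch and is not part of OpenWrt or ag71xx. It assumes the log lines use the `[timestamp] iface: message` format shown above. The function names `tx_timeout_times` and `intervals` are illustrative only.

```python
import re
from typing import List

# Hypothetical helper: matches kernel log lines such as
# "[ 1021.020000] eth0: tx timeout" (format assumed from the logs above).
# Group 1 is the timestamp in seconds since boot, group 2 the interface name.
TX_TIMEOUT_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(\S+): tx timeout\s*$")


def tx_timeout_times(log_text: str, iface: str) -> List[float]:
    """Return the timestamps of every 'tx timeout' line for one interface."""
    times = []
    for line in log_text.splitlines():
        m = TX_TIMEOUT_RE.match(line.strip())
        if m and m.group(2) == iface:
            times.append(float(m.group(1)))
    return times


def intervals(times: List[float]) -> List[float]:
    """Gaps between consecutive events, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Excerpt of the TL-WR841N log from comment 40.
    sample = """\
[954587.200000] eth0: tx timeout
[954587.210000] eth0: link down
[954588.840000] eth0: link up (100Mbps/Full duplex)
[954594.020000] eth0: tx timeout
[954594.020000] eth0: link down
[954594.840000] eth0: link up (100Mbps/Full duplex)
[954600.020000] eth0: tx timeout
"""
    ts = tx_timeout_times(sample, "eth0")
    print(ts)
    print(intervals(ts))
```

Filtering on the interface name keeps timeouts from eth0 and eth1 separate. This matters because the ticket reports the problem on both ports.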

comment:41 Changed 2 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

fixed in r47892, r47895
