Opened 2 years ago

Closed 22 months ago

#21857 closed defect (fixed)

OpenWRT "CHAOS CALMER (15.05, r46767)" + TP-LINK "TL-WR842ND V2" sometimes hangs while booting the kernel

Reported by: dexen@…
Owned by: developers
Priority: high
Milestone:
Component: kernel
Version: Chaos Calmer 15.05
Keywords:
Cc:

Description

Hello,

I have run into a problem where OpenWRT sometimes (approximately one boot in 10) fails to boot the kernel.
I connected to the UART and saw that it freezes at the following point:

U-Boot 1.1.4 (Sep 22 2014 - 18:45:16)

U-boot AP123


DRAM:  32 MB
id read 0x100000ff
Flash:  8 MB
Using default environment

In:    serial
Out:   serial
Err:   serial
Net:   ag934x_enet_initialize...
wasp reset mask:c03300
WASP ----> S27 PHY
file: ag934x.c,line: 180==: set LAN&WAN SWAP. --debug by HouXB
GMAC: cfg1 0x5 cfg2 0x7114
eth0: ba:be:fa:ce:08:41
s27 reg init
athrs27_phy_setup ATHR_PHY_CONTROL 4: 0x1000
athrs27_phy_setup ATHR_PHY_SPEC_STAUS 4: 0x10
eth0 up
WASP ----> S27 PHY
file: ag934x.c,line: 180==: set LAN&WAN SWAP. --debug by HouXB
GMAC: cfg1 0xf cfg2 0x7214
eth1: ba:be:fa:ce:08:41
s27 reg init lan
ATHRS27: resetting s27
ATHRS27: s27 reset done
athrs27_phy_setup ATHR_PHY_CONTROL 0: 0x1000
athrs27_phy_setup ATHR_PHY_SPEC_STAUS 0: 0x10
athrs27_phy_setup ATHR_PHY_CONTROL 1: 0x1000
athrs27_phy_setup ATHR_PHY_SPEC_STAUS 1: 0x10
athrs27_phy_setup ATHR_PHY_CONTROL 2: 0x1000
athrs27_phy_setup ATHR_PHY_SPEC_STAUS 2: 0x10
athrs27_phy_setup ATHR_PHY_CONTROL 3: 0x1000
athrs27_phy_setup ATHR_PHY_SPEC_STAUS 3: 0x10
eth1 up
eth0, eth1
is_auto_upload_firmware=0
Autobooting in 1 seconds
## Booting image at 9f020000 ...
   Uncompressing Kernel Image ... OK

Starting kernel ...

[    0.000000] Linux version 3.18.20 (buildbot@builder1) (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r46450) ) #1 Fri Sep 4 21:55:57 CEST 2015
[    0.000000] bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 0001974c (MIPS 74Kc)
[    0.000000] SoC: Atheros AR9341 rev 3
[    0.000000] Determined physical RAM map:
[    0.000000]  memory: 02000000 @ 00000000 (usable)
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x00000000-0x01ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00000000-0x01ffffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x01ffffff]
[    0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 8128
[    0.000000] Kernel command line:  board=TL-WR842N-v2  console=ttyS0,115200 rootfstype=squashfs,jffs2 noinitrd
[    0.000000] PID hash table entries: 128 (order: -3, 512 bytes)
[    0.000000] Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
[    0.000000] Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.000000] Writing ErrCtl register=00000000
[    0.000000] Readback ErrCtl register=00000000
[    0.000000] Memory: 28516K/32768K available (2621K kernel code, 129K rwdata, 344K rodata, 224K init, 194K bss, 4252K reserved)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS:51
[    0.000000] Clocks: CPU:535.000MHz, DDR:400.000MHz, AHB:200.000MHz, Ref:25.000MHz
[    0.000000] Calibrating delay loop... 266.64 BogoMIPS (lpj=1333248)
[    0.080000] pid_max: default: 32768 minimum: 301
[    0.080000] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.090000] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.100000] NET: Registered protocol family 16
[    0.100000] MIPS: machine is TP-LINK TL-WR842N/ND v2
[    0.560000] Switched to clocksource MIPS
[    0.560000] NET: Registered protocol family 2
[    0.570000] TCP established hash table entries: 1024 (order: 0, 4096 bytes)
[    0.570000] TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
[    0.580000] TCP: Hash tables configured (established 1024 bind 1024)
[    0.580000] TCP: reno registered
[    0.590000] UDP hash table entries: 256 (order: 0, 4096 bytes)
[    0.590000] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[    0.600000] NET: Registered protocol family 1
[    0.600000] futex hash table entries: 256 (order: -1, 3072 bytes)
[    0.630000] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.630000] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.640000] msgmni has been set to 55
[    0.650000] io scheduler noop registered
[    0.650000] io scheduler deadline registered (default)
[    0.650000] Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
[    0.660000] console [ttyS0] disabled

Then nothing happens; I need to reset the device to boot it again.
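
A minimal sketch for spotting this hang automatically in a saved serial capture, assuming the capture is plain text and that a hung boot ends exactly at the `console [ttyS0] disabled` line shown above. The function names are hypothetical, and the check is only a heuristic: a capture that was cut off early for some other reason would look the same.

```python
import re

# Assumption: the last line printed before the hang, taken from the log above.
HANG_MARKER = "console [ttyS0] disabled"


def last_kernel_line(log_text):
    """Return the last non-empty line with its "[ timestamp]" prefix removed."""
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    if not lines:
        return ""
    return re.sub(r"^\[\s*\d+\.\d+\]\s*", "", lines[-1])


def looks_hung(log_text):
    """True if the capture ends exactly at the marker line."""
    return last_kernel_line(log_text) == HANG_MARKER
```

Feeding each boot's capture through `looks_hung` makes it easier to count hangs over many power cycles instead of watching the console by hand.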

Change History (16)

comment:1 Changed 2 years ago by anonymous

Maybe a PSU problem? Try another, high-quality PSU and report whether the problem is still there.

comment:2 Changed 2 years ago by bittorf@…

maybe #21773 is related?

comment:3 Changed 2 years ago by anonymous

The problem appears only with OpenWRT 15.05; I tested with 14.07 and it is OK.
I also tested with another power supply, with the same result.

I have about 100 routers installed with 15.05 and about the same number with 14.07, and I have never seen this problem on 14.07.

comment:4 Changed 2 years ago by anonymous

After upgrading from 14.07 to 15.05, I see exactly the described behavior on this device.
It never happened with 14.07.

comment:5 Changed 23 months ago by sailor_ca

I have several 841v8's & v9's (cc@48925) experiencing similar behaviour.

https://forum.openwrt.org/viewtopic.php?id=63419

If there is anything I can do to help, please ask.

comment:6 Changed 23 months ago by anonymous

Is anyone working on this or is the current consensus to fall back to BB for now?

comment:7 Changed 23 months ago by rmilecki

There is an upstream report, but it appears to be a really ugly issue to debug. Maybe a board with (E)JTAG could help.
https://www.linux-mips.org/archives/linux-mips/2016-03/msg00311.html

comment:8 Changed 23 months ago by sailor_ca

The TP-Link 841v8 has JTAG, according to the wiki anyway. I have a v8 demonstrating the problem, but it is currently in use. I have more 841s on the way and could swap that one out.

However I have never used JTAG. I assume you are looking for a log from the chip during the hang? I'll check into JTAG and see if it is something I can tackle.

comment:10 Changed 22 months ago by sailor_ca

I see CC has moved to 3.18.29 today and this kernel has the latest patch for a jffs2 locking problem.

commit 85d7c751f0deb51ab185b00868127a2f10a71a49
Author: David Woodhouse <David.Woodhouse@intel.com>
Date:   Mon Feb 1 12:37:20 2016 +0000

    jffs2: Fix page lock / f->sem deadlock
    
    [ Upstream commit 49e91e7079febe59a20ca885a87dd1c54240d0f1 ]
    
    With this fix, all code paths should now be obtaining the page lock before
    f->sem.
    
    Reported-by: Szabó Tamás <sztomi89@gmail.com>
    Tested-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

I seem to remember that this problem also caused the kernel to hang and was intermittent, depending on the free space in the jffs2 partition.

People may want to try a new CC build.

comment:11 Changed 22 months ago by sailor_ca

After about 10 successful reboots my 841v8 hung with a Mar 30 build of CC (linux 3.18.29) :(
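
A rough sketch of why about 10 clean reboots is not enough to rule the hang out, assuming the roughly 1-in-10 hang rate given in the description and that boots are independent of each other (neither is established in this ticket). The function names are hypothetical.

```python
import math


def hang_probability(p_hang, reboots):
    """Probability of at least one hang in `reboots` independent boots."""
    return 1.0 - (1.0 - p_hang) ** reboots


def clean_reboots_needed(p_hang, confidence):
    """Smallest run of clean boots after which a still-affected device
    would have hung at least once with probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


# With a 10% per-boot hang rate, 10 reboots catch the hang only ~65% of the time.
print(round(hang_probability(0.1, 10), 2))
# About 29 consecutive clean boots are needed to reach 95%.
print(clean_reboots_needed(0.1, 0.95))
```

Under these assumptions, a candidate fix should survive a few dozen consecutive reboots before it is reported as working.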

comment:12 Changed 22 months ago by rmilecki

Fixed by r49156

comment:13 Changed 22 months ago by anonymous

Do you know when this will be pushed to CC?

comment:14 Changed 22 months ago by sailor_ca

Never mind. I just saw the backport patch on the dev list. Thanks!

comment:15 in reply to: ↑ 12 Changed 22 months ago by sailor_ca

Replying to rmilecki:

Fixed by r49156

I applied the CC patch from the devel mailing list and have had no problems. But the patch has not shown up in the CC git. I see it appeared in trunk some time ago. Is the delay due to a problem with the CC patch? Thanks

comment:16 in reply to: ↑ 15 Changed 22 months ago by rmilecki

  • Resolution set to fixed
  • Status changed from new to closed

Replying to sailor_ca:

I applied the CC patch from the devel mailing list and have had no problems. But the patch has not shown up in the CC git. I see it appeared in trunk some time ago. Is the delay due to a problem with the CC patch? Thanks

Thanks, applied to 15.05 in r49202
