Opened 5 years ago

Last modified 3 years ago

#12682 reopened defect

WRT350nv1 BCM47XX Data bus error kernel oops on bootup

Reported by: Bill Farrow Owned by: hauke
Priority: normal Milestone: Barrier Breaker 14.07
Component: kernel Version: Trunk
Keywords: WRT350nv1 BCM47XX ssb pci Cc:

Description

The WRT350nv1 firmware image from 12.09-rc1 as well as the latest code from openwrt trunk cause the same crash in the ssb driver.

[    0.708000] PCI: Fixing up device 0000:00:00.0
[    0.744000] Data bus error, epc == 8025ae88, ra == 8025ae10
[    0.744000] Oops[#1]:

After enabling the kernel symbol tables and adding some printk's I have narrowed down the problem to this line in ssb_bus_scan():

  idhi = scan_read32(bus, 0, SSB_IDHIGH);

Attachments (4)

bootlog_wrt350nv1_ssb_crash.txt (8.3 KB) - added by Bill Farrow 5 years ago.
bootlog_wrt350nv1_kamikaze.txt (12.7 KB) - added by Bill Farrow 5 years ago.
successful boot log with kernel 2.4.35.4
bootlog_wrt350nv1_r29923.txt (15.0 KB) - added by Bill Farrow <bill-openwrt@…> 5 years ago.
Boot log with SVN r29923 without tg3 or ssb-gige enabled
ssb_pcicore_read_config-data-bus-error.txt (5.3 KB) - added by rmilecki 3 years ago.
Log of another "Data bus error" (not SSB_IDHIGH related)

Change History (20)

Changed 5 years ago by Bill Farrow

comment:1 Changed 5 years ago by hauke

  • Owner changed from developers to hauke
  • Status changed from new to accepted

There is probably something wrong in the CardBus code.

Did this ever work with an older OpenWrt version using kernel 2.6.X?

comment:2 Changed 5 years ago by anonymous

This router does run a version of OpenWrt that I built from svn (13783) in 2008. The kernel was linux-brcm-2.4/linux-2.4.35.4

I loaded stock Backfire (2.6.32.27) on this unit and it booted to a command prompt, but without any network devices at all. Maybe 2.6.x and 3.x kernel support needs some work.

When running the Attitude Adjustment kernel 3.6.11, I think the ssb_bus_scan() function is attempting to read an invalid MMIO address for the PCI bus.

Changed 5 years ago by Bill Farrow

successful boot log with kernel 2.4.35.4

comment:3 Changed 5 years ago by Bill Farrow

I have started reading the bcm reverse engineering specs:

http://bcm-v4.sipsolutions.net/

The kernel successfully scans the ssb bus early on in the boot process. It identifies the ssb-pci bridge (0x804) core:

[ 0.000000] scan_read32() line 168 bustype 0 mmio B8000000 coreidx 04 offset 0FFC,
[ 0.000000] scan_read32() line 191 read B8004FFC value 4243804B,
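
The logged read address is consistent with each core occupying a 0x1000-byte window above the bus base: 0xB8000000 + 4 * 0x1000 + 0x0FFC = 0xB8004FFC. The value read back, 0x4243804B, also decodes to core ID 0x804. Below is a minimal sketch of both calculations in C. The mask constants are quoted from memory of the kernel's ssb_regs.h and the helper names are hypothetical, so check them against your own tree before relying on them.

```c
#include <stdint.h>

/* Assumption: values as recalled from include/linux/ssb/ssb_regs.h. */
#define SSB_CORE_SIZE          0x1000u      /* per-core register window */
#define SSB_IDHIGH             0x0FFCu      /* offset of the IDHIGH register */
#define SSB_IDHIGH_RCLO        0x0000000Fu  /* revision code, low bits */
#define SSB_IDHIGH_CC          0x00008FF0u  /* core code */
#define SSB_IDHIGH_CC_SHIFT    4
#define SSB_IDHIGH_RCHI        0x00007000u  /* revision code, high bits */
#define SSB_IDHIGH_RCHI_SHIFT  8
#define SSB_IDHIGH_VC          0xFFFF0000u  /* vendor code */
#define SSB_IDHIGH_VC_SHIFT    16

/* Hypothetical container for the decoded fields. */
struct idhigh_fields {
    uint16_t vendor;
    uint16_t coreid;
    uint8_t  revision;
};

/* Address of a register inside core number coreidx on a memory-mapped bus. */
static uint32_t core_reg_addr(uint32_t mmio_base, unsigned coreidx, uint32_t offset)
{
    return mmio_base + coreidx * SSB_CORE_SIZE + offset;
}

/* Split a raw IDHIGH value into vendor, core ID and revision. */
static struct idhigh_fields decode_idhigh(uint32_t idhi)
{
    struct idhigh_fields f;

    f.coreid   = (uint16_t)((idhi & SSB_IDHIGH_CC) >> SSB_IDHIGH_CC_SHIFT);
    f.revision = (uint8_t)((idhi & SSB_IDHIGH_RCLO) |
                           ((idhi & SSB_IDHIGH_RCHI) >> SSB_IDHIGH_RCHI_SHIFT));
    f.vendor   = (uint16_t)((idhi & SSB_IDHIGH_VC) >> SSB_IDHIGH_VC_SHIFT);
    return f;
}
```

Calling core_reg_addr(0xB8000000, 4, SSB_IDHIGH) reproduces the logged address, and decode_idhigh(0x4243804B) yields core ID 0x804 with vendor 0x4243, which matches the ssb-pci bridge identification above.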

The crash occurs when it attempts to scan the pci bus, behind the ssb-pci bridge, for more ssb cores. The only device on the PCI bus is the wifi card (0x14e4:0x4329).

How can I disable CONFIG_SSB_PCIHOST to stop the PCI bus from being scanned for ssb devices/cores ?

comment:4 Changed 5 years ago by hauke

Could you post the contents of your nvram? (Remove the secret keys from the output.)

Is it correct that this is a BCM4705/BCM4785 with two wifi cards (simultaneously dual band)?
Does wifi work with backfire kernel 2.4?

You could look into the source code provided by the hardware vendor; it should contain the PCI code.

comment:5 Changed 5 years ago by Bill Farrow

The WRT350Nv1 has a BCM4785 CPU and BCM4329 pcmcia wifi card (b/g/n) - I think it is dual band.

The Backfire 10.03.1 brcm-2.4 kernel boots but ethernet and wifi drivers are missing or broken. No network devices show up under "ifconfig -a".

wl0: wlc_attach: failed with err 13

NVRAM

CFE> nvram show
CMD: [nvram show]
opo=0x0
os_ram_addr=80001000
enabled_5397=1
boardrev=0x10
bootnv_ver=5
et0macaddr=<removed>
watchdog=5000
boot_wait=on
et0mdcport=0
reset_gpio=7
pmon_ver=CFE 4.81.53.0
vlan2ports=0 8
os_flash_addr=bfc40000
sromrev=2
boardtype=0x478
et1macaddr=<removed>
lan_netmask=255.255.255.0
et1mdcport=0
parkid=0
vlan2hwname=et0
wl0gpio0=8
boardflags2=0
wait_time=3
lan_ipaddr=192.168.1.1
clkfreq=300,150,37
vlan1hwname=et0
sdram_config=0x0062
vlan1ports=1 2 3 4 8*
scratch=a0180000
eou_private_key=<removed>
boardflags=0x110
sdram_refresh=0x8040
wandevs=vlan2
sdram_ncdl=0xff0307
et0phyaddr=30
landevs=vlan1 wl0
sdram_init=0x0009
dl_ram_addr=a0001000
cardbus=1
et1phyaddr=4
boot_ver=v4.2
boardnum=42
eou_public_key=<removed>
size: 1010 bytes (31758 left)
*** command status = 0

comment:6 Changed 5 years ago by Bill Farrow

Tried changing the nvram cardbus setting to 0. This causes the kernel to hard reboot after the line "ssb: PCIcore in host mode found".

Tried SVN r29923 build - kernel boots and runs, but there are no ethernet or wifi network interfaces. The ssb bus is probed and 7 ssb devices are displayed in sysfs.

root@OpenWrt:/# ls /sys/bus/ssb/devices/
ssb0:0  ssb0:1  ssb0:2  ssb0:3  ssb0:4  ssb0:5  ssb0:6

Info from /sys/bus/ssb/devices/

Device   coreid   name
ssb0:0 = 0x081f = GBit Ethernet
ssb0:1 = 0x0819 = USB 2.0 Host
ssb0:2 = 0x081a = USB 2.0 Device
ssb0:3 = 0x081d = PATA
ssb0:4 = 0x080b = IPSEC
ssb0:5 = 0x080f = MEMC SDRAM
ssb0:6 = 0x081e = SATA XOR-DMA
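
For scripting against the sysfs listing, the table above can be expressed as a small lookup. This is only a sketch: the struct and function names are hypothetical, and the entries are limited to the core IDs shown in this comment rather than the kernel's full list.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: core ID and the name shown in the listing above. */
struct core_name {
    uint16_t    coreid;
    const char *name;
};

/* Only the seven cores reported on this WRT350N v1 with r29923. */
static const struct core_name core_names[] = {
    { 0x081f, "GBit Ethernet"   },
    { 0x0819, "USB 2.0 Host"    },
    { 0x081a, "USB 2.0 Device"  },
    { 0x081d, "PATA"            },
    { 0x080b, "IPSEC"           },
    { 0x080f, "MEMC SDRAM"      },
    { 0x081e, "SATA XOR-DMA"    },
};

/* Return the name for a core ID, or "unknown" if it is not in the table. */
static const char *core_name(uint16_t coreid)
{
    size_t i;

    for (i = 0; i < sizeof(core_names) / sizeof(core_names[0]); i++) {
        if (core_names[i].coreid == coreid)
            return core_names[i].name;
    }
    return "unknown";
}
```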

lspci

00:00.0 Class 0280: 14e4:4329  (BCM4321 802.11b/g/n)
01:00.0 Class 0200: 14e4:1676  (NetXtreme BCM5750 Gigabit Ethernet)

Using pciutils package (/usr/sbin/lspci -v)

00:00.0 Network controller: Broadcom Corporation BCM4321 802.11b/g/n (rev 01)
        Subsystem: Broadcom Corporation Device 046d
        Flags: bus master, fast devsel, latency 168, IRQ 6
        Memory at 40000000 (32-bit, non-prefetchable) [size=16K]

01:00.0 Ethernet controller: Broadcom Corporation Device 1676
        Subsystem: Broadcom Corporation Device 1676
        Flags: medium devsel, IRQ 4
        Memory at 18010000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+

Tried enabling the Broadcom Tigon3 (tg3) driver as well as the Broadcom SSB Gigabit Ethernet (ssb-gige) driver in the kernel from r29923. The tg3 driver reports a failure when probing the PHY.

tg3.c:v3.119 (May 18, 2011)
tg3 0000:01:00.0: phy probe failed, err -19
tg3 0000:01:00.0: Problem fetching invariants of chip, aborting

comment:7 Changed 5 years ago by hauke

Nice debug output, could you post the full boot log?

Did you try deactivating initialization of the PCI (CardBus) wlan card when booting r29923?
If this is r29923 without any local modification, then some change later broke your device.

The problem you have with the Ethernet driver (tg3) is fixed in a more recent version.

Changed 5 years ago by Bill Farrow <bill-openwrt@…>

Boot log with SVN r29923 without tg3 or ssb-gige enabled

comment:8 Changed 5 years ago by Bill Farrow <bill.farrow@…>

I have reverted my build tree to the development head, and I will start debugging the code from there.

comment:9 Changed 5 years ago by florian

  • Status changed from accepted to assigned

comment:10 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted

comment:11 Changed 3 years ago by rmilecki

Hi, I would like to bring back some focus to this bug.

I own a WRT300N v1.0, which is a very similar device. It's based on the BCM4704 instead of the BCM4705 but also has a CardBus BCM4321. I'm also getting a Data bus error during the first MMIO read: idhi = scan_read32(bus, 0, SSB_IDHIGH);

I believe the key to understanding this bug is analyzing the PCI resources. The following log comes from my WRT300N v1.0 using kernel 3.18.10:

[    0.980000] pci 0000:00:00.0: BAR 1: assigned [mem 0x40000000-0x47ffffff pref]
[    0.990000] pci 0000:00:01.0: BAR 0: assigned [mem 0x48000000-0x48003fff]
[    1.010000] pci 0000:00:00.0: BAR 0: assigned [mem 0x48004000-0x48005fff]
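
As a sanity check on the three windows in this log, the sketch below computes each window's size and verifies that the windows are naturally aligned and do not overlap, which suggests the kernel's assignment itself is reasonable. The struct and helper names are hypothetical; only the address ranges come from the log.

```c
#include <stdint.h>

/* One assigned memory window, with inclusive bounds as printed by the kernel. */
struct mem_range {
    uint32_t start;
    uint32_t end;
};

/* Size in bytes of an inclusive range. */
static uint32_t range_size(struct mem_range r)
{
    return r.end - r.start + 1u;
}

/* A BAR window should be a power of two in size and aligned to that size. */
static int range_is_naturally_aligned(struct mem_range r)
{
    uint32_t size = range_size(r);

    return size != 0 && (size & (size - 1u)) == 0 &&
           (r.start & (size - 1u)) == 0;
}

/* Two inclusive ranges overlap if each starts before the other ends. */
static int ranges_overlap(struct mem_range a, struct mem_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* The three windows from the log above. */
static const struct mem_range dev0_bar1 = { 0x40000000u, 0x47ffffffu };
static const struct mem_range dev1_bar0 = { 0x48000000u, 0x48003fffu };
static const struct mem_range dev0_bar0 = { 0x48004000u, 0x48005fffu };
```

With these inputs the sizes come out as 128 MiB, 16 KiB and 8 KiB, and no pair of windows overlaps, so the problem has to be in how the values reach the devices rather than in the ranges chosen.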

Assigning resources is done by setting PCI config registers PCI_BASE_ADDRESS_[0-5]. So I decided to:

  1. Print info about every write to config register
  2. Dump resources configuration after every update
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x10] 0x00000000
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x14] 0x00000008
[ssb_pciecore_dump][bus:0 dev:1 func:0 off:0x10] 0x00000000

pci 0000:00:00.0: BAR 1: assigned [mem 0x40000000-0x47ffffff pref]
[ssb_extpci_write_config] bus:0 dev:0 func:0 off:0x14 addr:0x0c010014 val:0x40000008
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x10] 0x00000000
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x14] 0x40000008
[ssb_pciecore_dump][bus:0 dev:1 func:0 off:0x10] 0x00000000

pci 0000:00:01.0: BAR 0: assigned [mem 0x48000000-0x48003fff]
[ssb_extpci_write_config] bus:0 dev:1 func:0 off:0x10 addr:0x0c020010 val:0x48000000
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x10] 0x48000000
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x14] 0x40000008
[ssb_pciecore_dump][bus:0 dev:1 func:0 off:0x10] 0x48000000

pci 0000:00:00.0: BAR 0: assigned [mem 0x48004000-0x48005fff]
[ssb_extpci_write_config] bus:0 dev:0 func:0 off:0x10 addr:0x0c010010 val:0x48004000
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x10] 0x48004000
[ssb_pciecore_dump][bus:0 dev:0 func:0 off:0x14] 0x40000008
[ssb_pciecore_dump][bus:0 dev:1 func:0 off:0x10] 0x48004000

As the log above shows, assigning different resources to 0000:00:00.0 and 0000:00:01.0 doesn't work: every write to BAR 0 of one device also shows up in BAR 0 of the other. In my case this leaves 0000:00:01.0 (the device with the SSB bus, chipset 0x4321) with the wrong resource assigned. It seems that when using CardBus only one device (0000:00:01.0?) can be configured.
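
The addr: values in the log are consistent with a type 0 config access in which the device is selected by a one-hot bit starting at bit 16, on top of a window base of 0x0C000000. Below is a minimal sketch of that address calculation, together with a helper that flags the aliasing seen in the dumps. All names are hypothetical, the base is inferred from the log rather than taken from the driver source, and the real ssb driver may apply extra masking and range checks.

```c
#include <stdint.h>

#define CFG_WINDOW_BASE  0x0C000000u  /* assumption: inferred from the logged addr: values */
#define PCI_BAR0_OFFSET  0x10u        /* PCI_BASE_ADDRESS_0 in config space */

/*
 * Config-space address for a device on bus 0.
 * Only meaningful for small device numbers, since the select bit
 * is shifted to position dev + 16.
 */
static uint32_t cfg_addr_bus0(unsigned dev, unsigned func, unsigned off)
{
    uint32_t addr = CFG_WINDOW_BASE;

    addr |= (uint32_t)1 << (dev + 16);  /* one-hot device select */
    addr |= (uint32_t)func << 8;        /* function number */
    addr |= off & ~3u;                  /* dword-aligned register offset */
    return addr;
}

/* BAR number for a config offset in the PCI_BASE_ADDRESS_[0-5] range. */
static unsigned bar_index(unsigned off)
{
    return (off - PCI_BAR0_OFFSET) / 4u;
}

/*
 * Returns nonzero if a register of another device changed to exactly
 * the value that was just written to this device's register.
 */
static int write_aliased(uint32_t other_before, uint32_t other_after,
                         uint32_t written)
{
    return other_before != other_after && other_after == written;
}
```

For example, cfg_addr_bus0(1, 0, 0x10) gives 0x0c020010, the address logged for the BAR 0 write to 0000:00:01.0, and write_aliased(0x00000000, 0x48000000, 0x48000000) reports the change that write caused in BAR 0 of 0000:00:00.0.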

Unfortunately, I'm not sure how to solve this. I guess we should somehow tell the kernel not to assign any resources to the 0000:00:00.0 (bridge) device.

A slightly related commit: ssb: fix cardbus slot in hostmode.

Last edited 3 years ago by rmilecki

comment:12 Changed 3 years ago by rmilecki

This should be fixed by r45308

Tested on WRT300N v1

comment:13 Changed 3 years ago by rmilecki

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:14 Changed 3 years ago by danielg4

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:15 Changed 3 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

reopen without comment? srsly?

comment:16 Changed 3 years ago by danielg4

  • Resolution fixed deleted
  • Status changed from closed to reopened

Comment is: The fix for the WRT300N does not fix the WRT350N.

Changed 3 years ago by rmilecki

Log of another "Data bus error" (not SSB_IDHIGH related)
