Opened 12 years ago

Closed 11 years ago

Last modified 4 years ago

#464 closed defect (fixed)

ath_pci causes kernel oops in build 3598

Reported by: roy@…
Owned by: developers
Priority: high
Milestone: Barrier Breaker 14.07
Component: packages
Version:
Keywords:
Cc:

Description

From a fresh 'svn up' to build 3598, I'm getting a kernel oops when loading ath_pci. (build 3596 just spewed a blizzard of 'Undefined symbol' errors, so I guess this is progress? :)

 ath_pci: 0.9.4.5 (svn 1486)     
 PCI: Enabling device 0000:01:01.0 (0000 -> 0002)
 PCI: Fixing up device 0000:01:01.0              
 Data bus error, epc == c00e1358, ra == c00e1524
 Oops[#1]:                                      
 Cpu 0    
 $ 0   : 00000000 10009c00 c00e1330 81357c60
 $ 4   : c0080000 00000001 00000001 00000003
 $ 8   : ffff0000 81da28ba 00000010 00000000
 $12   : 00000000 fffffffe 00000010 00000000
 $16   : 00000001 81da0000 81da0000 81da0000
 $20   : 81da0000 c0115dd8 8134bbd8 80220000
 $24   : 00000010 800f86a0                  
 $28   : 8134a000 8134bb18 81c68000 c00e1524
 Hi    : 00000000                           
 Lo    : 00000000
 epc   : c00e1358     Tainted: P     
 ra    : c00e1524 Status: 10009c03    KERNEL EXL IE 
 Cause : 0000001c                                   
 PrId  : 00029007
 Modules linked in: ath_pci ath_rate_sample ath_hal wlan_scan_sta wlan_scan_ap wlan switch_robo  switch_core
 Process insmod (pid: 605, threadinfo=8134a000, task=8033d000)                                             
 Stack : 80220000 80220000 80033b54 81076f10 00000001 00000000 81da0000 81da0000
        81da0000 c00e1524 80033b54 81076f10 81c68260 c0080000 80033b54 81076f10
        81c68260 c00ddee0 80260000 81357ca0 00000400 800b37bc 8134bbd8 00000008
        10009c00 8134bc18 80033b54 81076f10 81c68260 c0080000 40000000 c0115dd8
        80220000 c00cd72c 0000000c 8134bbd0 00000001 80050038 8134bbd8 80100dac
        ...                                                                    
 Call Trace: [<80033b54>]  [<c00e1524>]  [<80033b54>]  [<80033b54>]  [<c00ddee0>]  [<800b37bc>]  [<80033b54>]  [<c00cd72c>]  [<80050038>]  [<80100dac>]  [<c00cd020>]  [<c01114f0>]   
                                                                                                                                                                                    
 Code: 10a00028  00808825  8e240014 <8c824004> 3c03fffc  3463ffff  00431024  ac824004  3c02c00d 
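
For readers trying to interpret the dump: in a Linux MIPS oops, the word in angle brackets on the Code line is the instruction at epc. The sketch below decodes it as a standard MIPS32 I-type instruction. The struct and function names are hypothetical helpers made up for this illustration, not part of any kernel or driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a MIPS32 I-type instruction: opcode | rs | rt | imm16.
 * Hypothetical helper type, used only for this illustration. */
struct mips_itype {
    unsigned opcode; /* bits 31..26 */
    unsigned rs;     /* bits 25..21, base register for loads/stores */
    unsigned rt;     /* bits 20..16, target register */
    int16_t imm;     /* bits 15..0, signed offset */
};

/* Split a raw 32-bit instruction word into its I-type fields. */
void mips_decode_itype(uint32_t insn, struct mips_itype *out)
{
    assert(out != NULL);
    out->opcode = (insn >> 26) & 0x3f;
    out->rs = (insn >> 21) & 0x1f;
    out->rt = (insn >> 16) & 0x1f;
    out->imm = (int16_t)(insn & 0xffff);
}

/* Effective address of a load/store: base register value plus the
 * sign-extended 16-bit offset. */
uint32_t mips_effective_address(uint32_t base, const struct mips_itype *i)
{
    assert(i != NULL);
    return base + (uint32_t)(int32_t)i->imm;
}
```

Applied to the faulting word 8c824004, this gives opcode 0x23 (lw), rs = 4, rt = 2, and offset 0x4004, i.e. `lw $2, 0x4004($4)`. With $4 = c0080000 from the register dump, the load that raised the data bus error targeted address c0084004. That is presumably a memory-mapped device register, though nothing in the dump itself confirms it.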

Attachments (1)

196-reset_pcicore.patch (334 bytes) - added by jhansen@… 11 years ago.
Line that should be removed

Change History (8)

comment:1 Changed 12 years ago by Flyashi

I've had problems with kernel oopses and undefined errors. Try unplugging it and letting it sit for a while. I also had one WGT that I thought had a hardware problem with the Atheros card: it would either reset on insmod of ath_pci, or claim to work but not list the device, or show a slew of other problems. Reflashed with 3586, and it works.

So: try unplugging and letting it sit; if that doesn't help, try reflashing. If both of those fail, try make clean; make, and reflash with that.

Good luck.

comment:2 Changed 12 years ago by kaloz

  • Milestone set to 2.0

comment:3 Changed 12 years ago by roy@…

Well, I up'd to 3704, make clean, make, reflash and no joy. Same result as above. I suspect the WGT may be at fault. (it was a refurb, and was inop out of the box, actually... wouldn't boot because the filesystem was frobbed)

comment:4 Changed 12 years ago by nbd

  • Resolution set to worksforme
  • Status changed from new to closed

comment:5 Changed 11 years ago by jhansen@…

  • Resolution worksforme deleted
  • Status changed from closed to reopened

This problem still happens to this day, depending on the phase of the moon, etc.

After many, many hours of debugging, I have determined what does *not* fix the problem:

  • Matching up the LATENCY_TIMER of the PCI bridge and the Atheros radio (though it does help a little).
  • Playing with MIN_GNT and MAX_LAT doesn't help.
  • Inserting delays here and there doesn't help.
  • Using readl/writel instead of direct read/writes doesn't help.
  • Using get_dbe does not work, because the entire core of the 947xx CPU actually becomes unstable when this problem occurs. Instead of catching the data bus error properly, the CPU resets itself :(

I have, in fact, found what *does* fix the problem:

The PCI host controller core in the CPU becomes unstable after a CPU warm reset. So, you need to reset the PCI core by commenting out the line that prevents the core from being reset if it is already enabled. I have submitted a patch that shows which line I feel should just be yanked out of the tree.
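
The idea can be sketched as a toy model. The contents of the attached patch are not reproduced in this ticket, so everything below is an assumption made for illustration: the struct, its fields, and the function names are hypothetical and do not correspond to the real PCI core code or its registers. The sketch only shows why an "already enabled, skip the reset" guard leaves the core in its leftover state after a warm reset, while an unconditional reset does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the PCI host core; the fields are illustrative
 * only and are not real register names. */
struct pci_core {
    bool enabled;     /* core reports itself as up */
    bool stale_state; /* state left over from before a CPU warm reset */
};

/* Simulate a CPU warm reset: an enabled core stays enabled but keeps
 * whatever state it had before the reset. */
void cpu_warm_reset(struct pci_core *core)
{
    assert(core != NULL);
    if (core->enabled)
        core->stale_state = true;
}

/* Full core reset: clears leftover state and brings the core up. */
void pci_core_reset(struct pci_core *core)
{
    assert(core != NULL);
    core->stale_state = false;
    core->enabled = true;
}

/* Behaviour described as the problem: the reset is skipped when the
 * core is already enabled, so leftover state survives. */
void pci_core_init_guarded(struct pci_core *core)
{
    if (core->enabled)
        return; /* the kind of guard the patch proposes removing */
    pci_core_reset(core);
}

/* Behaviour proposed in this comment: always reset the core. */
void pci_core_init_always_reset(struct pci_core *core)
{
    pci_core_reset(core);
}
```

In this model, a core that was enabled before `cpu_warm_reset()` still has `stale_state` set after `pci_core_init_guarded()`, whereas `pci_core_init_always_reset()` clears it. The cost of the unconditional variant is a redundant reset on cold boot, which appears to be an acceptable trade for stability here.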

Changed 11 years ago by jhansen@…

Line that should be removed

comment:6 Changed 11 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

fix added in [6868]

comment:7 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
