Opened 4 years ago

Last modified 3 years ago

#14978 new defect

local PCI bus communication issues between the CPU and Wi-Fi card

Reported by: mario_lopes
Owned by: developers
Priority: normal
Milestone: Chaos Calmer 15.05
Component: kernel
Version: Trunk
Keywords:
Cc:

Description

Hello.

I have an x86 Alix 3D3 with a Ubiquiti XR7 and a MikroTik R52n-M.
I am running r39577. With the R52n-M card configured on 5 GHz using the ath9k driver, I get the following messages before the traffic generator (iperf) node crashes:

  • dmesg:
[  224.078722] ath: phy0: Failed to stop TX DMA, queues=0x004!
[  224.108054] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x000062c0
[  224.137886] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
[  225.000589] ath: phy0: Failed to stop TX DMA, queues=0x004!
  • cat /sys/kernel/debug/ieee80211/phy0/ath9k/interrupt
                   RX:       1575
                RXEOL:          0
                RXORN:          0
                   TX:      11953
                TXURN:          0
                  MIB:          0
                RXPHY:          0
                RXKCM:          0
                 SWBA:       4110
                BMISS:          0
                  BNR:          0
                  CST:          0
                  GTT:          0
                  TIM:          0
               CABEND:          0
             DTIMSYNC:          0
                 DTIM:          0
               TSFOOR:        621
                  MCI:          0
             GENTIMER:          0
                TOTAL:      17651
SYNC_CAUSE stats:
             Sync-All:       9897
              RTC-IRQ:          0
              MAC-IRQ:          0
EEPROM-Illegal-Access:          0
          APB-Timeout:          0
    PCI-Mode-Conflict:          0
          HOST1-Fatal:          0
           HOST1-Perr:          0
       TRCV-FIFO-Perr:          0
          RADM-CPL-EP:          0
  RADM-CPL-DLLP-Abort:          0
   RADM-CPL-TLP-Abort:          0
    RADM-CPL-ECRC-Err:          0
     RADM-CPL-Timeout:          0
    Local-Bus-Timeout:       9897
            PM-Access:          0
            MAC-Awake:          0
           MAC-Asleep:          0
     MAC-Sleep-Access:          0
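
To make a dump like the one above easier to scan, the counters can be reduced to the non-zero ones. This is a minimal sketch that assumes the "NAME: COUNT" layout shown in this ticket; the names `parse_counters`, `nonzero` and `SAMPLE` are illustrative and are not part of ath9k or OpenWrt.

```python
import re

# One counter per line: optional indentation, a name, a colon, then a number.
# Section headers such as "SYNC_CAUSE stats:" carry no number and are skipped.
COUNTER_RE = re.compile(r"^\s*([\w\- ]+?):\s+(\d+)\b")


def parse_counters(text):
    """Return {name: count} for every 'NAME:   N' line in the dump."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def nonzero(counters):
    """Keep only the counters that have fired at least once."""
    return {name: count for name, count in counters.items() if count}


# A few lines copied from the dump in this ticket, used as sample input.
SAMPLE = """\
                   RX:       1575
                RXEOL:          0
               TSFOOR:        621
SYNC_CAUSE stats:
             Sync-All:       9897
    Local-Bus-Timeout:       9897
            PM-Access:          0
"""

if __name__ == "__main__":
    for name, count in nonzero(parse_counters(SAMPLE)).items():
        print(f"{name}: {count}")
```

On the device, the same text could be read from /sys/kernel/debug/ieee80211/phy0/ath9k/interrupt, assuming a Python interpreter is available there; otherwise the dump can be copied to another machine and parsed offline.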

On 12.09 (r36088, pre-built) this problem does not occur.
This problem was also reported in #11862 and #9654; however, this issue may not be related to a malfunction of the ath9k kernel module.

Thanks.

Attachments (0)

Change History (7)

comment:1 Changed 4 years ago by bittorf@…

How does iperf crash? Segfault? Is there a crash dump file in /tmp/*.core?

comment:2 Changed 4 years ago by mario_lopes

Hi.
Re-tested today with r39782:
iperf runs well on 100Base-TX, on the XR7 Wi-Fi, or on both simultaneously.
The iperf client does not seem to crash; it just stops generating traffic.
After pressing Ctrl-C on iperf via SSH, the 'TX DMA' messages continue to appear, and the 'Sync-All' and 'Local-Bus-Timeout' values keep increasing until the system becomes inaccessible. By inaccessible I mean no ping reply on any of the 3 interfaces, no HTTP (LuCI), no SSH, and the Enter key on a USB keyboard has no effect.
On the VGA output the last message is the 'TX DMA' one, not a 'Kernel Panic' or similar.
After reboot, no *.core files were found in /tmp. Maybe they are in another directory mounted on permanent storage?
Also, in top, while traffic is being generated, CPU usage is 5%usr 20%sys...6%sirq. When traffic stops being generated, the sys value drops to 9%; the other values remain the same.

Thanks.

comment:3 Changed 4 years ago by mario_lopes

The issue still occurs at r40742 without the XR7 card inserted, using only the R52n-M.
It also occurs when using the R52n-M on a Gateworks GW2388-4 with r40742, configured in AP-STA mode, with "Local-Bus-Timeout" increasing and some re-associations from the STA side.

Thanks.

comment:4 Changed 4 years ago by mario_lopes

At r41665 on the Gateworks GW2388-4 with the R52n-M, the same issue still occurs, but there is a difference in /sys/kernel/debug/ieee80211/phy0/ath9k/interrupt:

                   RX:    4208439
                RXEOL:          0
                RXORN:     782252
                   TX:    3553653
                TXURN:          0
                  MIB:          0
                RXPHY:          0
                RXKCM:          0
                 SWBA:     144649
                BMISS:          0
                  BNR:          0
                  CST:       2386
                  GTT:          0
                  TIM:          0
               CABEND:          0
             DTIMSYNC:          0
                 DTIM:          0
               TSFOOR:      35316
                  MCI:          0
             GENTIMER:          0
                TOTAL:    8050568
SYNC_CAUSE stats:
             Sync-All:      60197
              RTC-IRQ:          0
              MAC-IRQ:          0
EEPROM-Illegal-Access:          0
          APB-Timeout:          0
    PCI-Mode-Conflict:          0
          HOST1-Fatal:          1       <-----------------------------NEW!!!
           HOST1-Perr:          0
       TRCV-FIFO-Perr:          0
          RADM-CPL-EP:          0
  RADM-CPL-DLLP-Abort:          0
   RADM-CPL-TLP-Abort:          0
    RADM-CPL-ECRC-Err:          0
     RADM-CPL-Timeout:          0
    Local-Bus-Timeout:      60197
            PM-Access:          0
            MAC-Awake:          0
           MAC-Asleep:          0
     MAC-Sleep-Access:          0
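
Spotting a newly firing counter such as HOST1-Fatal is easier with a diff of two snapshots than by eye. The sketch below assumes the same "NAME: COUNT" layout; `parse_counters`, `changed`, `BEFORE` and `AFTER` are hypothetical names, and the sample values are taken from the two dumps in this ticket purely as input (they come from different boards, so the diff itself is only illustrative).

```python
import re

# Matches "NAME:   N" and tolerates trailing annotations after the number.
COUNTER_RE = re.compile(r"^\s*([\w\- ]+?):\s+(\d+)\b")


def parse_counters(text):
    """Return {name: count} for every counter line in a dump."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def changed(before, after):
    """Return {name: (old, new)} for counters whose value differs."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name, 0), after.get(name, 0))
        for name in names
        if before.get(name, 0) != after.get(name, 0)
    }


BEFORE = """\
SYNC_CAUSE stats:
          HOST1-Fatal:          0
           HOST1-Perr:          0
    Local-Bus-Timeout:       9897
"""

AFTER = """\
SYNC_CAUSE stats:
          HOST1-Fatal:          1       <--- annotated by hand
           HOST1-Perr:          0
    Local-Bus-Timeout:      60197
"""

if __name__ == "__main__":
    diff = changed(parse_counters(BEFORE), parse_counters(AFTER))
    for name, (old, new) in diff.items():
        print(f"{name}: {old} -> {new}")
```

Taking one snapshot at boot and another shortly before the hang, on the same board, would show which counters move together with the 'TX DMA' messages.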

Thanks.

comment:5 Changed 4 years ago by mario_lopes

At r41898 on the Gateworks GW2388-4 with the R52n-M, this is the output of /sys/kernel/debug/ieee80211/phy0/ath9k/reset right before the system hangs:

root@OpenWrt:~# 
    Baseband Hang:  0
Baseband Watchdog:  0
   Fatal HW Error:  1
      TX HW error:  0
 Transmit timeout:  0
     TX Path Hang:  0
      PLL RX Hang:  0
         MAC Hang:  0
     Stuck Beacon: 19
        MCI Reset:  0
Calibration error:  0

Thanks.

comment:6 Changed 4 years ago by nbd

The Gateworks GW2388 issue is a different one: it's a PCI clock issue, not an ath9k bug.
As for the x86 issue, please test current trunk.

comment:7 Changed 3 years ago by mario_lopes

Hi.

Tested with r42840 on Alix x86 + R52n-M in Ad-Hoc mode; the problem still exists.

Output of /sys/kernel/debug/ieee80211/phy0/ath9k/reset before the system crash, on the iperf traffic generator node:

    Baseband Hang:  0
Baseband Watchdog:  0
   Fatal HW Error:  1
      TX HW error:  0
 Transmit timeout:  0
     TX Path Hang:  0
      PLL RX Hang:  0
         MAC Hang:  0
     Stuck Beacon: 17
        MCI Reset:  0
Calibration error:  0

dmesg and /sys/kernel/debug/ieee80211/phy0/ath9k/interrupt show the same behaviour as previously reported.

Thanks.
