
Opened 10 years ago

Closed 7 years ago

Last modified 4 years ago

#3782 closed defect (fixed)

DG834 v3: ethernet doesn't work

Reported by: narge-openwrt@…
Owned by: florian
Priority: response-needed
Milestone: Barrier Breaker 14.07
Component: kernel
Version: Kamikaze trunk
Keywords:
Cc:

Description

I have a DG834 v3 with r11844 installed.

On a warm reboot after flashing with upslug2, the ethernet device receives packets, but cannot send them. After attempting to send a small number of packets, cpmac starts printing "eth0: tx dma ring full", and all of the port activity lights come on and stay on. None of the sent packets appear on the wire, according to tcpdump on my laptop.

After a cold reboot, no packets are received at all, according to tcpdump on the router. Any attempt to send a packet appears to reset the switch: all four link lights blink once, and the link drops briefly before coming back up. Again, none of the sent packets appear on the wire. Nothing appears in dmesg.

This may be related to #3124, but I doubt it. The DG834Gv3 wiki page reports similar-sounding behaviour with r10180.

Attachments (3)

900-cpmac_init_hack.patch (1019 bytes) - added by narge-openwrt@… 9 years ago.
973-cpmac_scan_rework.patch (1.9 KB) - added by Wipster 7 years ago.
Change the scanning
973-cpmac_handle_mvswitch.c (1.3 KB) - added by Wipster 7 years ago.
Want to give that a go?


Change History (27)

comment:1 Changed 10 years ago by narge-openwrt@…

I also tried enabling the kernel's mvswitch driver, suspecting the switch wasn't being initialised properly.

After commenting out the code in mvswitch_config_init that waits for the ATU reset to complete (the ATU control register seems to always read 0xffff), and fixing cpmac to use the mvswitch's rx mangling, mvswitch appears to work correctly. The register values all look right, other than ATU control. However, as soon as vlan 0 is created and brought up, I see exactly the same symptoms as with mvswitch disabled.

comment:2 Changed 10 years ago by anonymous

Ok, I've got it working. It had nothing to do with mvswitch. The causes are that:

  • the ar7 platform code doesn't enable the MII (causing the failure on cold boots), and
  • cpmac resets and enables the internal PHY, which should stay disabled (causing the failure on warm reboots after flashing).

As far as I can tell, the former problem applies to all TNETD7200 devices with switches, but is hidden on the others because the bootloader sets up ethernet when it listens for tftp uploads. On the DG834v3, ADAM2 doesn't touch the ethernet device.

I'm not really sure why the latter problem hasn't broken other devices.

I have a simple hack that fixes this, which I will clean up and submit soon.

comment:3 Changed 9 years ago by anonymous

any news on this?

comment:4 Changed 9 years ago by nas@…

Narge,

Can you share your experience of running OpenWrt on the DG834v3? I've seen your patch in ticket 3124, but it doesn't help me at all: I still get no ethernet connectivity, and that puzzles me. Maybe something else needs to be done to get ethernet working?

Changed 9 years ago by narge-openwrt@…

comment:5 Changed 9 years ago by narge-openwrt@…

Sorry about the delay; I was planning on working out the right way to do this so it would work on other devices, but never got around to it. So, here's the hack. It is quite likely to break devices other than the DG834v3.

900-cpmac_init_hack.patch sets the appropriate register bits to switch on the MII and switch off the internal PHY on the AR7. I'm still looking for a way to reliably detect whether this is the right thing to do.

comment:6 Changed 8 years ago by florian

  • Owner changed from developers to florian
  • Status changed from new to assigned

comment:7 Changed 8 years ago by florian

narge, any updates on this?

comment:8 Changed 8 years ago by spudz76

  • Milestone set to Kamikaze
  • Priority changed from normal to response-needed
  • Version set to Kamikaze trunk

[patchteam] Setting priority to response-needed pending feedback on this issue.

comment:9 Changed 7 years ago by florian

  • Resolution set to fixed
  • Status changed from accepted to closed

Fixed with r22771.

comment:10 follow-up: Changed 7 years ago by narge-openwrt@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

Sorry, but r22771 does not work on my DG834v3. I'll describe what I think is going on, so we can try to figure out a proper solution.

There are now two loops in cpmac_probe() that perform exactly the same checks on the phy_mask, without ever modifying it. Note that phy_mask is only set in cpmac_init(), and needs to be set before mdiobus_register() is called.

The result is that on any device with a working MDIO other than the DG834v3, phy_mask will be set correctly for the first loop (because the bootloader has disabled the internal phy before cpmac_init() if necessary), and cpmac_probe() will detect the PHY. Devices with a fixed phy will fall through to the fixed phy code because phy_mask is zero, and the extra ar7_device_disable() won't hurt because their internal PHY is already off. Thus any non-DG834v3 device will work.

On the DG834v3, the bootloader has not configured the cpmac, and it will be in the reset state with the internal phy on. The external switch appears on the MDIO (even though the MII is not attached to it), so cpmac_probe() will select a phy in the first loop. If the mvswitch driver is enabled, it fails because the switch has an MDIO address conflict with the internal PHY; if the switch driver is disabled, cpmac starts but is unable to communicate because the internal PHY is connected to the switch's MII port.

The best solution I can think of is to check during cpmac_init() to see whether the internal PHY is enabled. If so, we can disable it, reset and probe the MDIO, and then re-enable the internal PHY if nothing responds. I have a patch to do this which gets the PHY configuration correct on the DG834 v3, but currently crashes later on; I'll post it here when it is debugged.
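
The proposed sequence can be sketched against a simulated board. This is only an illustration, not the patch mentioned above: `sim_board`, `sim_mdio_alive()` and `sim_select_phys()` are hypothetical names, and the model assumes that the internal PHY stops answering on the MDIO once it is disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware; none of these names come from cpmac. */
struct sim_board {
    bool ephy_enabled;        /* is the internal PHY currently enabled? */
    uint32_t external_alive;  /* MDIO addresses answered by external devices */
    uint32_t ephy_bit;        /* address bit the internal PHY answers on */
};

/* Simulated MDIO scan: bit n set means address n responds right now. */
static uint32_t sim_mdio_alive(const struct sim_board *b)
{
    uint32_t alive = b->external_alive;

    if (b->ephy_enabled)      /* assumption: a disabled PHY stays silent */
        alive |= b->ephy_bit;
    return alive;
}

/*
 * If the internal PHY is enabled, disable it and rescan. Re-enable it only
 * when nothing external responds. Returns the addresses worth probing.
 */
static uint32_t sim_select_phys(struct sim_board *b)
{
    uint32_t alive;

    if (!b->ephy_enabled)
        return sim_mdio_alive(b);   /* already configured before we ran */

    b->ephy_enabled = false;        /* take the internal PHY off the bus */
    alive = sim_mdio_alive(b);      /* stands in for "reset and probe the MDIO" */
    if (alive == 0) {
        b->ephy_enabled = true;     /* nothing external: fall back */
        alive = sim_mdio_alive(b);
    }
    return alive;
}
```

On a simulated board with a switch answering on addresses 16 to 31, the internal PHY ends up disabled; on a board with nothing external, it is switched back on.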

Apologies for my slow responses, btw, but this device was my only working modem for years, so I didn't want to risk bricking it again. I have another one now :)

comment:11 in reply to: ↑ 10 Changed 7 years ago by anonymous

Replying to narge-openwrt@…:
Thus any non-DG834v3 device will work.

The switch is not working on my DG834Gv2 with 10.03.1-rc3. Worked fine with rc2.
Ticket #7836 opened to cover.

comment:12 follow-up: Changed 7 years ago by florian

  • Status changed from reopened to accepted

narge, I am glad you commented on this issue, and that you have a marvell switch attached to your device. In order to get the mvswitch driver to work with your device, you will also have to let it probe up to PHY address 16, which is what the driver expects to be the CPU port.

About the disabling of the internal PHY and probing, I tend to agree with your solution and will wait for the patch to see how it goes.

comment:13 Changed 7 years ago by Wipster

Is it worth moving this into the platform code as board/model-specific setup, and only bringing up one interface or the other rather than both? Or are there cases where both are wired up?

comment:14 in reply to: ↑ 12 Changed 7 years ago by anonymous

Replying to anonymous:

The switch is not working on my DG834Gv2 with 10.03.1-rc3. Worked fine with rc2.
Ticket #7836 opened to cover.

Yeah, I'm now having problems with the mvswitch driver which was enabled in r22727, so I might be seeing that as well. When I said non-DG834v3 devices should work I only meant with respect to enabling or disabling the internal PHY & MII.

Replying to florian:

In order to get the mvswitch driver to work with your device, you will also have to let it probe up to PHY address 16, which is what the driver expects to be the CPU port.

The probe seems to find the mvswitch already, since address 16 is the first one that responds.

About the disabling of the internal PHY and probing, I tend to agree with your solution and will wait for the patch to see how it goes.

Ok, I'll have another go at getting it to work this evening. The crash I was seeing was a bug in the mvswitch driver; that is now fixed, but I still get mangled packets from eth0.

Changed 7 years ago by Wipster

Change the scanning

comment:15 follow-ups: Changed 7 years ago by Wipster

Want to give that a try? Apply it before trunk r22849.
I moved the MII select out to platform because the remap depends on not being a 7300, and it isn't really the cpmac driver's domain, imo.
It first looks for the external phy to make the right mask; failing that, it turns off the ephy and looks again.

For me this brings up my external phy and correctly tells me there is a switch there, not just a link. When platform comes to attach lowcpmac, though, the generated mask hides it (which doesn't matter, because the link works).
I think the right thing to do is to move the high and low cpmac registration from platform to cpmac: it looks on the external side first and, if it finds something, brings up only highcpmac; if it falls through to internal, it brings up lowcpmac. Thoughts?

comment:16 in reply to: ↑ 15 Changed 7 years ago by Wipster

Replying to Wipster:

I think the right thing to do is to move the high and low cpmac registration from platform to cpmac: it looks on the external side first and, if it finds something, brings up only highcpmac; if it falls through to internal, it brings up lowcpmac. Thoughts?

Actually, that's probably best left in platform too; just edit that patch to include the snippet below before the id allocation, and it's fine. I really should try to get my hands on a DG834v3; it would be interesting to poke around.

cpmac_mii->phy_mask = ar7_is_titan() ? ~(mask | 0x80000000 | 0x40000000) :
        ~(mask | 0x80000000);
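
As a rough standalone illustration of what that expression computes: in the Linux MDIO bus, a set bit in `phy_mask` means "skip this address when probing", so the inversion leaves only the found addresses plus address 31 (and address 30 on Titan) to be scanned. The helper names below are hypothetical and the `is_titan` flag stands in for `ar7_is_titan()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a phy_mask from the set of addresses that should be probed. */
static uint32_t sketch_phy_mask(uint32_t found, bool is_titan)
{
    uint32_t keep = found | 0x80000000u;   /* always keep address 31 */

    if (is_titan)
        keep |= 0x40000000u;               /* Titan: also keep address 30 */
    return ~keep;                          /* set bit = address is skipped */
}

/* True if the given MDIO address would still be probed under phy_mask. */
static bool sketch_addr_probed(uint32_t phy_mask, unsigned int addr)
{
    return addr < 32 && !(phy_mask & (UINT32_C(1) << addr));
}
```

With only address 16 found on a non-Titan chip, the result is 0x7FFEFFFF, so addresses 16 and 31 are probed and everything else is skipped.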

comment:17 in reply to: ↑ 15 ; follow-up: Changed 7 years ago by narge

Replying to Wipster:

Want to give that a try? Apply it before trunk r22849.

I can't test it until I get home tonight, but:

//No externals found try internal 
ar7_device_disable(AR7_RESET_BIT_EPHY); 

This is backwards — ephy is the internal phy.

My solution is only correct if the ephy doesn't respond on the mdio when it's disabled, and does respond when it is enabled. I can't test that, though, because the ephy and switch both use phy address 31 on my router, so CPMAC_MDIO_ALIVE always has bit 31 set.

So, it would be helpful to know the value of CPMAC_MDIO_ALIVE about 20ms after an mdio reset, with the ephy disabled, on a board that doesn't have a switch at addresses 16–31. Can someone with such a device check this please?
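
For anyone reporting a value, a small helper like the following could decode it. This is a sketch with hypothetical names, assuming bit n of the alive value corresponds to MDIO address n (consistent with bit 31 standing for address 31 above); it does not read the register itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lowest responding MDIO address in an alive bitmask, or -1 if none. */
static int sketch_first_alive(uint32_t alive)
{
    for (int addr = 0; addr < 32; addr++) {
        if (alive & (UINT32_C(1) << addr))
            return addr;
    }
    return -1;
}

/* True if anything answers in the 16..31 range that a switch would occupy. */
static bool sketch_upper_range_alive(uint32_t alive)
{
    return (alive & 0xFFFF0000u) != 0;
}
```

A value of 0xFFFF0000 would decode to a first address of 16 with the upper range occupied, which matches the DG834v3 case and is therefore not useful for this test.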

I moved the MII select out to platform because the remap depends on not being a 7300, and it isn't really the cpmac driver's domain, imo.

Yeah, I think all of the above is only applicable to 7100 and 7200. I'm not sure whether it is ok to enable mii when the ephy is being used, though.

comment:18 in reply to: ↑ 17 ; follow-up: Changed 7 years ago by anonymous

Replying to narge:

I can't test that, though, because the ephy and switch both use phy address 31 on my router, so CPMAC_MDIO_ALIVE always has bit 31 set.

Doesn't the 6060 use PHY id 16? Or is there a 6063 on that board, or something else?
How did you get on with your evening's testing?

comment:19 in reply to: ↑ 18 ; follow-up: Changed 7 years ago by narge

Replying to anonymous:

Doesn't the 6060 use PHY id 16? Or is there a 6063 on that board, or something else?

It's a 6060, which uses every address between 16 and 31.

How did you get on with your evening's testing?

What I see at the moment is that received packets are missing the first 16 bytes (vlan tag, source and destination mac addresses). The only header bytes left are the ethertype. Sent packets don't appear on the wire. Disabling the switch driver didn't seem to help, either.
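
The 16-byte figure is consistent with a 6-byte destination MAC, a 6-byte source MAC and a 4-byte VLAN tag, which leaves the 2-byte ethertype at the start of what remains. A minimal sketch of that arithmetic, using hypothetical names and a made-up frame rather than captured data:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    SKETCH_MAC_LEN = 6,       /* one MAC address */
    SKETCH_VLAN_TAG_LEN = 4,  /* 802.1Q tag: TPID + TCI */
    SKETCH_LOST_BYTES = 2 * SKETCH_MAC_LEN + SKETCH_VLAN_TAG_LEN  /* 16 */
};

/*
 * Copy what is left of a tagged frame after its first SKETCH_LOST_BYTES
 * bytes are dropped. Returns the remaining length (0 if the frame is
 * too short). The output then starts at the ethertype.
 */
static size_t sketch_drop_lost_header(const uint8_t *frame, size_t len,
                                      uint8_t *out)
{
    if (len < SKETCH_LOST_BYTES)
        return 0;
    memcpy(out, frame + SKETCH_LOST_BYTES, len - SKETCH_LOST_BYTES);
    return len - SKETCH_LOST_BYTES;
}
```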

comment:20 in reply to: ↑ 19 Changed 7 years ago by Wipster

Replying to narge:

What I see at the moment is that received packets are missing the first 16 bytes (vlan tag, source and destination mac addresses). The only header bytes left are the ethertype. Sent packets don't appear on the wire. Disabling the switch driver didn't seem to help, either.

Try changing the tagging from header to trailer? From a chat with florian, it seems cpmac needs a bit of work to allow for the mangled packets that this switch gives. Maybe if you just use trailer tagging the packet will at least get to where it needs to go.

Changed 7 years ago by Wipster

Want to give that a go?

comment:21 Changed 7 years ago by Wipster

Narge, have you made any progress with the bug hunting? Are you still seeing the same problem with the offset patch?

comment:22 Changed 7 years ago by florian

Wipster, your patch looks okay at first glance; it would be great if someone could test it.

comment:23 Changed 7 years ago by florian

  • Resolution set to fixed
  • Status changed from accepted to closed

Fixed with r24142.

comment:24 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
