Opened 7 years ago

Closed 7 years ago

Last modified 4 years ago

#7738 closed defect (fixed)

DIR-825 B1/ rtl8366s/ port 5 Dot3StatsFCSErrors/EtherStatsDropEvents

Reported by: ghatothkach <ghatothkach@…> Owned by: juhosg
Priority: normal Milestone: Backfire 10.03.1
Component: kernel Version: Trunk
Keywords: Cc:

Description

hi

I am using the latest version as of today

KAMIKAZE (bleeding edge, r22582) ------------------

I notice

EtherStatsDropEvents                : 3880
Dot3StatsFCSErrors                  : 3880

when I do a

swconfig dev rtl8366s port 5 show

I am not sure if this is related to the LAN<>Wireless throughput
I am seeing (https://forum.openwrt.org/viewtopic.php?id=24904)
but there seems to be some defect in the driver or something for
rtl8366s.

Ghat

Attachments (0)

Change History (22)

comment:1 Changed 7 years ago by anonymous

I have exactly the same issue with same hardware.

EtherStatsDropEvents and Dot3StatsFCSErrors on port 5 (and only port 5).

Speed between WAN and LAN is very slow (I think this is linked to those physical errors).

Seen at least on the OpenWrt 10.03 and 10.03.1-RC1 releases, and on the latest trunk, r22704.

comment:2 Changed 7 years ago by anonymous

Same problem observed here, verified also on 10.03 and 10.03.1-rc1, on a DIR-825 B2.

comment:3 Changed 7 years ago by atakama

Confirmed on Backfire trunk (r23109) on a DIR-825 B2. This affects all traffic that goes through the switch <-> CPU link, i.e. LAN-WAN and LAN-WLAN; other connections (LAN-LAN, WLAN-WAN) are not affected.
I see the OP has returned his device since posting the ticket, but there are still users out there affected by this. Any way we could help? Perhaps post some more info/outputs?
Thank you!

comment:4 Changed 7 years ago by jow

  • Owner changed from developers to juhosg
  • Status changed from new to assigned

comment:5 Changed 7 years ago by Jérôme Poulin <jeromepoulin@…>

Just to be sure I receive a notification if this ticket changes, instead of #7988.

comment:6 Changed 7 years ago by eric@…

Same problem observed with two DIR-825 B1 running 10.03 release.

comment:7 Changed 7 years ago by anonymous

r24191. Problem still exists. Is there any workaround for this problem?

comment:8 Changed 7 years ago by good.win.alexs@…

Any news? Maybe you need any additional info?
P.S. DIR-825 B2: same problems on OpenWrt and DD-WRT, no problems on stock firmware.

comment:9 Changed 7 years ago by Brian J. Murrell <brian@…>

Indeed, I am also seeing this problem on my B1 DIR-825. I can only get about 3Mb/s of the 15Mb/s or so that my WAN connection should have available.

That's both from a LAN connected machine and a WLAN (radio0) connected machine.

comment:10 follow-up: Changed 7 years ago by Johannes W <devnull.openwrt@…>

Just wanted to inform you that I (luckily) don't seem to have this problem with r24915. I never tried any previous versions on this device. Both counters stay at zero after now 20h of usage and I have full (6 MBit/s) wan<->wireless speed. I upgraded the original firmware to 202EUb04 before installing openwrt.

Hardware: D-Link DIR-825 B2 (shown as B1)

comment:11 in reply to: ↑ 10 ; follow-up: Changed 7 years ago by tsaarni

Replying to Johannes W <devnull.openwrt@…>:

Just wanted to inform you that I (luckily) don't seem to have this problem with r24915. I never tried any previous versions on this device. Both counters stay at zero after now 20h of usage and I have full (6 MBit/s) wan<->wireless speed

As stated above by atakama, the problem affects all traffic that goes through the switch <-> CPU link, i.e. LAN-WAN and LAN-WLAN, practically making the wired part of the router unusable with OpenWrt as speed drops almost to a halt.

wan<->wireless traffic has never been affected by this bug.

comment:12 in reply to: ↑ 11 Changed 7 years ago by Johannes W <devnull.openwrt@…>

Replying to tsaarni:

As stated above by atakama, the problem affects all traffic that goes through the switch <-> CPU link, i.e. LAN-WAN and LAN-WLAN, practically making the wired part of the router unusable with OpenWrt as speed drops almost to a halt.

wan<->wireless traffic has never been affected by this bug.

Sorry, I didn't read carefully enough. I tested WAN-LAN and I still get the full 6 MBit/s (speedtest.net), although now the counters are rising *very* slowly (both at 5 after several attempts). Is it possible that the problem has been fixed in the current trunk?

comment:13 Changed 7 years ago by Brian J. Murrell <brian@…>

Well, I am testing r25047 and it's even worse than backfire-rc4.

It is worth noting that the slowness is only in one direction. Here's my testbed:

                     +----------+
 +------------+      | DIR-825  |     +----+
 | DHCP server|------+wan    lan+-----| PC |
 +------------+      +----------+     +----+

DHCP server's IP address is 10.254.239.1.

DIR-825's WAN address is 10.254.239.20 (although irrelevant to the tests).

DIR-825's LAN address is 10.75.22.196 (also irrelevant to the tests).

PC's IP address is 10.75.22.1.

PC:$ ssh root@10.254.239.1 "dd if=/dev/zero bs=1M count=100 2>/dev/null" | dd of=/dev/null bs=1M
0+3210 records in
0+3210 records out
104857600 bytes (105 MB) copied, 66.5463 s, 1.6 MB/s

Quite pitiful. And during that time on the DIR-825:

Immediately before the above data xfer:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 14346
Dot3StatsFCSErrors                  : 14346

And immediately after:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 15986
Dot3StatsFCSErrors                  : 15986

So, 1640 errors during that xfer.
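The before/after subtraction above can be scripted. A minimal sketch, assuming each `swconfig dev rtl8366s port 5 show` output has been saved to a file; the helper names `counter_value` and `counter_delta` are hypothetical and not part of swconfig:

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved samples of
# `swconfig dev rtl8366s port 5 show` output.

# counter_value NAME FILE: print the numeric value of counter NAME in FILE.
counter_value() {
    awk -F: -v name="$1" '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == name { val = $2; gsub(/[ \t]/, "", val); print val; exit }
    ' "$2"
}

# counter_delta NAME BEFORE_FILE AFTER_FILE: print (after - before).
counter_delta() {
    before=$(counter_value "$1" "$2")
    after=$(counter_value "$1" "$3")
    echo $((after - before))
}

# Intended use on the router (file paths are only examples):
#   swconfig dev rtl8366s port 5 show > /tmp/before
#   ... run the transfer ...
#   swconfig dev rtl8366s port 5 show > /tmp/after
#   counter_delta Dot3StatsFCSErrors /tmp/before /tmp/after
```

Working from saved files rather than live output keeps the two samples available for later comparison.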

Now if we look at data moving in the opposite direction:

$ dd if=/dev/zero bs=1M count=100 | ssh root@10.254.239.1 "dd of=/dev/null bs=1M 2>/dev/null"
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 5.59242 s, 18.7 MB/s

We can see that it's quite a bit faster! The corresponding before and after samples of the errors:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 16146
Dot3StatsFCSErrors                  : 16146

And:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 16157
Dot3StatsFCSErrors                  : 16157

What is entirely unfortunate is that the asymmetric bandwidth of the WAN port is opposite to what most consumer broadband connections are. That is, the WAN port performs much better at "uploads" than it does at "downloads".

In fact if this problem were inverse on these routers, I doubt anyone would have discovered it (so quickly). :-)

This gets even more interesting though. Trying to confirm the above results using a bandwidth testing tool like nttcp, I cannot reproduce them. This should replicate the first result above, the really slow one:

PC:$ ssh root@10.254.239.1 "nttcp -t -T -n $((1024*100)) 10.75.22.1"
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
l419430400   15.03   13.31    223.2163    252.1377  102400   6812.02    7694.6
1419430400   15.03    2.34    223.2202   1433.8606  139793   9299.70   59736.9

But you can see that it does not. That's 27.9MB/s! The error counter samples during that run:
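The MB/s figure is the Real-MBit/s column divided by 8. A small sketch of that conversion; the function name `mbit_to_mbyte` is made up for illustration:

```shell
# mbit_to_mbyte RATE: convert megabits per second to megabytes per second,
# rounded to one decimal place.
mbit_to_mbyte() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", r / 8 }'
}

mbit_to_mbyte 223.2163   # Real-MBit/s from the nttcp run above; prints 27.9
```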

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 16390
Dot3StatsFCSErrors                  : 16390

And:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 16401
Dot3StatsFCSErrors                  : 16401

Which is only 11 errors.

Now the opposite direction:

PC:$ nttcp -t -T -n $((1024*100)) 10.254.239.1
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
l419430400   11.57    0.68    289.9673   4963.3723  102400   8849.10  151470.1
1419430400   11.57    6.95    289.9133    482.8015  107170   9259.58   15420.3

which is 36.2MB/s! And the error counters before:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 16401
Dot3StatsFCSErrors                  : 16401

And after:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 16471
Dot3StatsFCSErrors                  : 16471

which is only 70 errors.

So what's the difference between these two tests? AFAIU, they should essentially be doing the same thing.

comment:14 Changed 7 years ago by yatakama

See https://forum.openwrt.org/viewtopic.php?pid=126410#p126410 for a possible fix by forum user masa and also #7988 which is related to this issue.

comment:15 follow-up: Changed 7 years ago by yatakama

This issue was fixed in r25121.

comment:16 in reply to: ↑ 15 Changed 7 years ago by Brian J. Murrell <brian@…>

Replying to yatakama:

This issue was fixed in r25121.

It's better, but really not where it should be I think (although, admittedly, I have not run the stock D-Link firmware through these tests to have any baseline/control values):

PC:$ ssh root@10.254.239.1 "dd if=/dev/zero bs=1M count=100 2>/dev/null" | dd of=/dev/null bs=1M
0+3222 records in
0+3222 records out
104857600 bytes (105 MB) copied, 16.445 s, 6.4 MB/s

PC:$ ssh root@10.254.239.1 "dd if=/dev/zero bs=1M count=100 2>/dev/null" | dd of=/dev/null bs=1M
0+3219 records in
0+3219 records out
104857600 bytes (105 MB) copied, 14.6932 s, 7.1 MB/s

PC:$ ssh root@10.254.239.1 "dd if=/dev/zero bs=1M count=100 2>/dev/null" | dd of=/dev/null bs=1M
0+3205 records in
0+3205 records out
104857600 bytes (105 MB) copied, 16.5335 s, 6.3 MB/s

So still, the "download" speed from WAN to LAN is only ~50-58Mb/s. Now, for me, with just a 15Mb/s Internet connection, I can live with this. Others might not be able to.

And just for posterity, the LAN to WAN "upload" speed:

PC:$ dd if=/dev/zero bs=1M count=100 | ssh root@10.254.239.1 "dd of=/dev/null bs=1M 2>/dev/null"
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 5.43086 s, 19.3 MB/s

154Mb/s.
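That figure follows from the dd summary line above: bytes times 8, divided by elapsed seconds. A sketch of the calculation in decimal units (as dd reports them); `dd_rate` is a hypothetical helper:

```shell
# dd_rate BYTES SECONDS: print throughput as "<MB/s> <Mbit/s>" in decimal units.
dd_rate() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        printf "%.1f MB/s %.0f Mbit/s\n", b / s / 1e6, b * 8 / s / 1e6
    }'
}

dd_rate 104857600 5.43086   # prints: 19.3 MB/s 154 Mbit/s
```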

And after all of this, the error counters:

root@new-gw:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 0
Dot3StatsFCSErrors                  : 0

So yay on that front.

comment:17 Changed 7 years ago by Brian J. Murrell <brian@…>

I was going to install the original d-link firmware to do some baseline benchmarking of the LAN->WAN performance but that does not appear to be at all straightforward to do. Specifically from http://wiki.openwrt.org/toh/d-link/dir-825#firmware.recovery which refers back to http://wiki.openwrt.org/toh/d-link/dir-825#installation.using.firmware.recovery.mode:

1. Get into the D-Link recovery console with the steps below:
...
  3. Go to http://192.168.0.1 using your MS Internet Explorer (other browsers don't work and also a Windows running on a VM, like VMware, doesn't work!)

I don't have any Windows machines (and therefore no MS Internet Explorer) around here, so I will just pass on trying to get the D-Link firmware back on there simply to do some baseline testing.

comment:18 Changed 7 years ago by yatakama

Maybe the low speeds you are seeing are due to ssh piping. I can fully saturate my 100 Mbps wan link in r25196 (downloading from wan to lan). I used to see speeds around 1 Mbps before the patch in r25121.

comment:19 Changed 7 years ago by juhosg

  • Milestone changed from Kamikaze Bugs Paradise to Backfire 10.03.1
  • Resolution set to fixed
  • Status changed from assigned to closed

Fixed in r25121 (trunk) and r25257 (backfire).

comment:20 Changed 7 years ago by eximido

  • Resolution fixed deleted
  • Status changed from closed to reopened

These errors are still present on the B1 board.
Stats after a few days of uptime:

root@OpenWrt:~# swconfig dev rtl8366s port 5 show | grep -e EtherStatsDropEvents -e Dot3StatsFCSErrors
EtherStatsDropEvents                : 57160
Dot3StatsFCSErrors                  : 57160

I can't say that the speed I observe is dramatically low, or that hundreds of millions of packets got lost due to those errors. Ideally, though, there should be no errors at all, yet they are present and the counters are continuously growing. Maybe there should be different init values for B1 and B2 switches, I don't know.

comment:21 Changed 7 years ago by eximido

Just compiled and flashed current trunk; it seems there are no more FCSErrors so far.
No idea why they were there before.

Please, close this ticket.

comment:22 Changed 7 years ago by juhosg

  • Resolution set to fixed
  • Status changed from reopened to closed
