I have a Linux machine running Debian 11 that acts as a router between an edge device and a host machine. While curl is downloading a file, the interface facing the edge device accumulates rx_crc_errors sporadically but consistently: the count goes up in spurts, averaging maybe one or two per second. As a result, the curl download runs very slowly from both the router and the host machine behind it, reaching only a fraction of the speed available on the line. I've checked other routers and servers that go through the same edge device, and they run at the full speed and do not accumulate rx_crc_errors.
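One way to put a firmer number on "one or two per second" is to sample the counter once a second while the download runs. This is a minimal sketch, assuming the kernel exposes the standard per-interface counter at /sys/class/net/<iface>/statistics/rx_crc_errors; the function name crc_watch and the STATS_DIR override are hypothetical, added only so the loop can be pointed at a different directory.

```shell
# crc_watch IFACE [SAMPLES]
# Prints rx_crc_errors once per second for IFACE, together with the increase
# since the previous sample. STATS_DIR is a hypothetical override for the
# counter directory; by default the standard sysfs location is used.
crc_watch() {
    iface="$1"
    samples="${2:-10}"
    dir="${STATS_DIR:-/sys/class/net/$iface/statistics}"

    prev=$(cat "$dir/rx_crc_errors") || return 1
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep 1
        cur=$(cat "$dir/rx_crc_errors") || return 1
        printf '%s %s rx_crc_errors=%s delta=%s\n' \
            "$(date +%T)" "$iface" "$cur" "$((cur - prev))"
        prev=$cur
        i=$((i + 1))
    done
}
```

For example, start crc_watch enp3s0f0 30 in the background and then run the curl download; the delta column shows whether the errors arrive in bursts or at a steady rate.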
The edge router (the gateway for the machine/interface that is accumulating these rx_crc_errors) is a Netgate 1537 running pfSense 22.01 (the latest version).
Things we've tried so far:
- Replaced the cable
- Replaced the SFP adapter
- Used a different switch port
- Replaced the entire host machine with one of the same configuration
None of these changed the behavior, which, as far as I can tell, eliminates hardware as a source of the problem.
Running this command from the edge machine completes in about 1 second:
curl https://dl.google.com/go/go1.18.1.linux-amd64.tar.gz --output t.dat
From the router with the rx_crc_errors it takes 17 seconds, and from the host behind it 21 seconds.
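To tie the slowdown directly to the errors, the same download can be wrapped so the counter is read before and after it. A minimal sketch, assuming the same sysfs counter path; crc_during, IFACE and STATS_DIR are hypothetical names. Note that the reported increase covers all traffic on the interface during the run, not only the download. The curl write-out variables time_total and speed_download (bytes per second) are standard curl -w variables.

```shell
# crc_during CMD [ARGS...]
# Runs CMD, then reports how much rx_crc_errors grew while it ran.
# IFACE (default enp3s0f0) and STATS_DIR are hypothetical overrides.
crc_during() {
    dir="${STATS_DIR:-/sys/class/net/${IFACE:-enp3s0f0}/statistics}"
    before=$(cat "$dir/rx_crc_errors") || return 1
    "$@"
    status=$?
    after=$(cat "$dir/rx_crc_errors") || return 1
    echo "rx_crc_errors +$((after - before))"
    return "$status"   # preserve the wrapped command's exit status
}

# Example, run on the router:
#   IFACE=enp3s0f0 crc_during curl -s -o t.dat \
#       -w '%{time_total}s %{speed_download} B/s\n' \
#       https://dl.google.com/go/go1.18.1.linux-amd64.tar.gz
```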
The errors show up on the uplink interface like so:
...
2: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 10:1f:74:35:fc:94 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped missed  mcast
    1609100899 1250023  508     0       0       2534
    TX: bytes  packets  errors  dropped carrier collsns
    20574398   206727   0       0       0       0
...
ethtool -S gives more detail and shows that these are CRC errors (rx_crc_errors: 508):
ethtool -S enp3s0f0
NIC statistics:
rx_bytes: 1609111223
rx_error_bytes: 0
tx_bytes: 20588905
tx_error_bytes: 0
rx_ucast_packets: 1245869
rx_mcast_packets: 2554
rx_bcast_packets: 1687
tx_ucast_packets: 202770
tx_mcast_packets: 4038
tx_bcast_packets: 0
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 508
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 22
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 1823
rx_65_to_127_byte_packets: 9084
rx_128_to_255_byte_packets: 2371
rx_256_to_511_byte_packets: 585
rx_512_to_1023_byte_packets: 80
rx_1024_to_1522_byte_packets: 1236167
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 0
tx_65_to_127_byte_packets: 200168
tx_128_to_255_byte_packets: 5659
tx_256_to_511_byte_packets: 370
tx_512_to_1023_byte_packets: 230
tx_1024_to_1522_byte_packets: 381
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 0
tx_xoff_frames: 0
rx_mac_ctrl_frames: 0
rx_filtered_packets: 33361
rx_ftq_discards: 0
rx_discards: 0
rx_fw_discards: 0
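The ethtool -S output can also be reduced to a single error rate. A minimal sketch: crc_ratio is a hypothetical helper, and it assumes (this is driver-dependent) that rx_ucast_packets, rx_mcast_packets and rx_bcast_packets count only frames that passed the CRC check, so CRC errors are divided by good frames plus CRC errors.

```shell
# crc_ratio: reads `ethtool -S` output on stdin and prints CRC errors as a
# percentage of all received frames. Assumes (driver-dependent) that the
# ucast/mcast/bcast counters include only frames that passed the CRC check.
crc_ratio() {
    awk -F': *' '
        { sub(/^[ \t]+/, "", $1) }               # strip ethtool indentation
        $1 == "rx_ucast_packets" || $1 == "rx_mcast_packets" ||
        $1 == "rx_bcast_packets" { good += $2 }
        $1 == "rx_crc_errors" { crc = $2 }
        END {
            if (good + crc == 0) { print "no rx frames counted"; exit }
            printf "rx_crc_errors=%d good_rx=%d crc_pct=%.4f%%\n",
                crc, good, 100 * crc / (good + crc)
        }'
}
```

Usage: ethtool -S enp3s0f0 | crc_ratio. For the counters shown above this works out to 508 CRC errors against 1,250,110 good frames, about 0.04% of received frames. 1,250,110 is also the sum of the rx_*_byte_packets size buckets, which is consistent with the assumption.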
Note that the error counters on every other interface of every other machine I've checked are all zero.
I'm stumped as to what to check next. I suspect the issue is related to the edge router in some way, but other routers connected to this edge router do not show the same issue. At one point I was using VLAN trunking (802.1Q tagging) from the edge router to the switch; I disabled it and also tried raising the MTU by 4 bytes to 1504. Neither change made any visible difference: the rx_crc_errors still accumulate and performance is as poor as described above.
Any other ideas about how to diagnose the cause of these rx_crc_errors?