I've been reading up on how packet drops are reported by various tools, and at various levels of the OS. So far, most of the information I've been able to find by googling seems rather "hand-wavy" to me.
First, let me state that the example host I'm looking at shows ZERO drops in /proc/net/softnet_stat. This indicates to me that NIC ring buffers are probably sized adequately. Now, onto ethtool...
This is what the NIC multi-queue looks like:
# ethtool -l em1
Channel parameters for em1:
Pre-set maximums:
RX: 16
TX: 16
Other: n/a
Combined: n/a
Current hardware settings:
RX: 16
TX: 16
Other: n/a
Combined: n/a
Now, here is what the rx drops look like for that same interface:
# ethtool -S em1 | grep 'rx.*dropped:'
rx_dropped: 1742
rx0_dropped: 0
rx1_dropped: 0
rx2_dropped: 0
rx3_dropped: 0
rx4_dropped: 0
rx5_dropped: 0
rx6_dropped: 0
rx7_dropped: 0
rx8_dropped: 0
rx9_dropped: 0
rx10_dropped: 0
rx11_dropped: 0
rx12_dropped: 0
rx13_dropped: 0
rx14_dropped: 0
rx15_dropped: 0
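To make the comparison between the aggregate counter and the per-queue counters explicit, here is a minimal Python sketch. It assumes the counter names follow the rx_dropped / rxN_dropped pattern shown above, which is driver-specific and may differ on other NICs. The function name split_rx_drops and the shortened sample text are illustrative only.

```python
import re

# Sample lines taken from the `ethtool -S em1` output above
# (shortened to three of the sixteen queues for brevity).
SAMPLE = """\
rx_dropped: 1742
rx0_dropped: 0
rx1_dropped: 0
rx15_dropped: 0
"""

def split_rx_drops(text):
    """Return (aggregate, per_queue); per_queue maps queue index to drop count."""
    aggregate = None
    per_queue = {}
    for line in text.splitlines():
        # Matches both "rx_dropped: N" (no index) and "rx<idx>_dropped: N".
        m = re.match(r"\s*rx(\d*)_dropped:\s*(\d+)\s*$", line)
        if not m:
            continue
        queue, count = m.group(1), int(m.group(2))
        if queue == "":
            aggregate = count
        else:
            per_queue[int(queue)] = count
    return aggregate, per_queue

aggregate, per_queue = split_rx_drops(SAMPLE)
queue_sum = sum(per_queue.values())
# Drops in the aggregate that no per-queue counter accounts for.
print(f"aggregate={aggregate} per-queue sum={queue_sum} unaccounted={aggregate - queue_sum}")
```

On a live host the same function could be fed the captured output of ethtool -S for the interface instead of the sample string.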
My assumption is that the 16 individual per-queue counters relate to the NIC's multi-queue ring buffers. All zeros here seems to agree with what I'm seeing in softnet_stat. Further, I'm assuming that any drops counted in softnet_stat would be reflected in this ethtool output if they were happening (which they currently are not).
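For reference, the softnet_stat check can be scripted along these lines. This is a sketch, not a definitive parser: it relies only on the first three hexadecimal columns (processed, dropped, time_squeeze), and the sample rows below are made up for illustration rather than taken from the host in question. The function name parse_softnet_stat is hypothetical.

```python
# Hypothetical sample in the /proc/net/softnet_stat layout: one row per CPU,
# whitespace-separated hexadecimal fields. These values are invented.
SAMPLE = """\
0000272d 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000034d9 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
"""

def parse_softnet_stat(text):
    """Return one dict per row with the first three counters decoded from hex."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append({
            "processed": int(fields[0], 16),     # packets processed
            "dropped": int(fields[1], 16),       # dropped because the per-CPU backlog was full
            "time_squeeze": int(fields[2], 16),  # softirq ran out of budget or time
        })
    return rows

rows = parse_softnet_stat(SAMPLE)
total_dropped = sum(r["dropped"] for r in rows)
print(f"rows={len(rows)} total dropped={total_dropped}")
```

To run it against a real host, read the text with open("/proc/net/softnet_stat").read() in place of SAMPLE. Note that the row order does not necessarily match CPU numbering if some CPUs are offline.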
That leaves the rather vague rx_dropped field, which is actually incrementing. My assumption is that it is NOT related to the NIC ring buffer, but is a higher-layer protocol drop counter. This count is in fact reflected in the ip -s stats for the interface:
# ip -s link show dev em1
2: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether 94:18:82:70:2e:42 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
219512805660516 147616023841 0 1742 0 5624266
TX: bytes packets errors dropped carrier collsns
649765242476657 450168813646 0 0 0 0
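To cross-check this figure against the ethtool value programmatically, the RX/TX table can be turned into a dictionary. The sketch below assumes the header-line-then-value-line layout shown above; the function name parse_ip_stats is hypothetical, and the sample is copied from the output in this post.

```python
# Counter lines copied from the `ip -s link show dev em1` output above.
SAMPLE = """\
RX: bytes packets errors dropped missed mcast
219512805660516 147616023841 0 1742 0 5624266
TX: bytes packets errors dropped carrier collsns
649765242476657 450168813646 0 0 0 0
"""

def parse_ip_stats(text):
    """Map 'RX'/'TX' to {column name: value}, pairing each header line with the next line."""
    stats = {}
    lines = [line.strip() for line in text.splitlines()]
    for header, values in zip(lines, lines[1:]):
        if header.startswith(("RX:", "TX:")):
            direction, *names = header.split()
            stats[direction.rstrip(":")] = dict(zip(names, map(int, values.split())))
    return stats

stats = parse_ip_stats(SAMPLE)
rx = stats["RX"]
# Fraction of received packets counted as dropped.
print(f"rx dropped={rx['dropped']} ratio={rx['dropped'] / rx['packets']:.3e}")
```

The same counter is also exposed as a single number in /sys/class/net/em1/statistics/rx_dropped, which may be simpler to poll when watching for increments.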
I believe these drops could be the result of any number of protocol-related issues, such as malformed packets, bad ports, congested application buffers, and so on.
Does this seem like a reasonable analysis that explains the "different" drop stats reported by ethtool -S?