I have a simple and common network architecture.
A web server sits behind a router on the local network. The router does iptables DNAT, so port forwarding to the web server is achieved.
Therefore, I'm able to download a file from Server 1 to my computer over the internet.
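The port forward on the router is a standard DNAT rule along these lines (the interface name and server IP below are placeholders, not my exact config):

    # on the router: forward incoming TCP port 80 to the web server
    iptables -t nat -A PREROUTING -i vlan2 -p tcp --dport 80 \
        -j DNAT --to-destination 192.168.1.10:80
    iptables -A FORWARD -p tcp -d 192.168.1.10 --dport 80 \
        -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT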
My questions
What is the proper kernel tuning to ensure that the router uses most of its potential (for around 2000 connections and the highest throughput)? The issue itself is TEST 5 below.
Do kernel parameters look fine on Server 1?
Can you explain why I get just 3 Mbps from Server 1 while CPU and RAM are not overloaded? Can you see other possible issues besides the Linux kernel, CPU and RAM? Could you list possible issues to explore: the 1 Gbps network interfaces, ports, the 2x1.5 GHz ARM being too slow for routing, the iptables version, etc.?
OS and resources
Computer - macOS, 8 x86 CPU cores, 16G/32G of free RAM
Router - Linux DD-WRT, 2 ARM CPU cores, 270M/512M of free RAM
Server 1 - Linux Ubuntu 18.04, 4 x86 CPU cores, 240M/32G of free RAM (500M swapped to SSD)
Server 2 - Linux Raspbian, 1 ARM CPU core, 95M/512M of free RAM
MTU
Everywhere 1500
TXQUEUELEN
Everywhere 1000
Protocols
UDP speeds are fine
TCP speed is affected on any port
Iptables version
Router - 1.3.7
Server 1 - 1.8.4
Server 2 - 1.6.0
Linux versions
Router - 4.9.207
Server 1 - 5.4.0-67-generic
Server 2 - 4.14.79+
Theoretical link speeds
From my computer to router - 30 Mbps / 3.75 MB/s
From router to web server 1 - 1 Gbps
From router to web server 2 - 1 Gbps
Download speeds from web server (file is hosted in RAM; equivalent test commands are sketched below)
TEST 1: Server 2 -> Router = 800 Mbps
TEST 2: Server 2 -> Computer = 30 Mbps
TEST 3: Server 1 -> Router = 800 Mbps
TEST 4: Server 1 -> Computer using 15 connections = 15 Mbps
TEST 5: Server 1 -> Computer = 3 Mbps (the issue!)
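The tests are plain HTTP downloads of the same file; something equivalent to the commands below reproduces them (the URL is a placeholder, and aria2 is just one way to get the multi-connection case):

    # TEST 5: single TCP connection
    curl -o /dev/null http://SERVER1/bigfile.bin
    # TEST 4: 15 parallel connections to the same file
    aria2c -x 15 -s 15 http://SERVER1/bigfile.bin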
CPU usage is around a few percent on every device. The CPU load average is 0.0x on all devices except Server 1, which has a load average of 4.6. Server 1 also handles around 500-1000 connections for other things outside of these tests, but at around 1 Mbps, so they shouldn't affect test throughput dramatically (unless these connections are somehow making things worse indirectly).
Despite that higher load, TEST 3 performed very well, so it's still hard to blame Server 1.
There are no issues in dmesg on any device.
My thoughts
The issue appears only when DNAT'ing on the router, and only with Server 1, which has a high number of other connections (but these connections are almost idle, so they shouldn't affect things badly?).
The most interesting test to describe in these final thoughts:
When I do a multi-threaded web download (TEST 4), Server 1 performs much better.
So it is capable of reaching higher download speeds. But why can't 1 connection reach the same speed as multiple ones?
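What I still want to compare is the per-connection TCP state during the slow single-connection download versus the fast multi-connection one. Roughly (203.0.113.5 stands in for my computer's address as Server 1 sees it):

    # on Server 1 while a test runs: congestion window, RTT, retransmits per socket
    ss -tin dst 203.0.113.5
    # on Server 1: retransmission and drop counters
    netstat -s | grep -i -E 'retrans|drop'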
Parameters that I explored
Can you see anything that is not well optimised for the Linux router?
net.core.wmem_max
- maximum tcp socket send buffer memory size (in bytes). Increase TCP read/write buffers to enable scaling to a larger window size. Larger windows increase the amount of data to be transferred before an acknowledgement (ACK) is required. This reduces overall latencies and results in increased throughput.
This setting is typically set to a very conservative value of 262,144 bytes. It is recommended to set this value as large as the kernel allows. The value used in that reference was 4,136,960 bytes. However, 4.x kernels accept values over 16 MB.
Router - 180224
Server 1 - 212992
Server 2 - 163840
Somewhere else used - 83886080
net.core.wmem_default
Router - 180224
Server 1 - 212992
Server 2 - 163840
Somewhere else used - 83886080
net.core.rmem_max
- maximum tcp socket receive buffer memory size (in bytes)
Router - 180224
Server 1 - 212992
Server 2 - 163840
Somewhere else used - 335544320
net.core.rmem_default
Router - 180224
Server 1 - 212992
Server 2 - 163840
Somewhere else used - 335544320
net.ipv4.tcp_rmem
- Contains three values that represent the minimum, default and maximum size of the TCP socket receive buffer. The recommendation is to use the maximum value of 16M bytes or higher (kernel level dependent) especially for 10 Gigabit adapters.
Router - 4096 87380 3776288
Server 1 - 4096 131072 6291456
Server 2 - 4096 87380 3515840
Somewhere else used - 4096 87380 4136960 (IBM)
net.ipv4.tcp_wmem
- Similar to the net.ipv4.tcp_rmem this parameter consists of 3 values, a minimum, default, and maximum. The recommendation is to use the maximum value of 16M bytes or higher (kernel level dependent) especially for 10 Gigabit adapters.
Router - 4096 16384 3776288
Server 1 - 4096 16384 4194304
Server 2 - 4096 16384 3515840
Somewhere else used - 4096 87380 4136960 (IBM)
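If the small buffers are the limiting factor, the change I have in mind looks roughly like this (the values are guesses pieced together from the references above, not tested recommendations, and the file path is just a proposal; as far as I understand, socket buffer sysctls matter on the TCP endpoints, so mainly on Server 1 and my computer rather than on the forwarding router):

    # proposed /etc/sysctl.d/90-buffers.conf on Server 1
    # apply with: sysctl -p /etc/sysctl.d/90-buffers.conf
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 131072 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216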
net.ipv4.tcp_tw_reuse
- In high-traffic environments, sockets are created and destroyed at very high rates. When set, this parameter allows sockets that are no longer needed and about to be destroyed (in the "time-wait" state) to be reused for new connections. This bypasses the allocation and initialization overhead normally associated with socket creation, saving CPU cycles, system load and time.
The default value is 0 (off). The recommended value is 1 (on).
Router - 0
Server 1 - 2
Server 2 - 0
Somewhere else used - 1
net.ipv4.tcp_max_tw_buckets
- Specifies the maximum number of sockets in the “time-wait” state allowed to exist at any time. If the maximum value is exceeded, sockets in the “time-wait” state are immediately destroyed and a warning is displayed. This setting exists to thwart certain types of Denial of Service attacks. Care should be exercised before lowering this value; when changed, it should usually be increased, especially when more memory has been added to the system or when network demands are high and the environment is less exposed to external threats.
Router - 2048
Server 1 - 131072
Server 2 - 2048
Somewhere else used - 65536, 262144 (IBM), 45000 (IBM)
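To see whether “time-wait” sockets are even a factor during the tests, counting them is straightforward (sketch):

    # quick summary (includes a timewait count)
    ss -s
    # or count them explicitly
    ss -tan state time-wait | wc -l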
net.ipv4.tcp_fin_timeout
Router - 60
Server 1 - 60
Server 2 - 60
Somewhere else used - 15
net.ipv4.tcp_max_syn_backlog
Router - 128
Server 1 - 2048
Server 2 - 128
Somewhere else used - 65536
net.ipv4.ip_local_port_range
- range of ports used for outgoing TCP connections (useful to change it if you have a lot of outgoing connections from host)
Router - 32768 60999
Server 1 - 32768 60999
Server 2 - 32768 60999
Somewhere else used - 1024 65535
net.core.netdev_max_backlog
- number of slots in the receiver's ring buffer for arriving packets (the kernel puts packets in this queue if the CPU is not available to process them, for example because the application is busy)
Router - 120
Server 1 - 1000
Server 2 - 1000
Somewhere else used - 100000, 1000 (IBM), 25000 (IBM)
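To check whether the router's very small backlog (120) is actually dropping anything, the second column of softnet_stat should show it; raising the limit is a one-liner (the new value is a guess):

    # one hex row per CPU; the 2nd column counts packets dropped because this queue was full
    cat /proc/net/softnet_stat
    # raise the backlog if that column is non-zero
    sysctl -w net.core.netdev_max_backlog=2000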
net.ipv4.neigh.default.gc_thresh1
Router - 1
Server 1 - 128
Server 2 - 128
Somewhere else used - 128
net.ipv4.neigh.default.gc_thresh2
Router - 512
Server 1 - 512
Server 2 - 512
Somewhere else used - 512
net.ipv4.neigh.default.gc_thresh3
Router - 1024
Server 1 - 1024
Server 2 - 1024
Somewhere else used - 1024
net.core.somaxconn
- maximum listen queue size for sockets (a useful and often overlooked setting for load balancers, web servers and application servers like unicorn or php-fpm. If all server processes/threads are busy, incoming client connections are put in a “backlog” while waiting to be served). A full backlog causes client connections to be rejected immediately, causing client errors.
Router - 128
Server 1 - 4096
Server 2 - 128
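On Server 1 (the only box with many application connections) I can also check whether the listen queue has ever overflowed (sketch; the exact counter wording varies a bit between kernels):

    # on Server 1: has the listen queue ever overflowed?
    netstat -s | grep -i listen
    # typically prints lines like:
    #   "N times the listen queue of a socket overflowed"
    #   "N SYNs to LISTEN sockets dropped"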
net.ipv4.tcp_mem
- TCP buffer memory usage thresholds for autotuning, in memory pages (1 page = 4 KB)
Router - 5529 7375 11058
Server 1 - 381144 508193 762288
Server 2 - 5148 6866 10296
net.nf_conntrack_max
- maximum number of tracked connections (conntrack table size)
Router - 32768
Server 1 - 262144
Server 2 - no information
net.netfilter.nf_conntrack_max
- also the maximum number of tracked connections? If this is the correct parameter, then 1560 is not enough
Router - 1560
Server 1 - 262144
Server 2 - no information
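Given the ~2000 connections I'm targeting (plus whatever share of Server 1's 500-1000 existing connections passes through the router), 1560 does look too low, so this is the first thing I want to verify on the router (sketch; how to persist the value on DD-WRT may differ):

    # live count vs limit on the router
    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max
    # raise the limit if the count gets close to it (value is a guess)
    sysctl -w net.netfilter.nf_conntrack_max=32768

That said, a full conntrack table normally logs "nf_conntrack: table full, dropping packet", and dmesg is clean here.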
/proc/sys/net/ipv4/tcp_congestion_control
- Network congestion in data networking [...] is the reduced quality of service that occurs when a network node is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. Networks use congestion control and congestion avoidance techniques to try to avoid congestion collapse.
Router - westwood
Server 1 - cubic
Server 2 - cubic
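The congestion control algorithm is something I can switch cheaply to compare; as far as I understand, it matters on the sending endpoint (Server 1 here), since the router only forwards the packets (sketch; bbr is only usable if the kernel ships the module):

    # see which algorithms this kernel offers
    sysctl net.ipv4.tcp_available_congestion_control
    # try another one on Server 1 for a test run
    sysctl -w net.ipv4.tcp_congestion_control=bbr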
net.ipv4.tcp_syn_retries
- Specifies how many times to retransmit the initial SYN packet for an active TCP connection attempt before the attempt times out. The more retries are allowed, the longer a failing connection attempt can take, potentially several minutes.
Router - 6
Server 1 - 6
Server 2 - 6
net.ipv4.tcp_low_latency
- The default value is 0 (off). For workloads or environments where latency is a higher priority, the recommended value is 1 (on).
Router - 0
Server 1 - 0
Server 2 - 0
net.ipv4.tcp_limit_output_bytes
- Using this parameter, TCP controls small queue limits on a per-TCP-socket basis. TCP tends to increase the data in flight until loss notifications are received. With aspects of TCP send auto-tuning, large amounts of data might get queued at the device on the local machine, which can adversely impact the latency of other streams. tcp_limit_output_bytes limits the number of bytes on a device to reduce the latency effects caused by a larger queue size.
Router - 262144
Server 1 - 1048576
Server 2 - 262144
Somewhere else used - 262144 (IBM), 131072 (IBM)