We have a 1 Gbps (up/down) Internet link at work, and I noticed that performance was terrible when communicating with our servers in the cloud. By terrible I mean that, out of the theoretical 1000 Mbps, we were getting 30 Mbps at best. After long diagnostic sessions we realized that the network performance problem occurs when receiving data from our Windows Server machines. Strangely, we get pretty decent speeds when the server is a Linux VM (I had to make a few adjustments to allow a bigger TCP congestion window; the exact changes are listed below).
During the investigation I removed more and more elements to reduce the problem to its simplest form. My test laptop is now connected directly to the modem, without any software that could interfere with network performance. The same goes for my test VM in the cloud (top 3 provider). I used iperf3 to benchmark the transfer speed. With the firewall out of the equation the results are very different and we can reach hundreds of Mbps, but I still can’t make sense of the following:
Linux VM in the cloud sending data to the laptop: I got a 640 Mbps average over 60 seconds the last time I tested (the slowest one-second interval in this test was 530 Mbps).
Windows VM in the cloud sending data to the laptop: sometimes it starts at 400 Mbps and can even reach 500 Mbps for short periods, but about 90% of the time it starts at around 60 Mbps in the first second, drops to 30 Mbps, then slowly climbs. If I let the test run for 60 seconds it reaches around 400 Mbps after 20+ seconds, but the transfer speed is much less stable than with Linux (it also seems to be slower overall, but that is hard to evaluate at this point).
Additional information
- Laptop (the client in the data transfer) is a Windows PC
- When the laptop acts as the server (uploading) speed is good whatever the configuration is (destination can be Linux, Windows, the business firewall in place doesn’t change anything either)
- I’ve played with the Windows VM’s settings (adapter and TCP) in every way imaginable without any impact on the problem. The only gain came from connecting the laptop directly to the modem at work (throughput then went from about 10 Mbps to 30-400 Mbps, depending on the kind of test).
- I’ve used iperf3 to make all the benchmarks (some results are shown below)
- When creating the Windows Server VM, I used the default OS offered by the cloud provider, so there’s no exotic configuration there.
- Ping between our office and our servers is around 28 ms.
- While I was still connected through the firewall, I ran a test with a coworker who has a business connection with the same Internet provider (same speed as we have at the office). The ping from my laptop to the Windows VM at his place was between 2 and 3 ms, and the throughput (his VM acting as the server) was around 120 Mbps. The same test with the VM in the cloud (28 ms ping) gave 10 Mbps.
- I had to set a large window parameter when using iperf from Windows OS to allow maximum bandwidth (-w 5M), but it wasn't necessary from Linux.
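For reference, the window needed to keep the link full follows from the bandwidth-delay product. This is only a minimal sketch, assuming the nominal 1 Gbps rate and the 28 ms ping quoted above; the function name is made up for illustration.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_seconds / 8


link_bps = 1_000_000_000      # assumed: nominal 1 Gbps link
rtt = 0.028                   # assumed: ~28 ms ping between office and cloud
iperf_window = 5 * 1024 * 1024  # the -w 5M value used in the tests

needed = bdp_bytes(link_bps, rtt)
print(f"window needed to fill the link: {needed / 1e6:.1f} MB")
print(f"-w 5M is large enough: {iperf_window >= needed}")
```

Under these assumptions the link needs roughly 3.5 MB in flight, so a 5M window should not be the limiting factor on its own.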
My impression is that there’s a problem with Windows Server congestion window scaling. The Wireshark capture shows a big unused TCP window, but I don’t know how to fix the problem or investigate further at this point.
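As a rough cross-check of that impression, the amount of data in flight can be estimated as throughput times round-trip time for the two tests made through the firewall. This is a sketch under stated assumptions: the 2.5 ms figure is simply the midpoint of the 2 to 3 ms ping, and the helper name is hypothetical.

```python
def implied_window_bytes(throughput_bps: float, rtt_seconds: float) -> float:
    """Estimate bytes in flight from observed throughput and round-trip time."""
    return throughput_bps * rtt_seconds / 8


# (throughput in bits/s, RTT in seconds) for the two tests through the firewall
tests = {
    "coworker's Windows VM (120 Mbps, 2-3 ms)": (120e6, 0.0025),  # 2.5 ms is an assumed midpoint
    "cloud Windows VM (10 Mbps, 28 ms)": (10e6, 0.028),
}

windows = {name: implied_window_bytes(bps, rtt) for name, (bps, rtt) in tests.items()}
for name, window in windows.items():
    print(f"{name}: about {window / 1000:.1f} kB in flight")
```

Both tests come out in the 35 to 38 kB range, which would be consistent with a sender limited to a small, roughly fixed window rather than by the link itself. The RTT values are coarse, so this is an indication, not proof.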
I’ve included some tests I’ve made below, but I can provide more if needed.
## IPERF (Linux VM in the cloud sending data to laptop at the office)
    superuser@testnr5linux:~$ iperf3 -c xx.xx.xx.xx -w 5M -t 60
    Connecting to host xx.xx.xx.xx, port 5201
    [  5] local 10.4.0.4 port 47268 connected to xx.xx.xx.xx port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  76.2 MBytes   640 Mbits/sec  343   2.54 MBytes
    [  5]   1.00-2.00   sec  80.0 MBytes   671 Mbits/sec   53   1.88 MBytes
    [  5]   2.00-3.00   sec  73.8 MBytes   619 Mbits/sec    0   1.98 MBytes
    [  5]   3.00-4.00   sec  77.5 MBytes   650 Mbits/sec    0   2.06 MBytes
    [  5]   4.00-5.00   sec  80.0 MBytes   671 Mbits/sec    0   2.11 MBytes

(55 more seconds, averaging 640 Mbps)
## Example of a good run from the cloud VM (Windows), followed by a run with a long slow start
    C:\Users\superuser>iperf3.exe -c xx.xx.xx.xx -w 5M
    Connecting to host xx.xx.xx.xx, port 5201
    [  4] local 10.3.0.4 port 51988 connected to xx.xx.xx.xx port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  17.4 MBytes   145 Mbits/sec
    [  4]   1.00-2.00   sec  28.9 MBytes   243 Mbits/sec
    [  4]   2.00-3.00   sec  53.1 MBytes   446 Mbits/sec
    [  4]   3.00-4.00   sec  54.0 MBytes   453 Mbits/sec
    [  4]   4.00-5.00   sec  54.0 MBytes   452 Mbits/sec
    [  4]   5.00-6.00   sec  54.9 MBytes   461 Mbits/sec
    [  4]   6.00-7.00   sec  55.5 MBytes   465 Mbits/sec
    [  4]   7.00-8.00   sec  56.2 MBytes   471 Mbits/sec
    [  4]   8.00-9.00   sec  58.2 MBytes   490 Mbits/sec
    [  4]   9.00-10.00  sec  60.5 MBytes   508 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-10.00  sec   493 MBytes   413 Mbits/sec   sender
    [  4]   0.00-10.00  sec   490 MBytes   411 Mbits/sec   receiver
    iperf Done.
    C:\Users\superuser>iperf3.exe -c xx.xx.xx.xx -w 5M
    Connecting to host xx.xx.xx.xx, port 5201
    [  4] local 10.3.0.4 port 51996 connected to xx.xx.xx.xx port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.01   sec  7.12 MBytes  59.0 Mbits/sec
    [  4]   1.01-2.01   sec  3.25 MBytes  27.4 Mbits/sec
    [  4]   2.01-3.00   sec  4.00 MBytes  33.8 Mbits/sec
    [  4]   3.00-4.00   sec  5.00 MBytes  41.9 Mbits/sec
    [  4]   4.00-5.01   sec  5.88 MBytes  49.0 Mbits/sec
    [  4]   5.01-6.00   sec  6.62 MBytes  56.0 Mbits/sec
    [  4]   6.00-7.00   sec  7.50 MBytes  62.9 Mbits/sec
    [  4]   7.00-8.01   sec  8.62 MBytes  71.8 Mbits/sec
    [  4]   8.01-9.00   sec  9.38 MBytes  79.5 Mbits/sec
    [  4]   9.00-10.01  sec  10.4 MBytes  86.3 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-10.01  sec  67.8 MBytes  56.8 Mbits/sec   sender
    [  4]   0.00-10.01  sec  63.1 MBytes  52.9 Mbits/sec   receiver
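To put a number on the slow climb in the second run above, the per-second bitrates can be pulled out of the iperf3 output and differenced. This is a small sketch, not an iperf3 feature: the regular expression and function name are my own, and the input is just the interval lines pasted from that run.

```python
import re

# Per-second lines copied from the slow Windows run.
SLOW_RUN = """\
[ 4] 0.00-1.01 sec 7.12 MBytes 59.0 Mbits/sec
[ 4] 1.01-2.01 sec 3.25 MBytes 27.4 Mbits/sec
[ 4] 2.01-3.00 sec 4.00 MBytes 33.8 Mbits/sec
[ 4] 3.00-4.00 sec 5.00 MBytes 41.9 Mbits/sec
[ 4] 4.00-5.01 sec 5.88 MBytes 49.0 Mbits/sec
[ 4] 5.01-6.00 sec 6.62 MBytes 56.0 Mbits/sec
[ 4] 6.00-7.00 sec 7.50 MBytes 62.9 Mbits/sec
[ 4] 7.00-8.01 sec 8.62 MBytes 71.8 Mbits/sec
[ 4] 8.01-9.00 sec 9.38 MBytes 79.5 Mbits/sec
[ 4] 9.00-10.01 sec 10.4 MBytes 86.3 Mbits/sec
"""

# Matches "<start>-<end> sec <n> MBytes <rate> Mbits/sec" and captures the rate.
INTERVAL = re.compile(r"[\d.]+-[\d.]+\s+sec\s+[\d.]+\s+MBytes\s+([\d.]+)\s+Mbits/sec")


def bitrates_mbps(report: str) -> list:
    """Return the per-interval bitrates (Mbits/sec) found in iperf3 text output."""
    return [float(m.group(1)) for m in INTERVAL.finditer(report)]


rates = bitrates_mbps(SLOW_RUN)
ramp = rates[1:]  # skip the first interval, which precedes the drop
steps = [round(b - a, 1) for a, b in zip(ramp, ramp[1:])]
avg_step = (ramp[-1] - ramp[0]) / (len(ramp) - 1)

print("per-second gains (Mbits/sec):", steps)
print(f"average gain: {avg_step:.1f} Mbits/sec per second")
```

The gains stay within about 6 to 9 Mbits/sec per second, so the ramp is roughly linear. Classic slow start would instead roughly double the window every round trip, so this looks more like additive growth after an early back-off, although the capture would be needed to confirm that.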
## Changes I had to make to /etc/sysctl.conf to increase CWND.
    # allow testing with buffers up to 64MB
    net.core.rmem_max = 67108864
    net.core.wmem_max = 67108864
    # increase Linux autotuning TCP buffer limit to 32MB
    net.ipv4.tcp_rmem = 4096 87380 33554432
    net.ipv4.tcp_wmem = 4096 65536 33554432
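As a sanity check on these values, the upper limits can be converted to MiB and compared with the bandwidth-delay product of the link. This is a sketch only: the 1 Gbps and 28 ms figures are the nominal numbers from this post, and the parsing helper is hypothetical. For `tcp_rmem` and `tcp_wmem` the three fields are min, default and max, so the last field is the one that matters here.

```python
SYSCTL = """\
# allow testing with buffers up to 64MB
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# increase Linux autotuning TCP buffer limit to 32MB
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
"""

# Assumed: nominal 1 Gbps link and ~28 ms round-trip time.
BDP_BYTES = 1_000_000_000 * 0.028 / 8


def max_limits(fragment: str) -> dict:
    """Map each sysctl key to its upper limit in bytes (last field on the line)."""
    limits = {}
    for line in fragment.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        limits[key.strip()] = int(value.split()[-1])
    return limits


limits = max_limits(SYSCTL)
for key, limit in limits.items():
    print(f"{key}: {limit // 2**20} MiB, covers the BDP: {limit >= BDP_BYTES}")
```

All four limits sit well above the roughly 3.5 MB the link needs under these assumptions, which fits with the Linux sender being able to grow its window far enough.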
Note: it looks like I need at least 10 reputation to post images, so here are the links instead:
https://i.stack.imgur.com/IXhnS.png
https://i.stack.imgur.com/UyPYL.png