I have two networks that are configured almost identically. They both have the same router, a Mikrotik RB2011UiAS-RM, with a direct fiber link to the ISP. I am using the same ISP for both networks. My first network has been up and running with no significant issues for about 4 years now. The new network has been up for maybe 2 months. I patterned the second network after the first, so they are set up with the same VLANs, IP schemes, etc. Everything seemed to be working fine, but for the last couple of weeks I've been getting complaints about certain websites failing to load.
The issue is consistent for the websites that don't load, but which websites are affected seems random. For example, Hulu.com will load but logging into Hulu fails. The biggest problem is that some of the company's vendor websites are not loading. These are the ones I've focused on, since they are the ones that need to work for the company.
Last week, I wiresharked the connection at the second network to see if I could see what was failing on a site that they were telling me doesn't load. I got the following:
2097 81.935154 10.0.100.193 45.60.196.32 TCP 66 50793 → 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM
2098 81.936384 10.0.100.193 45.60.196.32 TCP 66 50794 → 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM
2111 81.976423 45.60.196.32 10.0.100.193 TCP 66 443 → 50793 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460 SACK_PERM WS=128
2112 81.976513 45.60.196.32 10.0.100.193 TCP 66 443 → 50794 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460 SACK_PERM WS=128
2113 81.976549 10.0.100.193 45.60.196.32 TCP 54 50793 → 443 [ACK] Seq=1 Ack=1 Win=262656 Len=0
2114 81.976616 10.0.100.193 45.60.196.32 TCP 54 50794 → 443 [ACK] Seq=1 Ack=1 Win=262656 Len=0
2115 81.977504 10.0.100.193 45.60.196.32 TLSv1 571 Client Hello
2116 81.978230 10.0.100.193 45.60.196.32 TLSv1 571 Client Hello
2124 82.017575 45.60.196.32 10.0.100.193 TCP 60 443 → 50793 [ACK] Seq=1 Ack=518 Win=64128 Len=0
2125 82.017984 45.60.196.32 10.0.100.193 TCP 60 443 → 50794 [ACK] Seq=1 Ack=518 Win=64128 Len=0
2126 82.018045 45.60.196.32 10.0.100.193 SSL 1230 [TCP Previous segment not captured] , Continuation Data
2127 82.018081 10.0.100.193 45.60.196.32 TCP 66 [TCP Dup ACK 2113#1] 50793 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 SLE=2921 SRE=4097
2128 82.018447 45.60.196.32 10.0.100.193 SSL 1230 [TCP Previous segment not captured] , Continuation Data
2129 82.018491 10.0.100.193 45.60.196.32 TCP 66 [TCP Dup ACK 2114#1] 50794 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 SLE=2921 SRE=4097
2130 82.018816 45.60.196.32 10.0.100.193 SSL 236 [TCP Previous segment not captured] , Continuation Data
2131 82.018853 10.0.100.193 45.60.196.32 TCP 74 [TCP Dup ACK 2113#2] 50793 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 SLE=5557 SRE=5739 SLE=2921 SRE=4097
2132 82.019221 45.60.196.32 10.0.100.193 SSL 236 [TCP Previous segment not captured] , Continuation Data
2133 82.019246 10.0.100.193 45.60.196.32 TCP 74 [TCP Dup ACK 2114#2] 50794 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 SLE=5557 SRE=5739 SLE=2921 SRE=4097
2414 91.975313 45.60.196.32 10.0.100.193 TCP 60 443 → 50793 [FIN, ACK] Seq=5739 Ack=518 Win=64128 Len=0
2415 91.975378 10.0.100.193 45.60.196.32 TCP 74 [TCP Dup ACK 2113#3] 50793 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 SLE=5557 SRE=5739 SLE=2921 SRE=4097
2416 91.980004 45.60.196.32 10.0.100.193 TCP 60 443 → 50794 [FIN, ACK] Seq=5739 Ack=518 Win=64128 Len=0
2417 91.980052 10.0.100.193 45.60.196.32 TCP 74 [TCP Dup ACK 2114#3] 50794 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 SLE=5557 SRE=5739 SLE=2921 SRE=4097
3135 111.978393 10.0.100.193 45.60.196.32 TCP 54 50793 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3136 111.978658 10.0.100.193 45.60.196.32 TCP 54 50794 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3139 112.280923 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50794 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3140 112.280923 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50793 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3150 112.882128 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50793 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3151 112.882127 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50794 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3163 114.097284 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50794 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3164 114.097284 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50793 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3193 116.514004 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50794 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3194 116.514004 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50793 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3387 121.329207 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50794 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3388 121.329207 10.0.100.193 45.60.196.32 TCP 54 [TCP Retransmission] 50793 → 443 [FIN, ACK] Seq=518 Ack=1 Win=262656 Len=0
3727 130.944445 10.0.100.193 45.60.196.32 TCP 54 50794 → 443 [RST, ACK] Seq=519 Ack=1 Win=0 Len=0
3728 130.944445 10.0.100.193 45.60.196.32 TCP 54 50793 → 443 [RST, ACK] Seq=519 Ack=1 Win=0 Len=0
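As a rough aid for reading the duplicate ACKs in this capture, here is a small Python sketch that pulls the SACK blocks (SLE/SRE) out of the Wireshark summary lines and lists which byte ranges the client had not received. The function names are my own, the sequence numbers are Wireshark's relative ones, and only the two Dup ACK lines for port 50793 are included.

```python
import re

# Two Dup ACK summary lines copied from the capture above (port 50793).
DUP_ACKS = [
    "2127 82.018081 10.0.100.193 45.60.196.32 TCP 66 [TCP Dup ACK 2113#1] "
    "50793 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 SLE=2921 SRE=4097",
    "2131 82.018853 10.0.100.193 45.60.196.32 TCP 74 [TCP Dup ACK 2113#2] "
    "50793 → 443 [ACK] Seq=518 Ack=1 Win=262656 Len=0 "
    "SLE=5557 SRE=5739 SLE=2921 SRE=4097",
]


def sack_blocks(line):
    """Return (ack, sorted [(sle, sre), ...]) from one summary line."""
    ack = int(re.search(r"\bAck=(\d+)", line).group(1))
    blocks = [(int(a), int(b)) for a, b in re.findall(r"SLE=(\d+) SRE=(\d+)", line)]
    return ack, sorted(blocks)


def missing_ranges(ack, blocks):
    """Byte ranges [start, end) not yet received, up to the last SACKed byte."""
    gaps, cursor = [], ack
    for sle, sre in blocks:
        if sle > cursor:
            gaps.append((cursor, sle))
        cursor = max(cursor, sre)
    return gaps


for line in DUP_ACKS:
    frame = line.split()[0]
    ack, blocks = sack_blocks(line)
    gaps = missing_ranges(ack, blocks)
    received = [sre - sle for sle, sre in blocks]
    missing = [end - start for start, end in gaps]
    print(f"frame {frame}: received block sizes {received}, missing sizes {missing}")
```

For frame 2131 this reports gaps of 2920 and 1460 bytes (relative sequence 1 to 2921 and 4097 to 5557), while the blocks that did arrive are 1176 and 182 bytes. That pattern would be consistent with full-size 1460-byte segments going missing while smaller ones get through, but this capture alone doesn't prove it.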
So when I saw this capture, I knew there was a problem with the server not responding to the TLS Client Hello sent from my machine. It wasn't until I did another capture on the first network that I saw what was going on:
141 8.485975 10.0.100.193 45.60.196.32 TCP 66 49533 → 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM
143 8.495430 10.0.100.193 45.60.196.32 TCP 66 49534 → 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM
160 8.529277 45.60.196.32 10.0.100.193 TCP 66 443 → 49533 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1340 SACK_PERM WS=128
161 8.529397 10.0.100.193 45.60.196.32 TCP 54 49533 → 443 [ACK] Seq=1 Ack=1 Win=262400 Len=0
162 8.530000 10.0.100.193 45.60.196.32 TLSv1.3 571 Client Hello
163 8.538789 45.60.196.32 10.0.100.193 TCP 66 443 → 49534 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1340 SACK_PERM WS=128
164 8.538878 10.0.100.193 45.60.196.32 TCP 54 49534 → 443 [ACK] Seq=1 Ack=1 Win=262400 Len=0
165 8.539542 10.0.100.193 45.60.196.32 TLSv1.3 571 Client Hello
180 8.572428 45.60.196.32 10.0.100.193 TCP 60 443 → 49533 [ACK] Seq=1 Ack=518 Win=64128 Len=0
181 8.575808 45.60.196.32 10.0.100.193 TLSv1.3 1394 Server Hello, Change Cipher Spec, Application Data
182 8.575965 45.60.196.32 10.0.100.193 TCP 1394 443 → 49533 [PSH, ACK] Seq=1341 Ack=518 Win=64128 Len=1340 [TCP segment of a reassembled PDU]
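Besides the protocol column, the SYN-ACK options also differ between the two captures. A minimal sketch for comparing them, using one SYN-ACK line from each capture; the helper names are made up, and the frame-size arithmetic assumes plain 20-byte IP and TCP headers plus a 14-byte Ethernet header with no VLAN tag:

```python
import re

# SYN-ACK lines copied from the two captures above.
SYN_ACK_NEW = ("2111 81.976423 45.60.196.32 10.0.100.193 TCP 66 443 → 50793 "
               "[SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460 SACK_PERM WS=128")
SYN_ACK_OLD = ("160 8.529277 45.60.196.32 10.0.100.193 TCP 66 443 → 49533 "
               "[SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1340 SACK_PERM WS=128")


def tcp_options(line):
    """Extract the MSS, window scale and SACK flag from a summary line."""
    mss = re.search(r"MSS=(\d+)", line)
    ws = re.search(r"WS=(\d+)", line)
    return {
        "mss": int(mss.group(1)) if mss else None,
        "ws": int(ws.group(1)) if ws else None,
        "sack": "SACK_PERM" in line,
    }


def largest_frame(mss, ip_header=20, tcp_header=20, eth_header=14):
    """Largest captured frame length for a full-MSS segment (assumed header sizes)."""
    return mss + ip_header + tcp_header + eth_header


for name, line in (("new network", SYN_ACK_NEW), ("first network", SYN_ACK_OLD)):
    opts = tcp_options(line)
    print(name, opts, "largest frame:", largest_frame(opts["mss"]))
```

On the first network the SYN-ACK arrives with MSS=1340, which matches the 1394-byte frames (181 and 182) in that capture. On the new network the same server's SYN-ACK arrives with MSS=1460.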
For some reason, on my new network, my computer is not using TLSv1.3; it's using TLSv1, and I'm guessing the server isn't responding because it's not going to use the outdated protocol (which makes sense to me). So I understand what is happening, but what I can't figure out is why my computer is doing this.
Correct me if I'm wrong, but my understanding is that the TLS version is negotiated between client and server and is not dependent on the network used. I used the same laptop on both networks, so it's not a matter of the client machine needing updates. Additionally, tracert shows that I can reach the IP, which isn't surprising because I'm definitely communicating with it, but the TLS version is stopping the server from continuing to communicate.
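One way to test that assumption from the laptop itself is to ask a TLS library which version it ends up negotiating, rather than relying on the label in Wireshark's Protocol column. A minimal sketch using Python's standard `ssl` module; `vendor.example.com` is a placeholder, since the capture only shows the server's IP address and not its hostname:

```python
import socket
import ssl


def make_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Default client context that refuses anything older than `minimum`."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = minimum
    return ctx


def negotiated_tls_version(host, port=443, timeout=10.0):
    """Complete a TLS handshake and return the negotiated version string."""
    ctx = make_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


# Example call (placeholder hostname, run it from each network):
#     print(negotiated_tls_version("vendor.example.com"))
```

If the handshake completes, the call returns a string such as `TLSv1.3`. If it times out on one network but not the other with the same client settings, that would suggest the handshake is being disturbed in transit rather than the client choosing an old protocol.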
I'm completely at a loss as to how to fix this or why I would even be seeing this problem. Definitely a first for me. Does anyone have some troubleshooting ideas or possibly ever had a similar issue?
Thanks in advance for all your help.
Update:
I went back to the new network to do some exploring, and I'm even more confused now. I ran a capture filtered to my IP while doing my normal work and browsing, and found many sites that won't load. Amazon works fine. ServerFault and StackOverflow won't load. So I filtered my capture by TLS protocol, and I am definitely seeing TLSv1.2 and TLSv1.3 work successfully on that network, but it seems to be selective. In all of the cases where the website fails or times out, my computer is trying to communicate over TLSv1. I have no idea why it would try that when the website supports a higher protocol version.
Update #2:
Two new things happened yesterday:
- I checked the system time on all of my switches and my router. The router had the correct time but my switches were still at default time of somewhere in the year 2000. So I set the time for all of my network equipment but this didn't make a difference in my issue.
- I ran traceroutes on both locations and got vastly different results. The new network (which fails to connect) had 15 hops while the other network had 9 hops. I have the same ISP at both locations and the first 2 hops after leaving the LAN were exactly the same and then things seemed to spiral out for the new network. I've sent these to my ISP and I'm waiting to hear back from them.
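To pin down where the two paths split, the hop lists can be compared programmatically. A rough sketch follows; the hop addresses are placeholders from documentation ranges (not my real traceroute output), shaped only to match the hop counts described above, with a LAN gateway hop followed by two shared ISP hops:

```python
def first_divergence(path_a, path_b):
    """Index of the first hop that differs, or None if the paths are identical."""
    for i, (a, b) in enumerate(zip(path_a, path_b)):
        if a != b:
            return i
    # One path is a prefix of the other: they diverge where the shorter one ends.
    if len(path_a) != len(path_b):
        return min(len(path_a), len(path_b))
    return None


# Placeholder hops: assumed LAN gateway plus two shared ISP hops.
common = ["10.0.100.1", "192.0.2.1", "192.0.2.2"]
first_network = common + [f"198.51.100.{n}" for n in range(1, 7)]   # 9 hops
new_network = common + [f"203.0.113.{n}" for n in range(1, 13)]     # 15 hops

split = first_divergence(first_network, new_network)
print("hops:", len(first_network), "vs", len(new_network))
print("paths diverge at hop index:", split)
```

Feeding in the real hop lists from both sites would show the exact hop where the ISP starts routing the new network differently.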
At this point, I'm thinking it's not something wrong with my local network, but that there are issues further down the line.
Update #3:
My ISP sent out a tech with a media converter, who connected their laptop directly to the fiber network, and all of the sites worked perfectly. So something in my router is causing the communication problem. I've also downgraded my "non-working" router to the same version as the router on the network that doesn't have any issues, and I'm still having the same problem. It might be worth noting that my ISP has set up a static route for one of the websites we were having issues with, and that website is no longer having an issue. So I would think the routing of the packets could play a part as well. I've identified some ppp settings that I can try changing, but I'm not hopeful that they are actually the problem. In the meantime, I've reached out to Mikrotik to see what insights they might have.