I've been having a really weird issue. I'm running WireGuard (WG) on a VPS and on my MacBook.
On the VPS, WG runs in the linuxserver container on a Debian host.
The connection is great, the speed is good, everything works really well.
I've noticed, though, that every once in a while (roughly every 10-20 minutes) there is a handshake and my internet connection instantly drops. I can still access internal services, so I know my MacBook is still connected to the server. I can't reach the internet for 16 seconds, until the next handshake, at which point it instantly comes back.
I've monitored the server during this. While downloading a torrent I can see a lot of kworker/1:1-wg-crypt-wg0 processes running, and when the drop happens all of them are killed. It looks almost as if the server or WG were restarting, but I know that's not the case, because I can still reach internal containers, so the tunnel is up and WG didn't go down.
I know for a fact that my other devices still have internet access at the same time, so it has something to do with WireGuard.
This happens regardless of whether I'm doing heavy networking, like downloading a torrent, or just browsing the web.
It also doesn't look like the VPS is losing its connection. I've been pinging Google from both machines, and when the drop happens the VPS keeps getting replies while my MacBook can't reach Google.
I'm looking for help figuring out what this could be, what I should look into, and how to approach it. Could it be related to my router? Could it be a config issue?
Here are my configs:
Server
[Interface]
# Core settings
PrivateKey = xxxxx
Address = 10.6.0.0/24
# Misc. settings (optional)
ListenPort = 51820
# Interface hooks (optional)
PostUp = iptables -A FORWARD -i %i -j ACCEPT; iptables -A FORWARD -o %i -j ACCEPT; iptables -t nat -A POSTROUTING -o eth+ -j MASQUERADE
PostDown = iptables -D FORWARD -i %i -j ACCEPT; iptables -D FORWARD -o %i -j ACCEPT; iptables -t nat -D POSTROUTING -o eth+ -j MASQUERADE
MTU = 1400
#
# Peers
#
[Peer]
PublicKey = xxxxx
PresharedKey = xxxx
AllowedIPs = 10.6.0.2/32
PersistentKeepalive = 16
MACBOOK
[Interface]
# Core settings
PrivateKey = xxxxx
Address = 10.6.0.2/32
# Misc. settings (optional)
DNS = xxxxx
MTU = 1400
[Peer]
PublicKey = xxxx
Endpoint = xxxx:51820
AllowedIPs = 10.6.0.1/32, 0.0.0.0/0
PresharedKey = xxxx
PersistentKeepalive = 16
UPDATE:
These are the logs I get on the server side with modprobe wireguard:
Mar 16 13:55:54 [ +2.156106] wireguard: wg0: Receiving handshake initiation from peer 315 (<my-client-ip>:16235)
Mar 16 13:55:54 [ +0.000003] wireguard: wg0: Sending handshake response to peer 315 (<my-client-ip>)
Mar 16 13:55:54 [ +0.000165] wireguard: wg0: Keypair 10604 destroyed for peer 315
Mar 16 13:55:54 [ +0.000002] wireguard: wg0: Keypair 10606 created for peer 315
Mar 16 13:56:10 [ +2.451426] wireguard: wg0: Receiving handshake initiation from peer 315 (<my-client-ip>)
Mar 16 13:56:10 [ +0.000003] wireguard: wg0: Sending handshake response to peer 315 (<my-client-ip>)
Mar 16 13:56:10 [ +0.000185] wireguard: wg0: Keypair 10605 destroyed for peer 315
Mar 16 13:56:10 [ +0.000001] wireguard: wg0: Keypair 10607 created for peer 315
Mar 16 13:56:10 [ +0.161195] wireguard: wg0: Receiving keepalive packet from peer 315 (<my-client-ip>)
And here's the client side, pinging google.com to check the internet connection (the client's timestamps are three hours behind the server's):
Mar 16 10:55:54 64 bytes from <google-ip>: icmp_seq=187 ttl=108 time=161.723 ms
Mar 16 10:55:56 Request timeout for icmp_seq 188
Mar 16 10:55:57 Request timeout for icmp_seq 189
Mar 16 10:55:58 Request timeout for icmp_seq 190
Mar 16 10:55:59 Request timeout for icmp_seq 191
Mar 16 10:56:00 Request timeout for icmp_seq 192
Mar 16 10:56:01 Request timeout for icmp_seq 193
Mar 16 10:56:02 Request timeout for icmp_seq 194
Mar 16 10:56:03 Request timeout for icmp_seq 195
Mar 16 10:56:04 Request timeout for icmp_seq 196
Mar 16 10:56:05 Request timeout for icmp_seq 197
Mar 16 10:56:06 Request timeout for icmp_seq 198
Mar 16 10:56:07 Request timeout for icmp_seq 199
Mar 16 10:56:08 Request timeout for icmp_seq 200
Mar 16 10:56:09 Request timeout for icmp_seq 201
Mar 16 10:56:10 Request timeout for icmp_seq 202
Mar 16 10:56:11 Request timeout for icmp_seq 203
Mar 16 10:56:11 64 bytes from <google-ip>: icmp_seq=204 ttl=108 time=161.172 ms
So there are two successful handshakes, but in the roughly 16 seconds between them I can't access the web.
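To put a number on each drop instead of counting lines by eye, the timestamped ping output can be scanned for runs of "Request timeout" lines. This is a minimal Python sketch, assuming every line starts with a "Mon DD HH:MM:SS" prefix as in the log above; because that prefix has no year, the script assumes 2023 (the year shown in the client debug log). The function names are hypothetical.

```python
from datetime import datetime

# Assumption: the log prefix carries no year, so one is supplied for parsing.
ASSUMED_YEAR = 2023

# Client-side ping output from above, one timestamped line per probe.
PING_LOG = """\
Mar 16 10:55:54 64 bytes from <google-ip>: icmp_seq=187 ttl=108 time=161.723 ms
Mar 16 10:55:56 Request timeout for icmp_seq 188
Mar 16 10:55:57 Request timeout for icmp_seq 189
Mar 16 10:55:58 Request timeout for icmp_seq 190
Mar 16 10:55:59 Request timeout for icmp_seq 191
Mar 16 10:56:00 Request timeout for icmp_seq 192
Mar 16 10:56:01 Request timeout for icmp_seq 193
Mar 16 10:56:02 Request timeout for icmp_seq 194
Mar 16 10:56:03 Request timeout for icmp_seq 195
Mar 16 10:56:04 Request timeout for icmp_seq 196
Mar 16 10:56:05 Request timeout for icmp_seq 197
Mar 16 10:56:06 Request timeout for icmp_seq 198
Mar 16 10:56:07 Request timeout for icmp_seq 199
Mar 16 10:56:08 Request timeout for icmp_seq 200
Mar 16 10:56:09 Request timeout for icmp_seq 201
Mar 16 10:56:10 Request timeout for icmp_seq 202
Mar 16 10:56:11 Request timeout for icmp_seq 203
Mar 16 10:56:11 64 bytes from <google-ip>: icmp_seq=204 ttl=108 time=161.172 ms
"""


def parse_ts(line):
    """Parse the 15-character 'Mon DD HH:MM:SS' prefix of a log line."""
    return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_outages(lines):
    """Return (first_timeout, recovered_at, lost_probes) for each run of timeouts."""
    outages = []
    start, lost = None, 0
    for line in lines:
        if not line.strip():
            continue
        ts = parse_ts(line)
        if "Request timeout" in line:
            if start is None:
                start = ts  # first lost probe of this outage
            lost += 1
        elif "bytes from" in line and start is not None:
            # First reply after a run of timeouts closes the outage.
            outages.append((start, ts, lost))
            start, lost = None, 0
    return outages


if __name__ == "__main__":
    for start, end, lost in find_outages(PING_LOG.splitlines()):
        print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: {lost} probes lost, "
              f"{(end - start).total_seconds():.0f} s from first timeout to recovery")
```

On the excerpt above it prints `10:55:56 -> 10:56:11: 16 probes lost, 15 s from first timeout to recovery`, which matches the 16 missing replies.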
UPDATE 2:
I managed to get some logging on the client side (MacBook) by running
sudo LOG_LEVEL=verbose wg show
which gives me logs whenever the MacBook initiates a handshake or receives a handshake response.
In this new example, the server logs are:
~INTERNET GOES DOWN HERE~
Mar 16 16:47:09 [ +0.393175] wireguard: wg0: Receiving handshake initiation from peer 315 (<client-ip>)
Mar 16 16:47:09 [ +0.000003] wireguard: wg0: Sending handshake response to peer 315 (<client-ip>)
Mar 16 16:47:09 [ +0.000175] wireguard: wg0: Keypair 10790 destroyed for peer 315
Mar 16 16:47:09 [ +0.000001] wireguard: wg0: Keypair 10793 created for peer 315
Mar 16 16:47:09 [ +0.280476] wireguard: wg0: Receiving keepalive packet from peer 315 (<client-ip>)
~INTERNET GOES BACK UP RIGHT AFTER THE NEXT LINES~
Mar 16 16:47:25 [ +1.391045] wireguard: wg0: Receiving handshake initiation from peer 315 (<client-ip>)
Mar 16 16:47:25 [ +0.000003] wireguard: wg0: Sending handshake response to peer 315 (<client-ip>)
Mar 16 16:47:25 [ +0.000166] wireguard: wg0: Keypair 10792 destroyed for peer 315
Mar 16 16:47:25 [ +0.000002] wireguard: wg0: Keypair 10794 created for peer 315
Mar 16 16:47:25 [ +0.159758] wireguard: wg0: Receiving keepalive packet from peer 315 (<client-ip>)
On the client side I see:
DEBUG: (utun6) 2023/03/16 13:42:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:42:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Retrying handshake because we stopped hearing back after 15 seconds
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Sending keepalive packet
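One way to line the two logs up is to check each server-side "Receiving handshake initiation" against the client's "Sending handshake initiation" lines. This minimal Python sketch assumes the server clock reads exactly three hours ahead of the client's (which is how the timestamps in both updates line up), a 2-second matching window, and a year of 2023 for the server's year-less timestamps; all three are assumptions, and the function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumptions: server clock = client clock + 3 h, server log year is 2023,
# and a client send within 2 s of a server receive counts as the same handshake.
ASSUMED_YEAR = 2023
CLOCK_OFFSET = timedelta(hours=3)
TOLERANCE = timedelta(seconds=2)

SERVER_LOG = """\
Mar 16 16:47:09 [ +0.393175] wireguard: wg0: Receiving handshake initiation from peer 315 (<client-ip>)
Mar 16 16:47:09 [ +0.000003] wireguard: wg0: Sending handshake response to peer 315 (<client-ip>)
Mar 16 16:47:09 [ +0.280476] wireguard: wg0: Receiving keepalive packet from peer 315 (<client-ip>)
Mar 16 16:47:25 [ +1.391045] wireguard: wg0: Receiving handshake initiation from peer 315 (<client-ip>)
Mar 16 16:47:25 [ +0.000003] wireguard: wg0: Sending handshake response to peer 315 (<client-ip>)
Mar 16 16:47:25 [ +0.159758] wireguard: wg0: Receiving keepalive packet from peer 315 (<client-ip>)
"""

CLIENT_LOG = """\
DEBUG: (utun6) 2023/03/16 13:42:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:42:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Retrying handshake because we stopped hearing back after 15 seconds
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Sending keepalive packet
"""

SERVER_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*Receiving handshake initiation")
CLIENT_RE = re.compile(r"(\d{4}/\d\d/\d\d \d\d:\d\d:\d\d) .*Sending handshake initiation")


def server_initiations(text):
    """Timestamps of handshake initiations the server says it received."""
    return [
        datetime.strptime(f"{ASSUMED_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S")
        for m in map(SERVER_RE.match, text.splitlines())
        if m
    ]


def client_initiations(text):
    """Timestamps of handshake initiations the client says it sent."""
    return [
        datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        for m in map(CLIENT_RE.search, text.splitlines())
        if m
    ]


def unmatched(server_times, client_times):
    """Server-side initiations with no client-side send within TOLERANCE after shifting clocks."""
    return [
        st for st in server_times
        if not any(abs((st - CLOCK_OFFSET) - ct) <= TOLERANCE for ct in client_times)
    ]


if __name__ == "__main__":
    for st in unmatched(server_initiations(SERVER_LOG), client_initiations(CLIENT_LOG)):
        print("no matching client initiation for server-side", st.time())
```

Run on these excerpts, it reports only the 16:47:09 initiation, the one logged where the internet goes down; the client excerpt shows no send around 13:47:09.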
I find the "Retrying handshake because we stopped hearing back after 15 seconds" line interesting, because those 15 seconds line up with the outage. So a handshake fails, the client retries, and then it works? But why?
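To see how the retry differs from a normal handshake, the client log can be reduced to its initiations, each with the time since the previous one and whether a "Retrying handshake" line preceded it. This is a minimal Python sketch, assuming the wireguard-go verbose line format shown above; labeling an initiation "scheduled" just because no retry line came before it is an assumption of this script, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Client-side verbose log from above.
CLIENT_LOG = """\
DEBUG: (utun6) 2023/03/16 13:42:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:42:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:44:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:46:35 peer(xxxx) - Sending keepalive packet
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Retrying handshake because we stopped hearing back after 15 seconds
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Sending handshake initiation
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Received handshake response
DEBUG: (utun6) 2023/03/16 13:47:24 peer(xxxx) - Sending keepalive packet
"""

# Captures the timestamp and the message after "peer(...) - ".
LINE_RE = re.compile(r"(\d{4}/\d\d/\d\d \d\d:\d\d:\d\d) peer\([^)]*\) - (.*)$")


def initiation_report(text):
    """Return (timestamp, seconds since previous initiation, preceded_by_retry) per initiation."""
    report = []
    prev = None
    retry_pending = False
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        msg = m.group(2)
        if msg.startswith("Retrying handshake"):
            retry_pending = True  # the next initiation is a retry
        elif msg == "Sending handshake initiation":
            gap = (ts - prev).total_seconds() if prev is not None else None
            report.append((ts, gap, retry_pending))
            prev = ts
            retry_pending = False
    return report


if __name__ == "__main__":
    for ts, gap, retry in initiation_report(CLIENT_LOG):
        kind = "retry" if retry else "scheduled"
        print(ts.time(), kind, "" if gap is None else f"{gap:.0f} s after previous")
```

On this excerpt the initiations at 13:44:35 and 13:46:35 are exactly two minutes apart, consistent with WireGuard's periodic rekey, while the retry at 13:47:24 comes only 49 s after the previous one.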
UPDATE 3:
I think there's something going on with the port of the faulty handshake.
I'm running watch wg show all on the server, and it shows my MacBook peer's endpoint as <my-ip>:16918. Whenever a handshake fails and I lose the connection, I see logs like this on the server side:
Mar 16 18:17:19 [ +1.217668] wireguard: wg0: Receiving handshake initiation from peer 318 (<client-ip>:16235)
Mar 16 18:17:19 [ +0.000004] wireguard: wg0: Sending handshake response to peer 318 (<client-ip>:16235)
Mar 16 18:17:19 [ +0.000191] wireguard: wg0: Keypair 10893 destroyed for peer 318
Mar 16 18:17:19 [ +0.000002] wireguard: wg0: Keypair 10896 created for peer 318
Mar 16 18:17:19 [ +0.275182] wireguard: wg0: Receiving keepalive packet from peer 318 (<client-ip>:16235)
Mar 16 18:17:34 [ +0.687921] wireguard: wg0: Receiving handshake initiation from peer 318 (<client-ip>:16918)
Mar 16 18:17:34 [ +0.000004] wireguard: wg0: Sending handshake response to peer 318 (<client-ip>:16918)
Mar 16 18:17:34 [ +0.000250] wireguard: wg0: Keypair 10894 destroyed for peer 318
Mar 16 18:17:34 [ +0.000002] wireguard: wg0: Keypair 10897 created for peer 318
Mar 16 18:17:34 [ +0.164268] wireguard: wg0: Receiving keepalive packet from peer 318 (<client-ip>:16918)
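For some reason the failing handshake comes from a different port (16235 instead of 16918), so the server responds to that port, but the client isn't listening on it. The connection only comes back 15 seconds later, when the next handshake arrives from the usual port 16918.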
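To catch these port flips without staring at watch output, the server's WireGuard debug log can be scanned for the endpoint each "from/to peer N (host:port)" message mentions, reporting whenever the port changes for the same peer. This is a minimal Python sketch, assuming the log format shown above; the function name is hypothetical, and lines without a port (like some in the first update) are simply skipped.

```python
import re

# Server-side log excerpt from above.
SERVER_LOG = """\
Mar 16 18:17:19 [ +1.217668] wireguard: wg0: Receiving handshake initiation from peer 318 (<client-ip>:16235)
Mar 16 18:17:19 [ +0.000004] wireguard: wg0: Sending handshake response to peer 318 (<client-ip>:16235)
Mar 16 18:17:19 [ +0.000191] wireguard: wg0: Keypair 10893 destroyed for peer 318
Mar 16 18:17:19 [ +0.000002] wireguard: wg0: Keypair 10896 created for peer 318
Mar 16 18:17:19 [ +0.275182] wireguard: wg0: Receiving keepalive packet from peer 318 (<client-ip>:16235)
Mar 16 18:17:34 [ +0.687921] wireguard: wg0: Receiving handshake initiation from peer 318 (<client-ip>:16918)
Mar 16 18:17:34 [ +0.000004] wireguard: wg0: Sending handshake response to peer 318 (<client-ip>:16918)
Mar 16 18:17:34 [ +0.000250] wireguard: wg0: Keypair 10894 destroyed for peer 318
Mar 16 18:17:34 [ +0.000002] wireguard: wg0: Keypair 10897 created for peer 318
Mar 16 18:17:34 [ +0.164268] wireguard: wg0: Receiving keepalive packet from peer 318 (<client-ip>:16918)
"""

# Timestamp, peer number, and the port after the last ':' inside the parentheses.
ENDPOINT_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*?(?:from|to) peer (\d+) \([^)]*:(\d+)\)"
)


def endpoint_changes(text):
    """Return (timestamp, peer, old_port, new_port) whenever a peer's port changes."""
    last_port = {}
    changes = []
    for line in text.splitlines():
        m = ENDPOINT_RE.match(line)
        if not m:
            continue  # keypair lines and port-less lines carry no endpoint
        ts, peer, port = m.group(1), m.group(2), int(m.group(3))
        old = last_port.get(peer)
        if old is not None and old != port:
            changes.append((ts, peer, old, port))
        last_port[peer] = port
    return changes


if __name__ == "__main__":
    for ts, peer, old, new in endpoint_changes(SERVER_LOG):
        print(f"{ts}: peer {peer} endpoint port {old} -> {new}")
```

On this excerpt it reports a single change, `Mar 16 18:17:34: peer 318 endpoint port 16235 -> 16918`, i.e. the moment the connection recovers.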