I want to establish an always-on IPsec VPN between a DrayTek Vigor2860 and an EdgeRouter X (which uses strongSwan). The DrayTek is behind NAT and dials into the ER-X. The VPN connects and works, but disconnects at the second rekeying. It reconnects a few seconds later, but these disconnects are annoying.
The VPN is configured as IPsec in tunnel mode with IKEv2 key exchange. It uses ESP with AES128 and SHA1, DH Group 14, and perfect forward secrecy enabled. Authentication uses a PSK. The DrayTek connects to strongSwan. strongSwan is set to rekey=no, so only the DrayTek initiates rekeying. (See below for the detailed config.)
I also tried IKEv1, but it had the same problems.
Setting rekey=yes has the same problems.
What have I tried?
In the initial setup, the connection was lost at the first rekeying. This is probably because the rekey margin on the DrayTek seems to be 300s, while strongSwan uses rekeymargin=540s. strongSwan would therefore be the first to try to rekey, and that attempt would fail. Setting charon.make_before_break = yes for strongSwan seemed to mitigate this.
To make debugging easier, I manually added rekey=no to the strongSwan config, so the DrayTek is now the only side initiating a rekeying.
Then the following happens:
1. The connection is established.
2. Shortly before the lifetime expires, the DrayTek initiates a rekeying, which succeeds (!).
3. strongSwan now has two CHILD_SAs for a few seconds, then the older one gets deleted. The connection works the whole time (I had a ping running).
4. After another lifetime, the connection disconnects while rekeying.
Looking at the tunnels in step 3, there seems to be a slight mismatch in settings. Note the MODP_2048 at the end of the second tunnel, which is missing from the first. I suspect that I need to change my ESP settings for strongSwan slightly, but how? MODP_2048 corresponds to DH Group 14, as set on the DrayTek.
# swanctl -l
peer-remote.example.com-tunnel-1: #2, ESTABLISHED, IKEv2, 2fa27291636d715a_i e14d5bf01207b22c_r*
  local 'xxx.xxx.xxx.xxx' @ xxx.xxx.xxx.xxx[4500]
  remote 'remote.example.com' @ rem.ote.ip.addr[61001]
  AES_CBC-128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
  established 478s ago
  peer-remote.example.com-tunnel-1: #2, reqid 1, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96
    installed 478s ago
    in ca261c8b, 104209 bytes, 507 packets, 0s ago
    out ceeadfb4, 114045 bytes, 624 packets, 0s ago
    local 192.168.70.0/24
    remote 192.168.71.0/24
  peer-remote.example.com-tunnel-1: #3, reqid 1, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_2048
    installed 26s ago
    in c6b99851, 5799 bytes, 27 packets, 0s ago
    out ceeadfb5, 6334 bytes, 32 packets, 0s ago
    local 192.168.70.0/24
    remote 192.168.71.0/24
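To compare the two CHILD_SAs in the listing above more mechanically, here is a minimal Python sketch that pulls the ESP proposal out of each CHILD_SA line and reports whether it carries a DH group. The names child_sa_proposals, has_dh_group and SAMPLE are made up for this illustration, the regular expression is based only on the two lines shown in this post, and the MODP_/ECP_ prefixes are my assumption about how the groups are printed.

```python
import re
from typing import Dict, List

# Matches CHILD_SA lines as they appear in the listing above, e.g.
# "name: #3, reqid 1, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_2048"
# IKE_SA lines carry no "reqid" field and are therefore skipped.
CHILD_SA_RE = re.compile(r"#(?P<id>\d+), reqid \d+, \w+, .*ESP:(?P<proposal>\S+)")


def child_sa_proposals(swanctl_output: str) -> Dict[int, List[str]]:
    """Map each CHILD_SA number to the components of its ESP proposal."""
    result: Dict[int, List[str]] = {}
    for line in swanctl_output.splitlines():
        match = CHILD_SA_RE.search(line)
        if match:
            result[int(match.group("id"))] = match.group("proposal").split("/")
    return result


def has_dh_group(components: List[str]) -> bool:
    """Assumption: DH groups show up as MODP_* or ECP_* components."""
    return any(c.startswith(("MODP_", "ECP_")) for c in components)


# The two CHILD_SA lines copied from the listing above.
SAMPLE = """\
peer-remote.example.com-tunnel-1: #2, reqid 1, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96
peer-remote.example.com-tunnel-1: #3, reqid 1, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_2048
"""

for sa_id, parts in child_sa_proposals(SAMPLE).items():
    label = "DH group present" if has_dh_group(parts) else "no DH group"
    print(f"CHILD_SA #{sa_id}: {'/'.join(parts)} ({label})")
```

On the sample, CHILD_SA #2 (created together with the IKE_SA) shows no DH group, while the rekeyed CHILD_SA #3 shows MODP_2048.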
I also found this hint in the strongSwan wiki, which seems to point in the same direction:
https://wiki.strongswan.org/projects/strongswan/wiki/connsection
In the esp = <cipher suites> section:
If dh-group is specified, CHILD_SA rekeying and initial negotiation
include a separate Diffie-Hellman exchange (since 5.0.0 this also
applies to IKEv1 Quick Mode). However, for IKEv2, the keys of the
CHILD_SA created implicitly with the IKE_SA will always be derived
from the IKE_SA's key material. So any DH group specified here will
only apply when the CHILD_SA is later rekeyed or is created with a
separate CREATE_CHILD_SA exchange. Therefore, a proposal mismatch
might not immediately be noticed when the SA is established, but may
later cause rekeying to fail.
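My reading of that paragraph, as a toy Python model (this is not strongSwan code; EspProposal and proposals_agree are hypothetical names, and the peer proposal without a DH group is an invented example, not the DrayTek's actual proposal):

```python
from typing import NamedTuple, Optional


class EspProposal(NamedTuple):
    encryption: str
    integrity: str
    dh_group: Optional[str] = None  # None means no separate DH exchange


def proposals_agree(a: EspProposal, b: EspProposal, created_with_ike_sa: bool) -> bool:
    """Toy comparison of two ESP proposals.

    For the CHILD_SA created implicitly with the IKE_SA, the keys come from
    the IKE_SA's key material, so the DH group is not compared. For a rekey
    or a separate CREATE_CHILD_SA exchange, the DH group must match as well.
    """
    if (a.encryption, a.integrity) != (b.encryption, b.integrity):
        return False
    if created_with_ike_sa:
        return True
    return a.dh_group == b.dh_group


# esp=aes128-sha1-modp2048 on the strongSwan side, as in my config.
local = EspProposal("aes128", "sha1", "modp2048")
# Invented peer proposal that differs only in the DH group.
peer = EspProposal("aes128", "sha1", None)

print(proposals_agree(local, peer, created_with_ike_sa=True))
print(proposals_agree(local, peer, created_with_ike_sa=False))
```

In this model the two proposals agree when the CHILD_SA is created with the IKE_SA, and disagree on a later rekey, which is the "not immediately noticed" case the wiki describes.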
If I set the lifetime to 86400s (the maximum for the DrayTek), the connection runs fine for hours, which means the underlying DSL connection is not causing the issues. If I change the lifetime to 600s (the minimum for the DrayTek), the connection fails about every 1000 seconds, which roughly fits 2 x 600s - 300s = 900s.
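The same estimate as a small Python sketch, assuming the "two lifetimes minus one rekey margin" pattern also holds for other lifetimes (the function name is made up, and the 300s margin is only what I observed on the DrayTek):

```python
def seconds_until_second_rekey(lifetime_s: int, rekey_margin_s: int) -> int:
    """Estimate used in this post: two lifetimes minus one rekey margin."""
    return 2 * lifetime_s - rekey_margin_s


# Minimum and maximum lifetime the DrayTek accepts, with the observed 300s margin.
for lifetime in (600, 86400):
    t = seconds_until_second_rekey(lifetime, 300)
    print(f"lifetime {lifetime}s: second rekey after about {t}s")
```

This gives 900s for the 600s lifetime and 172500s (about 48 hours) for the 86400s lifetime, so with the long lifetime the failure would simply show up much later.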
ER-X Config (anonymized):
# show vpn
ipsec {
    allow-access-to-local-interface enable
    auto-firewall-nat-exclude enable
    esp-group FOO0 {
        compression disable
        lifetime 86400
        mode tunnel
        pfs enable
        proposal 1 {
            encryption aes128
            hash sha1
        }
    }
    global-config "charon.make_before_break := yes"
    ike-group FOO0 {
        ikev2-reauth no
        key-exchange ikev2
        lifetime 86400
        proposal 1 {
            dh-group 14
            encryption aes128
            hash sha1
        }
    }
    ipsec-interfaces {
        interface eth0
    }
    site-to-site {
        peer remote.example.com {
            authentication {
                mode pre-shared-secret
                pre-shared-secret "secret"
            }
            connection-type respond
            description remote
            ike-group FOO0
            ikev2-reauth inherit
            local-address xxx.xxx.xxx.xxx
            tunnel 1 {
                allow-nat-networks disable
                allow-public-networks disable
                esp-group FOO0
                local {
                    prefix 192.168.70.0/24
                }
                remote {
                    prefix 192.168.71.0/24
                }
            }
        }
    }
}
This results in the following ipsec.conf for strongSwan, with a manual edit to add rekey=no:
conn peer-remote-example.com-tunnel-1
    left=xxx.xxx.xxx.xxx
    right=remote.example.com
    rightid="%any"
    leftsubnet=192.168.70.0/24
    rightsubnet=192.168.71.0/24
    ike=aes128-sha1-modp2048!
    keyexchange=ikev2
    reauth=no
    ikelifetime=86400s
    esp=aes128-sha1-modp2048!
    keylife=86400s
    rekey=no
    rekeymargin=540s
    type=tunnel
    compress=no
    authby=secret
    auto=route
    keyingtries=1