Score:3

How to migrate to Google managed certificates without downtime?


I'm moving example.com from an external (non-Google) hosting provider into GCP.

When setting up the load balancer, I noticed that I have to point example.com to the load balancer in order for the Google managed certificate to validate.

I'm supposed to just change the A record of example.com to the (static) IP of the new load balancer - then it will validate.

The problem is that I already have a lot of traffic to example.com. Requests that arrive after example.com starts pointing to the load balancer, but before the certificate is validated, will generate SSL errors and very unhappy users.

Has anyone solved this? I know there are ways to avoid downtime when rotating certificates, but there must be some way to migrate large sites without downtime?

Score:4

The other answers are very good, but I was very motivated to find a way to migrate to a GCP load balancer without planning for downtime where we basically just sit and wait for certificates to be issued. Adding my own answer since this question got quite a lot of traffic and it turns out that downtime is not necessary with some planning and testing.

Here's how I did it:

  1. Issue temporary Let's Encrypt certificates for example.com.
  2. Import the certificates as self-managed certificates.
  3. Set up the load balancer to use the self-managed certificates.
  4. Update DNS records to point to the load balancer.
  5. Create Google-managed certificates and assign them as a second certificate, beside the self-managed certificate.
  6. Wait for the Google-managed certificate to be ACTIVE. This can take a long time. This is where there would be downtime without the self-managed certificate.
  7. Wait 30 minutes for the certificate to propagate to all Google Front Ends (GFEs).
  8. There are now two certificates assigned to the load balancer. Update it to use only the Google-managed certificate. Done!

Details

To summarize the issue: Google's managed certificates require a DNS record pointing to a load balancer with the certificate already assigned or they won't be issued. This creates an issue, since issuing the certificate can take up to an hour and we don't want SSL errors during this time.

The workaround I found was to use self-managed certificates during the migration and switch over to the Google managed certificates once our domain was pointing to the GCP load balancer and certificates had already been issued.

I used Let's Encrypt certificates, but others will work as well. A benefit of this was that we were already using Let's Encrypt certificates, so I didn't have to worry about certificate compatibility.

Here's a simplified version of what I did. The load balancer's target proxy will be called my-target-proxy, the certificates called lb-certificate-letsencrypt and lb-certificate-managed. The domain names are example.com and subdomain.example.com.

Prerequisites:

Generate a private key for the self-managed certificate, which will be signed by Let's Encrypt:

openssl genrsa -out letsencrypt.pem 2048

Create an openssl configuration file:

openssl.conf

[req]
default_bits              = 2048
req_extensions            = extension_requirements
distinguished_name        = dn_requirements

[extension_requirements]
basicConstraints          = CA:FALSE
keyUsage                  = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName            = @sans_list

[dn_requirements]
countryName               = Country Name (2 letter code)
stateOrProvinceName       = State or Province Name (full name)
localityName              = Locality Name (eg, city)
0.organizationName        = Organization Name (eg, company)
organizationalUnitName    = Organizational Unit Name (eg, section)
commonName                = Common Name (e.g. server FQDN or YOUR name)
emailAddress              = Email Address

[sans_list]
DNS.1                     = example.com
DNS.2                     = subdomain.example.com

Generate a certificate signing request:

openssl req -new -key letsencrypt.pem -out letsencrypt.csr -config openssl.conf
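
Before sending the CSR to Let's Encrypt, it can help to confirm that it really lists every domain, since the resulting certificate will only cover these names. This is a minimal sketch and not part of the original procedure: the function name csr_sans is made up for illustration, and it assumes that the text output of openssl req prints the names on the line right after the "Subject Alternative Name" header.

```shell
# Print the DNS names listed in a CSR's subjectAltName extension, one per line.
# Argument: path to the CSR file.
csr_sans() {
  openssl req -in "$1" -noout -text |
    awk '/Subject Alternative Name/ {
      getline                 # the names are on the line after the header
      gsub(/ /, "")           # drop indentation and spaces after commas
      gsub(/DNS:/, "")        # drop the "DNS:" prefixes
      gsub(/,/, "\n")         # one name per line
      print
      exit
    }'
}
# Usage: csr_sans letsencrypt.csr
```

For the configuration above, this should list example.com and subdomain.example.com.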

Get a signed certificate from Let's Encrypt:

certbot certonly --csr letsencrypt.csr --manual --preferred-challenges dns

Update the DNS records indicated by certbot to verify ownership. Other challenges might work too, but DNS seemed easiest.

Make a note of which file is the full certificate chain and replace 0003_chain.pem in the example if necessary:

Full certificate chain is saved at: ...
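
Since the private key was generated separately from the certificate, it is cheap to check locally that the two belong together before uploading them. The sketch below compares the public key embedded in the certificate with the one derived from the private key. The function name cert_matches_key is hypothetical, and it assumes that the first certificate in the chain file is the one issued for your domains, which is how certbot orders its full chain file.

```shell
# Succeed only if the certificate's public key matches the private key.
# Arguments: certificate (or full chain) PEM file, private key PEM file.
cert_matches_key() {
  local cert_pub key_pub
  # openssl x509 reads the first certificate in the file
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
# Usage:
#   cert_matches_key 0003_chain.pem letsencrypt.pem && echo "key matches"
```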

Upload and create the self-managed certificate in GCP:

gcloud compute ssl-certificates create lb-certificate-letsencrypt \
  --certificate=0003_chain.pem \
  --private-key=letsencrypt.pem \
  --global

Create the load balancer that you're migrating to, or configure it to use the self-managed certificate:

gcloud beta compute target-https-proxies create my-target-proxy \
  --ssl-certificates=lb-certificate-letsencrypt [url-map and other options]
# Or
gcloud beta compute target-https-proxies update my-target-proxy \
  --ssl-certificates=lb-certificate-letsencrypt

Update DNS records to point to the IP for your load balancer. It should now be able to terminate TLS for example.com and subdomain.example.com. You can test this step by adding a record for example.com in your hosts file, pointing to the load balancer IP. John Hanley's answer gives a lot of good advice on this.
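
An alternative to editing the hosts file is curl's --resolve option, which pins a hostname to an IP address for a single request. This is only a sketch: check_via_lb is a hypothetical helper, and 203.0.113.10 is a placeholder for your load balancer's static IP.

```shell
# Request a hostname from a specific IP, bypassing DNS, and print the HTTP
# status code. TLS verification stays enabled, so curl fails if the
# certificate served at that IP does not cover the hostname.
# Arguments: hostname, load balancer IP.
check_via_lb() {
  curl --silent --show-error --output /dev/null \
    --write-out '%{http_code}\n' \
    --resolve "$1:443:$2" "https://$1/"
}
# Usage (placeholder IP):
#   check_via_lb example.com 203.0.113.10
#   check_via_lb subdomain.example.com 203.0.113.10
```

Because this leaves real DNS untouched, it can be run before the A record is changed.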

Create the managed certificate:

gcloud compute ssl-certificates create lb-certificate-managed \
  --domains=example.com,subdomain.example.com \
  --global

Wildcards are not supported by Google-managed certificates, so make sure to list all domains in use.

Assign it to the target proxy together with the self-managed certificate. Certificates will not be provisioned until they're assigned to a proxy:

gcloud beta compute target-https-proxies update my-target-proxy \
  --ssl-certificates=lb-certificate-letsencrypt,lb-certificate-managed

Check the status:

gcloud compute ssl-certificates describe lb-certificate-managed

Wait until status changes from PROVISIONING to ACTIVE - any other status is an error that should be investigated. "Provisioning a Google-managed certificate might take up to 60 minutes." according to the docs (Let's Encrypt does the same thing in seconds...).

It might take an additional 30 minutes to be available for use by a load balancer.

Therefore, wait 30 minutes after the status has changed to ACTIVE.
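
The waiting can be scripted instead of re-running the describe command by hand. The following is a rough sketch, not something from the original migration: wait_for_active is a hypothetical helper, the polling interval and attempt limit are arbitrary defaults, and it assumes the certificate is a global resource as created above.

```shell
# Poll a Google-managed certificate until its status is ACTIVE.
# Arguments: certificate name, seconds between polls (default 60),
# maximum number of polls (default 90).
# Returns 0 on ACTIVE, 1 on any other final status or timeout, 2 if gcloud fails.
wait_for_active() {
  local name="$1" interval="${2:-60}" attempts="${3:-90}" status
  while [ "$attempts" -gt 0 ]; do
    status=$(gcloud compute ssl-certificates describe "$name" \
      --global --format='get(managed.status)') || return 2
    echo "$name: $status"
    case "$status" in
      ACTIVE) return 0 ;;
      PROVISIONING) ;;   # still being issued, keep waiting
      *) return 1 ;;     # any other status should be investigated
    esac
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  return 1
}
# Usage: wait for ACTIVE, then allow 30 more minutes for propagation.
#   wait_for_active lb-certificate-managed && sleep 1800
```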

Update the target proxy to only use the Google-managed certificate:

gcloud beta compute target-https-proxies update my-target-proxy \
  --ssl-certificates=lb-certificate-managed

It can take a while for the new setting to propagate. Verify the issuer for the endpoint in the browser or using openssl:

openssl s_client -connect example.com:443 -showcerts \
  -CAfile /etc/ssl/certs/ca-certificates.crt <<< Q

Issuer should now be "Google Trust Services LLC" instead of "Let's Encrypt".
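
To see only the issuer rather than the full handshake output, the served certificate can be piped into openssl x509. A small sketch, where served_issuer is a hypothetical helper name:

```shell
# Print the issuer of the certificate a host serves on port 443.
# Argument: hostname (also sent as the SNI name).
served_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer
}
# Usage:
#   served_issuer example.com
#   served_issuer subdomain.example.com
```

Since propagation can take a while, repeated calls may show either issuer for some time.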

The self-managed Let's Encrypt certificate can now be removed, or kept around until it expires in case there are any issues with the managed certificate.

To clean up the unused self-managed Let's Encrypt certificate, run:

gcloud compute ssl-certificates delete lb-certificate-letsencrypt
Score:2

You will have downtime.

You can follow these tips to minimize downtime. With proper planning the downtime will be very short and in some cases automatic retries will make this invisible to clients.

However, I do not know the design of your site, the usage of cookies, authentication, session management, etc. There might be disruptions that are unavoidable. If possible, consider sending an email to your customers letting them know in advance of site maintenance.

This is a good time to review your logs. Look for potential issues with access to IP addresses. Those types of issues will start to fail after the migration is complete and you shut down the old system.

  1. Remember that DNS resource records are cached globally. The resource record TTL provides a hint on how long a record may be cached, but DNS resolvers are free to use their own interpretation of your TTL.

  2. Write down the TTL of the resource records that you will change. Now change the TTL to a short value such as 1 minute.

  3. Before making the final changes, wait for at least the old TTL to expire.

  4. Set up your services and the load balancer before making any DNS changes. Make sure the services work correctly using only the IP address. If you are redirecting IP to domain, or HTTP to HTTPS, temporarily disable those features and enable them later.

  5. Use certbot in manual mode and create a certificate that you can load into the load balancer. This removes the step of the load balancer creating the SSL certificate and waiting for verification. You can later switch to Google Managed SSL.

  6. Configure both Google Cloud Load Balancer HTTP and HTTPS frontends. Configure the Let's Encrypt SSL certificate in the frontend.

  7. Plan to leave the old site running for about 30 days after migrating. I usually see traffic for several weeks at the old site after migration.

  8. Select the time of day or day of the week with the least amount of traffic. Then switch DNS resource records. Remember that the old TTL value should have expired so that the new TTL is being used for caching.

  9. A few days later, once you have verified that everything is working, set the TTL values to something normal such as 86400 (one day) or 604800 (one week). Re-enable site redirection (IP -> domain, HTTP -> HTTPS), if used.
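
To check which TTL is actually being served at each stage of the list above, dig can be queried directly. This is a sketch with a hypothetical helper name, current_ttl. Note that a recursive resolver reports the remaining cache time, which counts down, while your authoritative name server reports the configured value.

```shell
# Print the TTL (in seconds) reported for a name's A record.
# Arguments: hostname, optional resolver or name server to query.
current_ttl() {
  # Answer lines look like: name  TTL  class  type  data
  dig +noall +answer ${2:+"@$2"} "$1" A |
    awk '$4 == "A" { print $2; exit }'
}
# Usage:
#   current_ttl example.com            # default resolver
#   current_ttl example.com 8.8.8.8    # a specific public resolver
```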

Thank you John. Just found out that you can use certbot with a CSR, hopefully that will work. Then rotating into the Google managed certificate (you can have two of them assigned at once apparently). Will try it out and accept later.
John Hanley
@AndréLaszlo - Using a CSR will not help reduce downtime. All SSL certificates are issued by/from a CSR signed by another certificate. Unless you mean the time to fill out the details. The only significant values that Let's Encrypt uses from a CSR are the names. Most of the other parameters are ignored.
Score:1

In addition to the previous suggestions, keep in mind that Google-managed SSL certificates aren't supported for regional external HTTP(S) load balancers or internal HTTP(S) load balancers. For these load balancers you need to use self-managed SSL certificates. You haven't said which type of load balancer you are using, so check this before planning the migration. The same guide [1] also explains how to create and use Google-managed SSL certificates and what is required for them to work correctly.

I suggest scheduling a maintenance window for these changes, since it can take up to 30 minutes until the certificate is available to all Google Front Ends (GFEs).

Additionally, the official guide [2] describes the migration step by step.

[1] https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs

[2] https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#migrating-ssl
