I'm troubleshooting an issue with a SaaS vendor. To be clear, this question isn't "how do I fix it?", nor is it "what exactly is causing this problem?" -- rather, it's "how do these technologies work, such that this combination of symptoms is possible?" I have a support ticket open with the vendor already (and I am less-than-patiently waiting for it to be escalated to someone sufficiently capable). The purpose of this question is to expand my own understanding of how these things (typically?) work, and what variables might be in play that I have not considered.
Said vendor provides a "domain customization" feature, where you can access their service via a domain you control. (You provide them with a private key, plus a certificate w/ chain, and you add a CNAME entry that points at a domain under the vendor's control.)
I have two "tenants" with this vendor -- one for development purposes and one for production. Both are configured with custom domains. They use the exact same cert and private key; the cert has the production domain as its CN, and both the dev and prod domains listed as SANs.
The production tenant works perfectly, so I know the cert is correct. However, when I visit the dev tenant in my browser, about 80-90% of the time I get a certificate error. When I investigate, my browser reports that the cert being presented is valid, but it isn't mine; it's one that belongs to the vendor (and, as such, does not list my domain as CN or SAN). I've tried different browsers. I've tried curl. I've tried remoting in to various servers I have access to and checking from there. My coworkers, who are distributed around the US, have tried as well. The behavior does not seem to vary with client software, client hardware, geographic location, or network configuration.
At that point in my process, I'm thinking "ok, they have a pool of servers behind a load balancer, and some of the servers don't have the correct cert and so they're presenting something else in some sort of fallback." Sure, fine, makes sense.
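One way to probe that hypothesis is to handshake repeatedly with each IP address the dev domain resolves to and tally which certificate comes back from each. The following is only a minimal Python sketch under some assumptions: `dev.example.com` is a placeholder, the function names are made up for illustration, and a load balancer behind a single IP could still hide per-server differences.

```python
import hashlib
import socket
import ssl
from collections import Counter


def resolve_ips(host, port=443):
    """Return the distinct IP addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def fetch_cert_fingerprint(ip, host, port=443, timeout=5.0):
    """Handshake with one specific IP, sending `host` as the SNI name, and
    return the SHA-256 fingerprint of the leaf certificate presented."""
    ctx = ssl.create_default_context()
    # Verification is disabled on purpose: the goal is to see which cert is
    # served, even when it is the wrong one.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()


def survey(host, attempts=20, resolve=resolve_ips, fetch=fetch_cert_fingerprint):
    """Count how often each (ip, fingerprint-or-error) pair is observed."""
    counts = Counter()
    for ip in resolve(host):
        for _ in range(attempts):
            try:
                result = fetch(ip, host)
            except OSError as exc:  # covers ssl.SSLError and timeouts
                result = "error:" + type(exc).__name__
            counts[(ip, result)] += 1
    return counts
```

Calling `survey("dev.example.com")` and printing the resulting counter would show whether the wrong certificate is tied to particular IPs or appears intermittently on all of them. Comparing the fingerprints against the one served by the production domain tells you which entries are the expected cert.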
But then I tried DigiCert's Website Security tool, where you can enter a domain and it will evaluate the correctness of your certificate (among other things). Using that tool, I cannot reproduce the intermittent behavior; instead, it fails every time.
As a software engineer with a few decades of experience building web sites and services in various stacks, I have a reasonably good understanding of DNS, HTTPS, TLS certificates, webserver configuration, network routing, load balancing, and so on. But I'm stumped as to how the DigiCert validator could be seeing different behavior than I see myself.
My first thought was a DNS propagation delay, but DNSChecker indicates no such issue. Next, I considered that DigiCert might be caching something on their end, but that would make for a glaring flaw in their tool. In both cases, the likelihood of that explanation has diminished now that this behavior has persisted for a few days.
So, my question, for those out there with more expertise in this than I: what possible explanations are there for the DigiCert tool's experience being different than every other client I've tried?
(Apologies if this isn't the right SE site for a question like this. It seemed a better choice than Network Engineering, and as I perused my options I didn't see any others that looked right.)