I am helping put a CloudFront CDN in front of an NGINX HLS video origin. If you're not familiar, HLS in the browser just uses XHR or fetch to continually request .m3u8 and .ts files over HTTP and play them in a video element. I have reproduced the issue described here with simple AJAX calls on an interval, so the problem is not specific to HLS. I would like to be able to switch traffic between the CDN and direct-to-origin with minimal impact on users. I have built this out, and can switch between CloudFront and direct-to-origin by changing DNS in Route 53. The DNS record has a TTL of 1 minute.
However, when I do so, sometimes the IP address used by the browser does not change - even long after the DNS TTL has expired. The OS-level and browser-level DNS caches show the expected IP address, but the browser (as shown in Developer tools -> Network) is still using the "old" IP address. It can keep doing this for several hours after the DNS TTL has expired. Even refreshing the page does not force it to get a new IP for the domain. So far, the only things I've found that force the browser to get a new IP address for the domain are chrome://net-internals/#sockets -> Flush Socket Pools, or completely closing all browser instances.
So, I'm fairly certain the issue is that Chrome (I also tested Firefox, and it is probably true of all browsers) maintains a connection and does not look up DNS again until that connection is closed, regardless of the DNS TTL. This is especially noticeable with something like HLS video or continuous AJAX polling, where the connection is used every few seconds and never goes idle. I am able to control this somewhat by setting a Connection: close or Keep-Alive: timeout=5 header on the origin. However, I cannot control these at CloudFront, even with a custom function. Moreover, if I enable HTTP/2 at the origin and/or CloudFront, these headers are not allowed or used, but I still see similar behavior.
I can also return an HTTP 421 Misdirected Request from the origin and force clients hitting the origin to refresh. However, this does not work through CloudFront: using a CloudFront function to modify the response code causes an error, and a 421 returned from the origin to CloudFront causes an error and does not trigger clients to refresh.
Given all this, how can I ensure that DNS changes take effect in the browser within the DNS entry's TTL? Is there any header or CloudFront setting I can use? I can control some of the clients, so is there any JavaScript, request header, or XHR trick to force the browser to look up and use the new IP address?
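For anyone who wants to reproduce this without HLS, here is a minimal sketch of the kind of interval polling described above. It is not my exact code: the name `startPolling`, the injectable `fetchFn` parameter, the example URL, and the 2-second interval are all placeholders chosen for illustration.

```javascript
// Minimal polling sketch (illustrative only; names and values are assumptions).
// Requests `url` every `intervalMs` milliseconds so the underlying connection
// never sits idle long enough for the browser to close it.
function startPolling(url, intervalMs, fetchFn = globalThis.fetch, onResult = console.log) {
  let stopped = false;
  let timer = null;

  async function tick() {
    if (stopped) return;
    try {
      // "no-store" bypasses the HTTP cache so each tick goes to the network.
      const res = await fetchFn(url, { cache: "no-store" });
      onResult({ status: res.status, at: new Date().toISOString() });
    } catch (err) {
      onResult({ error: String(err), at: new Date().toISOString() });
    }
    // Schedule the next request only after the previous one has finished.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();

  // Returns a function that stops the polling loop.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

// Example usage in a browser console (hypothetical URL):
// const stop = startPolling("https://video.example.com/live/index.m3u8", 2000);
// ...change DNS, watch the Remote Address column in Developer tools -> Network...
// stop();
```

With this running, the Remote Address shown in the Network tab is what stays on the old IP after the DNS change.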