
AWS - How to handle global "Round Trip Time"?


Hey serverfault people,

Imagine a generic "Software as a Service" company offering a service running on AWS (hey, that's us). There is no rocket science involved: a standard web application doing its thing as usual, plus an end-user smartphone app. As our customers are from Europe, the AWS eu-central-1 region naturally contains everything for multiple tenants.

Now Sales manages to win a customer from Australia - all good so far, as the web application can already handle different timezones, currencies and locales. But: Australia is about as far away from Europe as you can get (at least on Earth), so quite some round-trip time is now involved. Per request we see roughly 300ms - 400ms extra per direction (EDIT: this is wrong when speaking about RTT, as pointed out in the comments; we see 2x400ms = 800ms extra for the first HTTPS request).

For the mentioned web application, which the customer uses for management purposes, that is totally fine. The rendered HTML arrives a bit later, but thanks to a CDN (CloudFront), assets are not an issue.

But the end-user smartphone application, which makes smaller but more frequent JSON requests, is affected. There it feels at the edge of "OK-ish", but definitely not snappy.

Now the question is: how to improve the timings from an end-user perspective? We already thought about a few options here:

  1. Clone the complete software and host it in AWS ap-southeast-2 as well

    Benefit: awesome performance, easy to set up, CI/CD would allow deploying the same code simultaneously in the EU and AU.

    Drawbacks: we would have to maintain and pay for two identical infrastructure sets, data cannot be shared easily, and there is lots of duplication in every respect.

  2. Move only computation instances to AWS ap-southeast-2

    Nope, this will not work, as database and Redis queries would be affected by the round-trip time even more.

  3. Have a read-only replica in AWS ap-southeast-2 and do writes in eu-central-1

    Better than option 2, but it adds a lot of complexity in the code, and the number of writes is usually not that small (a minimal CLI sketch for the replica part follows after this list).

  4. Spin up a load balancer in AWS ap-southeast-2 and peer connect the VPCs

    Idea: users connect to the AU endpoint and traffic goes via a beefy connection to the EU instances. However, this would obviously not reduce the distance, and we are unsure about the potential improvement (if any).
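
In case it helps anyone evaluating option 3: with RDS, the cross-region replica itself is mostly a one-liner (the hard part is the application-side read/write split, and replication is asynchronous, so AU reads can be slightly stale while writes still pay the full round trip to the EU). A minimal sketch, with purely hypothetical identifiers (saas-db, account 123456789012, instance class):

# Creates an asynchronous cross-region read replica of the EU primary (all identifiers are placeholders).
aws rds create-db-instance-read-replica \
    --region ap-southeast-2 \
    --db-instance-identifier saas-db-au-replica \
    --source-db-instance-identifier arn:aws:rds:eu-central-1:123456789012:db:saas-db \
    --db-instance-class db.r6g.large

Note the ARN form for --source-db-instance-identifier, which is needed when the source instance lives in another region.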

Has anybody experienced a similar issue and is willing to share some insights?

Update: it seems only the first HTTPS request is very slow. While digging into AWS load balancer options, I also noticed that AWS Global Accelerator might help, so we did some tests.

From local system (in EU):

curl -w "dns_resolution: %{time_namelookup}, tcp_established: %{time_connect}, ssl_handshake_done: %{time_appconnect}, TTFB: %{time_starttransfer}\n" -o /dev/null -s "https://saas.example.com/ping" "https://saas.example.com/ping"
dns_resolution: 0,019074, tcp_established: 0,041330, ssl_handshake_done: 0,081763, TTFB: 0,103270
dns_resolution: 0,000071, tcp_established: 0,000075, ssl_handshake_done: 0,000075, TTFB: 0,017285

From AU (EC2 instance):

curl -w "dns_resolution: %{time_namelookup}, tcp_established: %{time_connect}, ssl_handshake_done: %{time_appconnect}, TTFB: %{time_starttransfer}\n" -o /dev/null -s "https://saas.example.com/ping" "https://saas.example.com/ping"
dns_resolution: 0,004180, tcp_established: 0,288959, ssl_handshake_done: 0,867298, TTFB: 1,161823
dns_resolution: 0,000030, tcp_established: 0,000032, ssl_handshake_done: 0,000033, TTFB: 0,296621

From AU to AWS Global Accelerator(EC2 instance):

curl -w "dns_resolution: %{time_namelookup}, tcp_established: %{time_connect}, ssl_handshake_done: %{time_appconnect}, TTFB: %{time_starttransfer}\n" -o /dev/null -s "https://saas-with-global-accelerator.example.com/ping" "https://saas-with-global-accelerator.example.com/ping"
dns_resolution: 0,004176, tcp_established: 0,004913, ssl_handshake_done: 0,869347, TTFB: 1,163484
dns_resolution: 0,000025, tcp_established: 0,000027, ssl_handshake_done: 0,000028, TTFB: 0,294524

In a nutshell: it seems the TLS handshake causes the biggest initial latency. If the connection can be reused, however, the extra time from AU to EU is really "just" ~277ms (0,294524s - 0,017285s) of Time To First Byte.
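
Since the handshake dominates, one cheap thing to check (a sketch, not something we have verified end to end yet) is whether the clients and the load balancer actually negotiate TLS 1.3, whose full handshake needs one round trip less than TLS 1.2. With a reasonably recent curl you can force the version and compare:

# Compare full-handshake cost for TLS 1.2 vs TLS 1.3 (needs a curl/TLS library new enough for --tls-max and --tlsv1.3)
curl --tlsv1.2 --tls-max 1.2 -w "ssl_handshake_done: %{time_appconnect}\n" -o /dev/null -s "https://saas.example.com/ping"
curl --tlsv1.3 -w "ssl_handshake_done: %{time_appconnect}\n" -o /dev/null -s "https://saas.example.com/ping"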

Greetings!

Håkan Lindqvist
Regarding *300ms - 400ms extra per direction*, that sounds strange. I would expect the full RTT to be in that range (well, I see 250-300ms RTT to Sydney hosts, and it will obviously vary depending on where in Australia, but not double what you indicated). Regarding option 4: if this is about latency, it will not really matter much (the routing will be slightly different, but most of that distance is inherent, and as you noted it's really the distance that adds the latency).
Tim
To reduce latency you need the application and database in Sydney. I like #3: alter your application to use a read replica for reads and send writes to the master EU database, as long as it actually brings benefits. Otherwise you'll need the full stack in Sydney.
@HåkanLindqvist you are absolutely right! I measured a full HTTPS request and divided it by 2; that's not the RTT.
anx
The *too many writes* part may well be insignificant compared to modern browsers' ability to shave off round trips. You may want to *measure* HTTP/1.1, HTTP/2, HTTP/3, 0-RTT and full-handshake separately to confirm that you really do need the database closer to your users, as opposed to, say, waiting for old smartphones and MSIE to get replaced.
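
For reference, one possible way to run the protocol comparison anx suggests with plain curl (a sketch; --http3 needs a curl build with HTTP/3 support, and 0-RTT resumption is not covered here and would need separate tooling). Passing the URL twice shows full-handshake vs. reused-connection timings, as in the tests above:

# Compare protocol versions; each command prints one line per request (first = cold connection, second = reused)
curl --http1.1 -w "TTFB: %{time_starttransfer}\n" -o /dev/null -s "https://saas.example.com/ping" "https://saas.example.com/ping"
curl --http2 -w "TTFB: %{time_starttransfer}\n" -o /dev/null -s "https://saas.example.com/ping" "https://saas.example.com/ping"
curl --http3 -w "TTFB: %{time_starttransfer}\n" -o /dev/null -s "https://saas.example.com/ping" "https://saas.example.com/ping"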