
Maximum QPS with Google Load Balancer

Olu

In GCP's documentation, Google claims to support up to 1 million queries per second. As part of a project, my team decided to put both the Regional HTTPS LB and the Global HTTPS LB to the test.

Here are some of the results we got using 7 clients against 4 n2d-highcpu-64 VMs, each with a 2 TB SSD Persistent Disk, serving random-key lookups. Each run lasted 30 s with 4,300 long-lived persistent connections:

  1. Without the load balancer, each instance returned an average of 680k qps.
  2. Using a Regional HTTPS load balancer with the 4 VMs as the backend service, the result was about 150k qps.
  3. Using a Global HTTPS load balancer with the same 4 VMs and no Cloud Armor, it averaged 205k qps.

My questions therefore are as follows:

  1. Is there anything within the load balancer config responsible for the throttling we experienced?

  2. Is there documentation on the recommended architecture or best practices for achieving at least 1 million qps with the load balancer?

screenshot of results


Google Cloud's million connections test (see also notes with scripts) used considerably more instances: 64 VMs as clients and 200 web server VM backends. At 5,000 rps per instance, the backends stayed manageably small, reducing the possibility of scale-up challenges.

Consider TCP/UDP port exhaustion. If the source IP, destination IP, and destination port remain the same, there are only about 50k to 60k source ports available for sockets.
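A back-of-the-envelope sketch of that limit, assuming Linux's default ephemeral port range (32768 to 60999, per `net.ipv4.ip_local_port_range`) and a 60-second TIME_WAIT:

```python
# Rough estimate of when a single client IP talking to a single
# (destination IP, destination port) pair runs out of source ports.

def max_concurrent_sockets(port_range=(32768, 60999)):
    # Size of the ephemeral port pool: one port per open socket
    # for a fixed (src IP, dst IP, dst port) tuple.
    lo, hi = port_range
    return hi - lo + 1

def max_new_conns_per_sec(time_wait_secs=60, port_range=(32768, 60999)):
    # Closed sockets linger in TIME_WAIT, so the sustainable rate of
    # *new* (non-reused) connections is the pool divided by that linger.
    return max_concurrent_sockets(port_range) / time_wait_secs

print(max_concurrent_sockets())   # 28232 concurrent sockets
print(max_new_conns_per_sec())    # ~470 new connections/s per client
```

This is one reason the long-lived persistent connections in the test matter: reusing sockets sidesteps the churn limit, but the concurrent-socket ceiling per client/target pair still applies.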

Confirm that the without-load-balancer test actually goes through the same IP stack that the remote test does. There is overhead in establishing sockets, creating packets, and in general doing network-stack work.

Measure latency, both of the query and of the network. A million per second means a query must be served every microsecond, across all serving processes. Even a relatively small network latency will drastically reduce throughput compared to the near-zero latency of staying on the box.
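Little's law makes the latency point concrete: in-flight requests = arrival rate × average latency. The latency figures below are illustrative assumptions, not measurements from the test:

```python
# Little's law: concurrency needed to sustain a given rate at a given latency.
def required_concurrency(qps, latency_secs):
    return qps * latency_secs

# On-box, near-zero latency (assume 50 microseconds):
print(required_concurrency(1_000_000, 50e-6))  # 50.0 requests in flight

# Through a load balancer hop (assume 2 ms round trip): 40x the concurrency
# just to hold the same 1M qps.
print(required_concurrency(1_000_000, 2e-3))   # 2000.0 requests in flight
```

If the clients and backends cannot keep that many requests in flight, measured qps drops proportionally, regardless of what the load balancer itself can forward.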

A million requests per second is, however, not a useful marketing number. It doesn't do any real work: "Each web response was 1 byte in size not including the http headers." Actually doing something will exhaust some other resource, such as storage IOPS, memory, or CPU, long before the mythical million.

Quantify, based on current application use and organizational planning, how many qps or connections you will need. Feel free to round up generously when capacity planning, but multiplying by 100x or 1000x without justification serves no purpose.

For a sense of scale, the Stack Exchange network often ranks among the top websites on the internet by traffic. Yet its concurrent WebSockets max out at "only" about 600,000, and its load balancers peak at 4,500 requests per second. A million per second would be much bigger.


