Google Cloud's million connections test (see also notes with scripts) used considerably more instances: 64 VMs as clients and 200 web server VMs as backends. At 5,000 rps per instance, each backend stayed manageably small, reducing the possibility of scale-up challenges.
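As a quick sanity check on those numbers (the figures are taken from the test above, nothing is measured here), the per-instance rates follow from simple division:

```python
# Back-of-the-envelope fleet sizing; values are from the Google Cloud
# test description, not measured by this script.
TARGET_RPS = 1_000_000   # aggregate requests per second
BACKENDS = 200           # web server VMs
CLIENT_VMS = 64          # load-generating VMs

per_backend = TARGET_RPS / BACKENDS    # 5,000 rps each
per_client = TARGET_RPS / CLIENT_VMS   # ~15,625 rps each client must generate

print(f"{per_backend:,.0f} rps per backend, {per_client:,.0f} rps per client VM")
```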
Consider TCP/UDP port exhaustion. If source IP, destination IP, and destination port remain the same, there are only on the order of 50k to 60k source ports available to distinguish sockets.
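A minimal sketch of checking this ceiling on a Linux client (it assumes `/proc` is available; the defaults on many distributions are even tighter than 50k):

```python
# Read the kernel's ephemeral port range: this bounds how many concurrent
# sockets one client IP can open to a single destination IP:port.
def ephemeral_port_count(path="/proc/sys/net/ipv4/ip_local_port_range"):
    with open(path) as f:
        low, high = map(int, f.read().split())
    return high - low + 1

ports = ephemeral_port_count()
print(f"~{ports:,} concurrent sockets per (src IP, dst IP, dst port) tuple")
# Default is often 32768-60999, i.e. ~28k ports. Widening the range or
# adding client IPs is how load tests avoid hitting this ceiling.
```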
Confirm that the test without a load balancer actually goes through the same IP stack that the remote test does. There is overhead in establishing sockets, creating packets, and doing network stack work in general.
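One hedged way to check this from the client side: connect to the server's routable address rather than 127.0.0.1 and inspect which local address the kernel chose. The server address below is a placeholder, not from the source.

```python
# Verify a "local" benchmark exercises the same path as remote traffic.
# If getsockname() reports 127.x.x.x, the test stayed on the loopback
# shortcut and skipped parts of the real interface path.
import socket

SERVER_ADDR = ("203.0.113.10", 80)  # hypothetical server address

s = socket.create_connection(SERVER_ADDR, timeout=5)
local_ip, local_port = s.getsockname()
s.close()

if local_ip.startswith("127."):
    print("Warning: traffic went over loopback, not the routable stack")
else:
    print(f"Connected from {local_ip}:{local_port} via the routable stack")
```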
Measure latency, both of the query and of the network. A million per second means that, averaged across all serving processes, a query must complete every microsecond. Even a relatively small network latency will drastically reduce throughput compared to the near-zero latency of staying on the box.
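Little's law makes this concrete: in-flight concurrency = throughput × latency. A sketch with assumed latencies (the 20 µs and 0.5 ms figures are illustrative, not from the source):

```python
# Little's law: concurrency = throughput * latency.
target_rps = 1_000_000
latencies = {
    "on-box":  20e-6,   # ~20 us staying local (assumed)
    "network": 500e-6,  # ~0.5 ms across the network (assumed)
}

for name, lat in latencies.items():
    concurrency = target_rps * lat   # requests that must be in flight
    per_conn_rps = 1 / lat           # ceiling for one synchronous connection
    print(f"{name}: {concurrency:,.0f} requests in flight; "
          f"one serial connection tops out at {per_conn_rps:,.0f} rps")
```

At 0.5 ms, a single synchronous connection can manage only 2,000 rps, so reaching a million requires hundreds of requests in flight at all times.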
A million requests per second is a marketing number, however, not a useful one. The test does no real work: "Each web response was 1 byte in size not including the http headers." Actually doing something will exhaust some other resource, such as storage IOPS, memory, or CPU, long before the mythical million.
Quantify, based on current application use and organizational planning, how many qps or concurrent connections you will actually need. Feel free to round up generously when capacity planning, but multiplying by 100x or 1000x without justification serves no purpose.
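A hedged capacity-planning sketch; the peak qps, growth rate, and headroom factor below are placeholders to be replaced with your own measurements and plans:

```python
# All inputs are example values, not recommendations.
current_peak_qps = 12_000       # measured from your monitoring
annual_growth = 1.5             # 50% yearly growth from org planning
planning_horizon_years = 2
headroom = 2.0                  # generous rounding up for spikes/failover

needed = current_peak_qps * (annual_growth ** planning_horizon_years) * headroom
print(f"Plan for ~{needed:,.0f} qps")  # ~54,000 qps, nowhere near a million
```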
For a sense of scale, this very Stack Exchange network regularly ranks among the top web sites on the internet by traffic. Yet concurrent WebSockets max out at "only" about 600,000, and the load balancers peak at 4,500 requests per second. A million per second is more than 200 times that.