We cannot use latency on a load test as we want to compare
geographically different sites.
Really? Response time per request is the metric that actually corresponds to how slow a thing feels to a user. Different geographic regions may produce a more complex statistical distribution, sure, but it's still useful to analyze.
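As a minimal sketch of what that analysis might look like (the region names and latency samples here are made up), you can compare percentiles per region instead of collapsing everything into one average:

```python
import statistics

# Hypothetical per-region response times in ms from a load test.
latencies = {
    "us-east": [42, 45, 48, 51, 120, 44, 47],
    "eu-west": [95, 99, 102, 110, 250, 97, 101],
}

for region, samples in latencies.items():
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    qs = statistics.quantiles(samples, n=100)
    print(region,
          "median=%.0fms" % statistics.median(samples),
          "p95=%.0fms" % qs[94])
```

Per-region percentiles keep the geographic differences visible instead of letting one slow region skew a global mean.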
[Waiting connections] gauge should be read as "the higher the better".
Why?
Connections in the Reading and Writing states are doing I/O, i.e. actual work. Waiting connections are keep-alives: the client has already completed a request and the connection is sitting idle, waiting for the next one.
At the same requests-per-second level, lower Reading and Writing counts are good, because that correlates with connections being serviced quickly. That probably means more connections are waiting on clients, so higher Waiting numbers, but there are limits to the total number of connections.
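These gauges come from nginx's stub_status page. A minimal sketch of pulling them apart (the sample text below is made up but follows the documented stub_status format):

```python
import re

# Sample output in the format produced by ngx_http_stub_status_module.
STATUS = """\
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

def parse_stub_status(text):
    """Parse stub_status output into a dict of counters and gauges."""
    m = re.search(
        r"Active connections:\s*(\d+).*?"
        r"(\d+)\s+(\d+)\s+(\d+).*?"
        r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)",
        text, re.S)
    keys = ("active", "accepts", "handled", "requests",
            "reading", "writing", "waiting")
    return dict(zip(keys, map(int, m.groups())))

stats = parse_stub_status(STATUS)
# reading + writing = connections doing I/O; waiting = idle keep-alives.
print(stats)
```

Note that accepts/handled/requests are cumulative counters since nginx started, while Active/Reading/Writing/Waiting are instantaneous gauges.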
Second question. On the same load test, accepted/handled connection
metrics are much higher on the more recent server (around double).
Why?
The first few seconds of both connections-over-time graphs are a bit of an outlier, jumping up near-instantly. I'm not entirely sure why this happens, but perhaps nginx had been running for longer before the test, so its counters started higher.
I would ignore the first few seconds as a warm-up. And possibly graph requests per second over time, as it may be easier to see trends in what should be a straight line.
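A minimal sketch of that suggestion: the stub_status "requests" value is a cumulative counter, so differencing consecutive samples gives a per-second rate, and trimming the warm-up leaves the part that should be flat. The sample values below are hypothetical.

```python
# Hypothetical cumulative "requests" counter, polled once per second
# during a load test.
samples = [0, 900, 1400, 1850, 2300, 2740, 3190]

# Per-second rate is the difference between consecutive cumulative samples.
rates = [b - a for a, b in zip(samples, samples[1:])]

# Drop the first couple of seconds as warm-up before looking for trends.
WARMUP_SECONDS = 2
steady = rates[WARMUP_SECONDS:]

print(rates)   # [900, 500, 450, 450, 440, 450]
print(steady)  # [450, 450, 440, 450]
```

Graphed this way, a dip or a slow drift in the steady-state portion is much easier to spot than in the raw cumulative curve.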