I have an Nginx server that acts as a reverse proxy and a static file server.
My users complain that sometimes the response is slow.
I tried examining the .har file (HTTP Archive) to understand the issue better and found something unusual.
Wait times (ms) on https://cdnjs.cloudflare.com****:
[10.00600000781566, 10.09000002155453, 12.657999974526462, 30.81500001040101, 50.140000049091874]
Wait times (ms) on https://fonts.gstatic.com/****:
[0.8139999584183073, 0.8409999709799933, 1.4160000515654616, 2.3279999482259086, 79.60999999868869]
Wait times (ms) on my server:
[59.73200001836568, 59.820000056281685, 60.4199999724552, 60.530999979116025, 60.79299996264279, 61.397000021353364, 61.590999948196114, 61.89499994969368, 62.058999961778525, 68.06700001706928, 68.25800005129724, 68.29399997562915, 68.88300005169958, 69.07899998541922, 69.38299999047071, 69.6550000514686, 69.6710000243038, 69.69899996588379, 69.76500002556294, 70.05899996621906, 70.57600003184378, 76.85900000137836, 256.38100003309546, 267.7780000296906, 280.696999967508, 461.3320000257194, 465.38599997801333, 476.49299997053294, 484.6409999888614, 748.1390000376775]
As you can see, there is a very large spread in the wait times for the same resource. It occurs even on cdnjs.cloudflare.com and fonts.gstatic.com, which are reputable servers, but on my server the spread is far larger.
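To put a number on that spread, here is a quick Python sketch over the server-side wait times listed above (values rounded to one decimal):

```python
import statistics

# Wait times (ms) against my server, copied (rounded) from the HAR data above
waits = [59.7, 59.8, 60.4, 60.5, 60.8, 61.4, 61.6, 61.9, 62.1, 68.1,
         68.3, 68.3, 68.9, 69.1, 69.4, 69.7, 69.7, 69.7, 69.8, 70.1,
         70.6, 76.9, 256.4, 267.8, 280.7, 461.3, 465.4, 476.5, 484.6, 748.1]

median = statistics.median(waits)
p90 = statistics.quantiles(waits, n=10)[-1]  # 90th percentile
print(f"median={median:.1f}ms p90={p90:.1f}ms max={max(waits):.1f}ms")
```

So the median is around 70 ms, but the 90th percentile is several hundred milliseconds and the worst request takes over 10x the median - a classic long-tail pattern rather than a uniform slowdown.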
I initially thought this was a caching issue, but of the wait times above, all requests were cache hits except one. I checked $request_time in Nginx for that MISS request and it was 0.4, so caching is clearly not the source of the issue.
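For reference, one way I can break down where the time goes per request is to log Nginx's timing variables. This is a sketch of an access-log format I would add (the log path and format name are my own placeholders, not part of my current config):

```nginx
# Hypothetical "timing" format: total time, upstream time, and cache status
log_format timing '$remote_addr [$time_local] "$request" $status '
                  'req_time=$request_time upstream_time=$upstream_response_time '
                  'upstream_connect=$upstream_connect_time cache=$upstream_cache_status';
access_log /var/log/nginx/timing.log timing;
```

Comparing $request_time against $upstream_response_time per slow request would show whether the delay is spent in the upstream or in Nginx/the client connection itself.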
The server consumes about 10% of its bandwidth, and the CPU is not exhausted (under 40% utilisation at peak time).
Can someone please help me understand why there is such a big variance in the wait time? Is the issue on my server or on the client side? How can I pinpoint the problem?
This is my server's configuration file (Nginx - OpenResty):
events {
    worker_connections 1024;
}

env SERVER_BACKEND_NAME;
env SERVER_CDN_NAME;
env SERVER_CDN_SPESIFIC_NAME;
http {
    error_log /var/errors/externalNginx.http.error_1 error;

    # The "auto_ssl" shared dict should be defined with enough storage space to
    # hold your certificate data. 1MB of storage holds certificates for
    # approximately 100 separate domains.
    lua_shared_dict auto_ssl 1m;

    # The "auto_ssl_settings" shared dict is used to temporarily store various settings
    # like the secret used by the hook server on port 8999. Do not change or
    # omit it.
    lua_shared_dict auto_ssl_settings 64k;

    # A DNS resolver must be defined for OCSP stapling to function.
    #
    # This example uses Google's DNS server. You may want to use your system's
    # default DNS servers, which can be found in /etc/resolv.conf. If your network
    # is not IPv6 compatible, you may wish to disable IPv6 results by using the
    # "ipv6=off" flag (like "resolver 8.8.8.8 ipv6=off").
    resolver 127.0.0.11;

    # Initial setup tasks.
    init_by_lua_block {
        auto_ssl = (require "resty.auto-ssl").new()

        -- Define a function to determine which SNI domains to automatically handle
        -- and register new certificates for. Defaults to not allowing any domains,
        -- so this must be configured.
        auto_ssl:set("allow_domain", function(domain)
            return true
        end)

        auto_ssl:init()
    }

    init_worker_by_lua_block {
        auto_ssl:init_worker()
    }
    # Internal CDN nginx backend
    upstream cdnnginx_backend {
        server cdnnginx;
    }

    # Limits - so it will be harder to DoS me
    limit_req_log_level warn;
    limit_req_zone $binary_remote_addr zone=login:10m rate=10r/m;
    # HTTPS CDN server - will later on become our image CDN as well
    server {
        listen 443 ssl http2;
        server_name ${SERVER_IMAGE_AND_FILE_CDN_NAME};
        error_log /var/errors/externalNginx.${SERVER_IMAGE_AND_FILE_CDN_NAME}.error_1 error;

        # Dynamic handler for issuing or returning certs for SNI domains.
        ssl_certificate_by_lua_block {
            auto_ssl:ssl_certificate()
        }

        # This is for the Config service so we can pass big enough information in the headers
        large_client_header_buffers 4 32k;

        ssl_certificate /etc/resty-default-ssl/resty-auto-ssl-fallback.crt;
        ssl_certificate_key /etc/resty-default-ssl/resty-auto-ssl-fallback.key;

        # Enable gzip compression
        gzip on;
        gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
        sendfile on;

        # Serve the compressed HTML file
        location /saved.html {
            root /var/compenion;
            # we need to clean it in the most external service
            more_clear_headers 'Server';
            # relax the CORS policy - we want everyone to be able to access this
            more_set_headers 'Access-Control-Allow-Origin: *';
        }

        location / {
            # we need to clean it in the most external service
            more_clear_headers 'Server';
            return 404 'Nanana';
        }
    }
    # HTTPS CDN - backward compatibility for existing links video.malkali.com
    server {
        listen 443 ssl http2;
        server_name my.server.com;
        error_log /var/errors/externalNginx.my.server.com.error_1 error;

        # Dynamic handler for issuing or returning certs for SNI domains.
        ssl_certificate_by_lua_block {
            auto_ssl:ssl_certificate()
        }

        ssl_certificate /etc/resty-default-ssl/resty-auto-ssl-fallback.crt;
        ssl_certificate_key /etc/resty-default-ssl/resty-auto-ssl-fallback.key;

        location /Health {
            return 200 "ComasTas";
        }

        # This checks the health of the caching nginx
        location /Health2 {
            proxy_pass http://cdnnginx_backend/Health2;
            # relax the CORS policy - we want everyone to be able to access this
            more_set_headers 'Access-Control-Allow-Origin: *';
        }

        # this will check the proxy health check
        location /1/Health {
            proxy_pass http://cdnnginx_backend/Health3;
            # relax the CORS policy - we want everyone to be able to access this
            more_set_headers 'Access-Control-Allow-Origin: *';
        }

        # This will pass normal links forward (signed and unsigned?) - should enforce similar links only to pass forward and reduce attacks
        location ~ "^\/(?:[0-9A-Fa-f]{2}){16}\/(?:[0-9A-Fa-f]{2}){16}\/(?:.?[^\/]+)$" {
            # solve the byte-range request support
            proxy_force_ranges on;
            more_set_headers 'Accept-Ranges: bytes';

            # solve the issue of gateway timeout
            proxy_read_timeout 300s;

            set $continue_url $uri;
            if ($arg_sig)
            {
                set $temp_cache 1;
            }
            if ($arg_refere)
            {
                set $temp_cache 2$temp_cache;
            }
            if ($temp_cache = 1)
            {
                set $continue_url $uri?sig=$arg_sig;
            }
            if ($temp_cache = 21)
            {
                set $continue_url $uri?sig=$arg_sig&refere=$arg_refere;
            }
            if ($temp_cache = 2)
            {
                set $continue_url $uri?refere=$arg_refere;
            }

            # solve pre-flight issue
            if ($request_method = OPTIONS) {
                # relax the CORS policy - we want everyone to be able to access this
                more_set_headers 'Access-Control-Allow-Origin: *';
                more_set_headers 'Access-Control-Allow-Headers: refere, Origin';
                add_header Content-Length 0;
                add_header Content-Type text/plain;
                return 200;
            }

            # add the header of the request host so we will be able to route inside
            proxy_set_header X-Forwarded-Host $scheme://$http_host;
            proxy_pass http://cdnnginx_backend$continue_url;

            # we need to clean it in the most external service
            more_clear_headers 'Server';
            # clear our internal evidence
            more_clear_headers 'V1Latency';
            more_clear_headers 'V1RequestTime';
            more_clear_headers 'ProxyCache';
            more_clear_headers 'S3priority';
            more_clear_headers 'V1internalcache';
            # relax the CORS policy - we want everyone to be able to access this
            more_set_headers 'Access-Control-Allow-Origin: *';
            more_set_headers 'Access-Control-Allow-Headers: refere, Origin';
            # @@@ For dbg only
            more_set_headers 'start_time: $msec';
            more_set_headers 'total_time: $request_time';
        }

        error_page 404 403 500 502 503 /error-page.html;
        location = /error-page.html {
            internal;
            return 404 "Nanana";
        }

        location = / {
            return 404 "Nanana";
        }
    }
    # HTTP server
    server {
        listen 80;

        location /HealthCheck {
            return 200;
        }

        # Endpoint used for performing domain verification with Let's Encrypt.
        location /.well-known/acme-challenge/ {
            content_by_lua_block {
                auto_ssl:challenge_server()
            }
        }
    }

    # Internal server running on port 8999 for handling certificate tasks.
    server {
        listen 127.0.0.1:8999;

        # Increase the body buffer size, to ensure the internal POSTs can always
        # parse the full POST contents into memory.
        client_body_buffer_size 128k;
        client_max_body_size 128k;

        location / {
            content_by_lua_block {
                auto_ssl:hook_server()
            }
        }
    }
}
I tried investigating the server itself.

iftop:
TX:    cum: 816MB   peak: 36.8Mb   rates: 17.3Mb 24.2Mb 26.6Mb
RX:         272MB         13.2Mb          7.76Mb 7.92Mb 8.90Mb
TOTAL:      1.06GB        47.3Mb          25.1Mb 32.2Mb 35.5Mb
top:
top - 08:26:56 up 5 days, 22:33, 2 users, load average: 0.40, 0.27, 0.44
Tasks: 271 total, 1 running, 270 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.4 sy, 0.0 ni, 98.8 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 31986.2 total, 12924.5 free, 3720.9 used, 15340.8 buff/cache
MiB Swap: 1024.0 total, 1004.5 free, 19.5 used. 27610.2 avail Mem
The server has a 1 Gbps link and a strong CPU. It utilizes less than 15% of the available processor and well under the 1 Gbps throughput capacity.
How can I further investigate this issue?
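One thing I plan to try is bucketing $request_time values out of the access log to see whether the slow tail correlates with specific URIs or times of day. A rough sketch in plain Python (the sample lines and the req_time= field are assumptions about my own log format, not real log output):

```python
import re
from collections import Counter

# Hypothetical sample lines; in practice these would be read from the access log
sample = [
    'GET /a 200 req_time=0.061',
    'GET /b 200 req_time=0.070',
    'GET /a 200 req_time=0.748',
    'GET /c 200 req_time=0.460',
]

def bucket(seconds: float) -> str:
    """Coarse latency bucket for a request time in seconds."""
    if seconds < 0.1:
        return '<100ms'
    if seconds < 0.5:
        return '100-500ms'
    return '>=500ms'

counts = Counter()
for line in sample:
    m = re.search(r'req_time=([\d.]+)', line)
    if m:
        counts[bucket(float(m.group(1)))] += 1

print(dict(counts))
```

If the >=500ms bucket clusters around particular paths or time windows, that would point at the upstream or disk; if it is spread evenly, it points more toward the network or client side.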