Our server hosting a PHP web application is facing frequent downtime
Server: Nginx on FreeBSD. Web application: PHP 5.6, MySQL 5.7.
I have gone through the Nginx logs, and below are my findings.
error.log contains entries like this:
2023/05/31 19:48:16 [error] 1456#100101: *7408 open() "/usr/local/www/html/uploads/files/22816683587.pdf" failed (2: No such file or directory), client: 5.255.231.177, server: localhost, request: "GET /uploads/files/22816683587.pdf HTTP/1.1", host: "somesite.com"
There are several requests like this, from different IPs, trying to access random PDF files on the server.
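To get a feel for how many of these requests come from each address, the error.log entries can be tallied per client IP. This is only a minimal sketch: the function name `count_missing_files` and the `SAMPLE_LINE` constant are made up for illustration, and it assumes every entry has the same layout as the one shown above.

```python
import re
from collections import Counter

# Extracts the client address from an Nginx error.log entry.
CLIENT_RE = re.compile(r"client: (?P<ip>[^,]+),")

# The entry quoted above, used here as sample input.
SAMPLE_LINE = (
    '2023/05/31 19:48:16 [error] 1456#100101: *7408 open() '
    '"/usr/local/www/html/uploads/files/22816683587.pdf" failed '
    '(2: No such file or directory), client: 5.255.231.177, '
    'server: localhost, request: "GET /uploads/files/22816683587.pdf HTTP/1.1", '
    'host: "somesite.com"'
)


def count_missing_files(lines):
    """Count 'No such file or directory' errors per client IP."""
    hits = Counter()
    for line in lines:
        if "No such file or directory" not in line:
            continue
        match = CLIENT_RE.search(line)
        if match:
            hits[match.group("ip")] += 1
    return hits


if __name__ == "__main__":
    # In practice the lines would come from the real error.log file.
    for ip, count in count_missing_files([SAMPLE_LINE]).most_common(10):
        print(ip, count)
```

If a handful of addresses account for most of the misses, that would point to a crawler or scanner rather than normal users.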
debug.log has listen queue overflow entries:
May 31 18:43:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow: 193 already in queue awaiting acceptance (265 occurrences)
May 31 18:44:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow: 193 already in queue awaiting acceptance (330 occurrences)
May 31 18:45:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow: 193 already in queue awaiting acceptance (314 occurrences)
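To see how the overflow develops over time, the queue depth and occurrence count can be pulled out of each kernel message. The sketch below assumes the message format shown in the three entries above. `parse_overflow` and `SAMPLE_LOG` are hypothetical names, and treating a message without an occurrence count as a single occurrence is my own assumption.

```python
import re

# Matches the queue depth and the optional occurrence count in a sonewconn message.
OVERFLOW_RE = re.compile(
    r"Listen queue overflow: (?P<queued>\d+) already in queue awaiting acceptance"
    r"(?: \((?P<occurrences>\d+) occurrences\))?"
)

# The three debug.log entries quoted above.
SAMPLE_LOG = [
    "May 31 18:43:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: "
    "Listen queue overflow: 193 already in queue awaiting acceptance (265 occurrences)",
    "May 31 18:44:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: "
    "Listen queue overflow: 193 already in queue awaiting acceptance (330 occurrences)",
    "May 31 18:45:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: "
    "Listen queue overflow: 193 already in queue awaiting acceptance (314 occurrences)",
]


def parse_overflow(line):
    """Return (queued, occurrences) for an overflow message, or None otherwise."""
    match = OVERFLOW_RE.search(line)
    if match is None:
        return None
    # Assumption: a message without a count represents one occurrence.
    return int(match.group("queued")), int(match.group("occurrences") or 1)


if __name__ == "__main__":
    parsed = [p for p in map(parse_overflow, SAMPLE_LOG) if p is not None]
    print("dropped connection attempts:", sum(occ for _, occ in parsed))
```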
This is the netstat output I captured during a downtime:
root@myIP:~ # netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address
tcp6 0/0/128 *.443
tcp4 0/0/128 *.80
tcp4 0/0/128 *.443
tcp4 0/0/10 127.0.0.1.25
tcp4 0/0/128 *.22
tcp6 0/0/128 *.22
tcp46 0/0/80 *.3306
tcp4 193/0/128 127.0.0.1.9000
unix 0/0/80 /tmp/mysql.sock
unix 0/0/4 /var/run/devd.pipe
unix 0/0/4 /var/run/devd.seqpacket.pipe
From the output above, the entry "tcp4 193/0/128 127.0.0.1.9000" stands out: its queue length (193) exceeds the maximum (128). This socket is used by the PHP-FPM service.
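The same check can be automated so that it can run periodically and catch the moment the queue fills up. This is a rough sketch that assumes the three-column `netstat -Lan` layout shown above. `overloaded_listeners` and `SAMPLE_OUTPUT` are illustrative names, not part of any tool. As far as I understand, FreeBSD lets a listen queue grow to about 1.5 times maxqlen before it logs an overflow, which would match 193 against a limit of 128, but I have not verified that on this system.

```python
# A shortened copy of the netstat -Lan output quoted above.
SAMPLE_OUTPUT = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address
tcp4 0/0/128 *.80
tcp4 0/0/128 *.443
tcp46 0/0/80 *.3306
tcp4 193/0/128 127.0.0.1.9000
unix 0/0/80 /tmp/mysql.sock
"""


def overloaded_listeners(netstat_output):
    """Return (address, qlen, maxqlen) for sockets whose queue is at or over its limit."""
    flagged = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have exactly three columns: proto, qlen/incqlen/maxqlen, address.
        if len(fields) != 3 or fields[1].count("/") != 2:
            continue
        _proto, sizes, address = fields
        try:
            qlen, _incqlen, maxqlen = (int(part) for part in sizes.split("/"))
        except ValueError:
            continue  # skip rows whose size column is not numeric
        if qlen >= maxqlen:
            flagged.append((address, qlen, maxqlen))
    return flagged


if __name__ == "__main__":
    for address, qlen, maxqlen in overloaded_listeners(SAMPLE_OUTPUT):
        print(f"{address}: {qlen} queued, limit {maxqlen}")
```

On the sample input only the PHP-FPM socket on 127.0.0.1.9000 is reported, which suggests PHP-FPM is not accepting connections as fast as Nginx hands them over.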
I checked the PHP-FPM logs too. No slow requests taking more than 5 seconds are logged during normal operation. There are some entries, but they were all logged while the server was already slowing down or down.
Some Nginx and FastCGI parameters set in nginx.conf:
client_header_timeout 3000;
client_body_timeout 3000;
fastcgi_read_timeout 3000;
client_max_body_size 32m;
fastcgi_buffers 8 128k;
fastcgi_buffer_size 128k;
server_name_in_redirect off;
server_names_hash_bucket_size 64;
server_names_hash_max_size 8192;
We enabled MySQL's slow query log, but nothing was logged; it is empty.
We contacted AWS, since the server runs on an AWS instance. They told us that CPU usage is sometimes high at random times, and that the database has no issues and looks healthy.
We asked them whether we are under a DDoS attack, but they ruled it out, saying that the network statistics don't show it.
We are struggling to find the exact cause or process behind this server downtime.
Please help. Thanks in advance.
Update
kernel: sonewconn: pcb 0xfffff80007326c40: Listen queue overflow: 1537 already in queue awaiting acceptance (167 occurrences)
I increased the listen queue limit, but the larger queue is now overflowing as well.