Score:0

Frequent server downtime and suspicious requests for random PDF files

py flag

Our server hosting a PHP web application is facing frequent downtime.

Server information: Nginx on FreeBSD. Web application: PHP 5.6, MySQL 5.7.

I have gone through the Nginx logs, and below are my findings.

error.log contains entries like the following:

2023/05/31 19:48:16 [error] 1456#100101: *7408 open() "/usr/local/www/html/uploads/files/22816683587.pdf" failed (2: No such file or directory), client: 5.255.231.177, server: localhost, request: "GET /uploads/files/22816683587.pdf HTTP/1.1", host: "somesite.com"

There are several requests like this, trying to access random PDF files on the server from different IPs.

debug.log has some listen queue overflow entries:

May 31 18:43:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow: 193 already in queue awaiting acceptance (265 occurrences) 
May 31 18:44:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow: 193 already in queue awaiting acceptance (330 occurrences) 
May 31 18:45:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow: 193 already in queue awaiting acceptance (314 occurrences)

This is the netstat output I captured during the downtime:

root@myIP:~ # netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp6  0/0/128        *.443
tcp4  0/0/128        *.80
tcp4  0/0/128        *.443
tcp4  0/0/10         127.0.0.1.25
tcp4  0/0/128        *.22
tcp6  0/0/128        *.22
tcp46 0/0/80         *.3306
tcp4  193/0/128      127.0.0.1.9000
unix  0/0/80         /tmp/mysql.sock
unix  0/0/4          /var/run/devd.pipe
unix  0/0/4          /var/run/devd.seqpacket.pipe

From the above I found "tcp4 193/0/128 127.0.0.1.9000"; this port is used by the PHP-FPM service.

I checked the PHP-FPM logs too. No slow request taking more than 5 seconds is logged during normal operation; there are a few, but those were logged only while the server was already slowing down or going down.

Some Nginx and FastCGI parameters set in nginx.conf:
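
The same check can be scripted so that saturated listen queues stand out. The following is a minimal sketch, assuming the `netstat -Lan` text has been captured into a string; the function name `find_full_queues` and the trimmed sample are made up for illustration.

```python
import re

# Trimmed sample of the netstat -Lan output shown above.
NETSTAT_OUTPUT = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/128        *.80
tcp46 0/0/80         *.3306
tcp4  193/0/128      127.0.0.1.9000
unix  0/0/80         /tmp/mysql.sock
"""

# proto, qlen/incqlen/maxqlen, local address
LINE_RE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)$")


def find_full_queues(text):
    """Return (address, qlen, maxqlen) for sockets whose queue is at or over its limit."""
    full = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # header lines do not match the pattern
        _proto, qlen, _incqlen, maxqlen, address = match.groups()
        if int(qlen) >= int(maxqlen):
            full.append((address, int(qlen), int(maxqlen)))
    return full


full_queues = find_full_queues(NETSTAT_OUTPUT)
for address, qlen, maxqlen in full_queues:
    print(f"{address}: {qlen} queued, limit {maxqlen}")
```

On the sample, only the PHP-FPM socket on 127.0.0.1.9000 is reported.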

client_header_timeout 3000;
client_body_timeout 3000;
fastcgi_read_timeout 3000;
client_max_body_size 32m;
fastcgi_buffers 8 128k;
fastcgi_buffer_size 128k;
server_name_in_redirect off;
server_names_hash_bucket_size 64;
server_names_hash_max_size 8192;

We enabled MySQL's slow query log, but nothing was logged; it is empty.

We contacted AWS, as the server runs on an AWS instance. They told us that CPU usage is sometimes randomly high, and that there is no issue with the database, which looks healthy.

We asked them whether we are under a DDoS attack, but they ruled it out, saying that the network statistics don't show it.

We are struggling to understand and find the exact reason or process behind this server downtime.

Please help. Thanks in advance.

Update

kernel: sonewconn: pcb 0xfffff80007326c40: Listen queue overflow: 1537 already in queue awaiting acceptance (167 occurrences)

I increased the listen queue limit, but now the larger queue is also overflowing.

Jaromanda X avatar
ru flag
Why did you mention `there is no such slow process logged that is taking more than 5 second`? There's no other mention of any slow process. You state that the server is down, not slow. As for the random requests, every public server gets those.
Ekky avatar
py flag
I just wanted to clear up that aspect, so that anyone reading my question understands that no process is slow.
Jaromanda X avatar
ru flag
so, what you're saying is that there's no indication in the logs you've looked at as to why the site is down - only a vague suggestion from AWS that there is occasional high CPU usage, but you haven't been able to correlate the high CPU usage with down time ... is that the gist of it?
Ekky avatar
py flag
Yes, I monitored htop while the sites were running, and CPU usage was normal even when the server was going down. I suspect "sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow:" is the issue behind the downtime; the overflowing queue belongs to "tcp4 193/0/128 127.0.0.1.9000" (PHP-FPM).
Ekky avatar
py flag
Could it be due to misconfigured FastCGI, PHP-FPM, or Nginx settings?
Jaromanda X avatar
ru flag
So if you monitor the log messages, is there a correlation between the server being down and those log messages occurring?
Ekky avatar
py flag
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/146417/discussion-between-ekky-and-jaromanda-x).
us flag
PHP 5.6 reached the end of active support in early 2017, and security support ended in 2018. It very likely has multiple security vulnerabilities, and your website could simply be compromised. I recommend rebuilding the website on PHP 8.
Ekky avatar
py flag
kernel: sonewconn: pcb 0xfffff80007326c40: Listen queue overflow: 1537 already in queue awaiting acceptance (167 occurrences). I increased the listen queue limit, but now the larger queue is also overflowing.
Wilson Hauck avatar
jp flag
Additional DB information request, please. OS, Version? RAM size, # cores, any SSD or NVME devices on MySQL Host server? Post TEXT data on justpaste.it and share the links. From your SSH login root, Text results of: A) SELECT COUNT(*), sum(data_length), sum(index_length), sum(data_free) FROM information_schema.tables; B) SHOW GLOBAL STATUS; after minimum 24 hours UPTIME C) SHOW GLOBAL VARIABLES; D) SHOW FULL PROCESSLIST; E) STATUS; not SHOW STATUS, just STATUS; G) SHOW ENGINE INNODB STATUS; for server workload tuning analysis to provide suggestions.
Score:2
si flag

It looks to me like you're on the right track. If you take a closer look at the output of netstat -Lan:

root@myIP:~ # netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  193/0/128      127.0.0.1.9000

It is telling you that the length of the queue (qlen) is 193 while the max. length is only 128 (maxqlen). Since the queue length exceeds the maximum length of 128, you're dropping connections. This is also confirmed by the log messages like

May 31 18:45:40 ip-XX-XX-XX-XX kernel: sonewconn: pcb 0xfffff800e1910dc8: Listen queue overflow: 193 already in queue awaiting acceptance (314 occurrences)

where the number 193 is also mentioned.
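
The value 193 is not arbitrary. As far as I know, FreeBSD lets the accept queue grow to one and a half times the configured limit before it logs an overflow; treat that factor as an assumption here, since the logs themselves do not state it. A minimal arithmetic sketch under that assumption (the helper name `overflow_threshold` is made up for illustration):

```python
def overflow_threshold(maxqlen):
    """Smallest queue length that would be logged as an overflow,
    assuming the kernel tolerates up to 1.5 times the configured limit."""
    return (3 * maxqlen) // 2 + 1


print(overflow_threshold(128))   # 193
print(overflow_threshold(1024))  # 1537
```

Under the same assumption, the 1537 reported in the question's update corresponds to a limit of 1024.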

You can confirm that your maxqlen is set to 128:

sysctl kern.ipc.somaxconn

You may want to set it to a higher value like so:

sysctl -w kern.ipc.somaxconn=1024

To make this change persistent, so that it survives a reboot, you would add a line to your /etc/sysctl.conf:

kern.ipc.somaxconn=1024

This larger queue would probably make your Listen queue overflow messages go away and avoid the connection drops / downtime.

However, it should be investigated why there are so many requests, and whether this is expected load or whether automated / scripted attacks are happening. In the latter case, a good mitigation would be to configure fail2ban to blacklist IP addresses that exceed a threshold, such as a given number of requested PDF files per minute.
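
Before choosing such a threshold, it can help to see how many failed PDF requests each client actually makes per minute. The following is a rough sketch, not fail2ban configuration: it assumes the nginx error.log line format quoted in the question, and the function names, the threshold of 3, and the extra sample lines from 203.0.113.9 are all hypothetical.

```python
import re
from collections import Counter

# Matches "open() ... .pdf failed (2: ...)" lines in the format quoted in the question.
ERROR_RE = re.compile(
    r'^(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[error\] .*'
    r'open\(\) "(?P<path>[^"]+\.pdf)" failed \(2: .*?client: (?P<client>[^,]+),'
)


def pdf_misses_per_minute(lines):
    """Count missing-PDF errors per (client IP, minute)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            counts[(match.group("client"), match.group("minute"))] += 1
    return counts


def offenders(counts, threshold):
    """Clients that exceeded the threshold in at least one minute."""
    return sorted({client for (client, _minute), n in counts.items() if n > threshold})


# The first line is the one from the question.
SAMPLE = [
    '2023/05/31 19:48:16 [error] 1456#100101: *7408 open() '
    '"/usr/local/www/html/uploads/files/22816683587.pdf" failed '
    '(2: No such file or directory), client: 5.255.231.177, server: localhost, '
    'request: "GET /uploads/files/22816683587.pdf HTTP/1.1", host: "somesite.com"',
]
# Hypothetical lines from a made-up client, only to exercise the threshold.
SAMPLE += [
    f'2023/05/31 19:48:{s:02d} [error] 1#1: *{s} open() '
    f'"/usr/local/www/html/uploads/files/{s}.pdf" failed '
    '(2: No such file or directory), client: 203.0.113.9, server: localhost'
    for s in range(5)
]

counts = pdf_misses_per_minute(SAMPLE)
print(offenders(counts, threshold=3))  # ['203.0.113.9']
```

If the real log shows only a handful of misses per client per minute, spread over many addresses, a per-IP ban is unlikely to reduce the load much.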

Ekky avatar
py flag
Thank you for your help! I checked the output of sysctl kern.ipc.somaxconn, which was 128. As you suggested, I added the line kern.ipc.somaxconn=1024 to "/etc/sysctl.conf" and rebooted the server. But when I checked sysctl kern.ipc.somaxconn again, it still reported kern.ipc.somaxconn: 128, even though "/etc/sysctl.conf" still contains the new line. Is something else overriding it, or do I need to do something other than reboot?
Andreas Piening avatar
si flag
@Ekky There is a service that should apply the settings from */etc/sysctl.conf*. You can check if it is running on your system with `/etc/rc.d/sysctl status`. You can even try to execute `/etc/rc.d/sysctl start` and see if the setting is applied or if you get any errors. Please make sure the line with `kern.ipc.somaxconn=1024` does not have quotation marks or spaces in it. If everything else fails, please try to add this line to `/boot/loader.conf` instead of `/etc/sysctl.conf`.
Ekky avatar
py flag
Yes, I confirm that "sysctl is running". Additionally, following the reference below, I added the same line to "/boot/loader.conf": https://serverfault.com/questions/171797/how-to-make-setting-of-kern-ipc-somaxconn-persistent
Ekky avatar
py flag
Still, the output of sysctl kern.ipc.somaxconn is kern.ipc.somaxconn: 128; the change does not persist after a reboot.
Andreas Piening avatar
si flag
@Ekky that's strange. The only reason I could think of is that some startup script overwrites this value. Obviously it is hard to tell where this happens, maybe you can try `grep -R "kern.ipc.somaxconn" /etc/` to find occurrences.
Ekky avatar
py flag
I rebooted the server after the change and checked /etc/rc.d/sysctl status; it is not running. I then tried to start it, but it is still not running.
Ekky avatar
py flag
Is it possible to contact you outside this platform?
Andreas Piening avatar
si flag
I'm afraid `/etc/rc.d/sysctl` is not a real service; it is more like a script that is executed once on startup. So I think it is fine if `service sysctl status` reports *sysctl is not running.* But it should be enabled anyway with `service sysctl enable`, and you can check with `service sysctl start` followed by `sysctl kern.ipc.somaxconn` whether your sysctl setting is applied after this script executes.
Ekky avatar
py flag
OK, got it. I understand your point and followed the instructions; the result of sysctl kern.ipc.somaxconn is now kern.ipc.somaxconn: 1024. But I am afraid that if I reboot the server again, this change will not persist.
Andreas Piening avatar
si flag
@Ekky At least this confirms that the value is set correctly by the service script. So if it is enabled (`service sysctl enable`), it is expected to be executed on startup as well. But this setting can still be overwritten by any other script that is executed on startup after this. You can of course add a cron job that runs about 30 seconds after system startup to make sure that it is set to the value you want, but preventing this value from being overwritten would be the better option. This may be very specific to your system, so I can't tell you where it happens.
Ekky avatar
py flag
Andreas, I don't have words to thank you. I don't know whether this solves the issue or not, but I really appreciate the way you tried to help me. Please keep checking this thread in case I need your help; that's all I can say for now. I will look for whatever is overriding the value. Thank you.
Ekky avatar
py flag
sysctl kern.ipc.somaxconn goes back to 128 after rebooting the server.