Recently one of my websites (a PrestaShop e-commerce site on a CentOS machine running PHP-FPM, Apache and MySQL) went down and stopped responding to web requests.
After some investigation, the cause turned out to be an API call made with php-curl to an endpoint that was temporarily offline. The call lives in an application PHP file that is included on every page of the website.
The cURL call had mistakenly been made without a CURLOPT_TIMEOUT_MS setting, so visitors to the website quickly used up the maximum number of PHP connections, leaving every php-fpm process blocked and preventing the server from accepting new incoming connections.
Is there a quick and effective way to prevent or identify this kind of problem "in production" from the terminal if it happens again? In particular, I would like to quickly find out which endpoint is blocking, or which file contains the script that is blocking the server. In my case I had to track the issue down at "application level" rather than from the server, because:
- Running "top" on the server shows the list of blocked php-fpm processes, but without any additional information that helps to understand the problem (the server load average was also about 0.00, since there was almost no activity while the connections were stuck).
- Running "netstat -nputw" shows a lot of internal connections in TIME_WAIT status, but again no information about the outage "culprit" (can I see the endpoint called by php-curl with netstat or a similar network command?).
- Running "strace" on the php-fpm processes shows a lot of files being accessed, but this is not very helpful, since the site opens dozens and dozens of files even with average traffic.
- The webserver logs only reported timed-out connections to web resources, not the script containing the problematic cURL call.
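To make the netstat question above concrete, this is roughly the kind of check I have in mind. It is only a sketch, not something I ran during the outage: it assumes `ss` from iproute2 is installed and that the workers appear under the process name "php-fpm" (both are assumptions about the setup), and `fpm_remote_endpoints` is just a name I made up for the helper.

```shell
#!/bin/sh
# Sketch: count outbound TCP connections per remote endpoint for php-fpm workers.
# Reads `ss -tnp` output on stdin. Assumes the workers are listed under the
# process name "php-fpm" (an assumption; the real name may carry a suffix).
fpm_remote_endpoints() {
    # Columns of `ss -tnp`: State Recv-Q Send-Q Local:Port Peer:Port Process
    # ESTAB    = connected, possibly still waiting for a response
    # SYN-SENT = still trying to connect, e.g. to an offline host
    awk '/php-fpm/ && ($1 == "ESTAB" || $1 == "SYN-SENT") { print $5 }' |
        sort | uniq -c | sort -rn
}

# Live usage (run as root so ss can show the owning process):
#   ss -tnp | fpm_remote_endpoints
```

If this approach is sound, a remote address with an unusually high count would be the first suspect, but it would still not tell me which PHP file made the call.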
Thanks for your help.