Best practices to troubleshoot a stuck PHP application caused by internal cURL calls to unresponsive endpoints

Recently I found myself with a website (a PrestaShop e-commerce site on a CentOS machine running PHP-FPM, Apache and MySQL) that was down and not responding to web requests.

After investigation, the issue turned out to be an API call made with php-curl to an endpoint that was temporarily offline. The call sat in an application PHP file that was included on every page of the website.

The cURL call had wrongly been made without a CURLOPT_TIMEOUT_MS setting, so visitors quickly used up the maximum number of PHP workers: every php-fpm process blocked on the call, and the server could not accept any further incoming connections.

I wonder whether one can quickly and effectively prevent or identify such a problem in production from the terminal if it happens again (in particular, to quickly find out which endpoint is blocked, or which file the blocking script comes from). In my case I had to investigate at the application level rather than from the server, because:

  • Running "top" on the server shows the list of blocked php-fpm processes without any additional information to understand the problem (the server load average was also about 0.00, since there was almost no activity due to the stuck connections).
  • Running "netstat -nputw" shows a lot of internal connections in TIME_WAIT status, but again no information about the culprit of the outage (could I see the endpoint called by php-curl with netstat or a similar network command?).
  • Running "strace" on the php-fpm processes shows a lot of files being accessed, but this is not very helpful, since the site opens dozens and dozens of files even with average traffic.
  • The web server logs only reported timed-out connections to web resources, not the script containing the problematic cURL call.
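
One possible way to see the endpoint from the terminal, offered as a rough sketch rather than a definitive procedure: list the TCP connections owned by the PHP workers and count them per remote address. It assumes `ss` (iproute2) is installed, that the worker processes have `php-fpm` in their name (on some distributions it is e.g. `php-fpm7.4`), and that the command is run as root so process names are visible. The function name `summarize_fpm_peers` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -tanp` output on stdin and prints
# "<count> <peer address:port>" for sockets owned by php-fpm workers,
# most frequent peer first.
summarize_fpm_peers() {
    awk '
        # Fields of ss -tanp: State Recv-Q Send-Q Local Peer Process.
        # Keep sockets that are connected (ESTAB) or still trying to
        # connect (SYN-SENT); the header line and TIME-WAIT are skipped.
        ($1 == "ESTAB" || $1 == "SYN-SENT") && /php-fpm/ { count[$5]++ }
        END { for (peer in count) print count[peer], peer }
    ' | sort -rn
}

# Live usage (as root):
#   ss -tanp | summarize_fpm_peers
```

If most workers are stuck on the same call, a single remote address and port should dominate the top of the list. Connections to the local database, or from the web server when php-fpm listens on a TCP port instead of a Unix socket, will show up as well and have to be ruled out by eye.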

Thanks for your help.

Add the timeout and measure how long it hangs.
user3256843
The question is not what to do once the root cause is understood, but how to detect it properly if it happens again (e.g. on another website).
If the results of the curl are required for building the page, then you must wait until the curl fails or times out. What aspect of this statement can you relax?
user3256843
Maybe I was not clear enough in formulating the question. What I need to know, from the sysadmin point of view, is how to find the root cause from the terminal as quickly as possible if a situation like this happens again, for example on another server, without knowing how the application is built and without analyzing it.
And my suggestion was one step toward that. I may have further clues after you answer my questions. (When I can't answer a question, I at least try to help with the debugging.)
user3256843
I will try to explain myself better: once I found the problem in the application, I knew perfectly well that fixing it requires setting a timeout on the unresponsive curl call (or disabling the call altogether), but fixing the poorly written application was not part of my job. I asked the question because, as a sysadmin, I need to identify the root cause of the problem as quickly as possible with a shell in front of me, without knowing anything about the underlying application.
An anecdote: Many years ago, I had a program that was doing a lot of curls. Once in a while, it would hang. After researching it quite a lot, and asking experts, I came to the conclusion that something very low in the OS was causing the problem. I could repeatedly show that the hang was exactly 80.0 seconds. This, of course, was unacceptable. But I could not find a workaround within the thread. (Possibly using multiple threads would have let me continue processing, but I did not want to go there.)