So I've been going crazy with this for the last 5 hours or something like that.
We have been having a problem in one of our servers that suddenly started giving 504 gateway timeouts.
So first versions:
- Debian 10 running on Proxmox 7
- Nginx 1.14
- PHP 7.2.34 (39+0~20230609.84+debian10~1.gbpf63844)
- curl 7.64.0
So basically I have the following file on one virtual host, let's call it VH 1 ( example.com/sd.php ):
<?php
echo 'Response';
And on another virtual host ( VH 2 example1.com/sd.php ):
$host= 'https://example.com/sd.php';
for ( $i = 0; $i<2; $i++) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$season_data = curl_exec($ch);
if (curl_errno($ch)) {
print_r(curl_error($ch));
print "Error: " . curl_error($ch);
exit();
}
curl_close($ch);
echo "Responded" . $i."<br />";
}
They are both on the same machine.
Now if I go to the browser and put example1.com/sd.php and keep hitting the refresh button every [5-9] times I get a 504 gateway timeout.
And this is what I get on the logs of VH1:
2023/07/29 21:16:04 [error] 27138#27138: *851 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example.com, request: "GET /sd.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "example.com"
PHP-FPM is completely ok and never goes over ( what I thought was the initial problem ):
[29-Jul-2023 21:26:32.662531] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:33.663691] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:34.664941] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
Configs of note for php-fpm:
listen = /run/php/php7.2-fpm.sock
request_terminate_timeout = 30s
Previously it was using a TCP socket, but I changed it this afternoon and nothing.
I enabled slow log, and he basically complains about the sd.php file:
[29-Jul-2023 21:23:24.315874] [pool www] pid 15840
script_filename = /var/www/html/vh2/public/sd.php
[0x00007fb8fd21e110] curl_exec() (...)/vh1/public/sd.php:18
On the VH2 on nginx logs I have absolutely nothing, error log is enabled with debug and nothing is recorded when I get the 504 gateway timeout.
On nginx this is what I have configured:
client_body_timeout 10;
client_header_timeout 10;
keepalive_timeout 30;
send_timeout 10;
proxy_connect_timeout 600s;
proxy_send_timeout 600s;
proxy_read_timeout 600s;
fastcgi_send_timeout 600s;
fastcgi_read_timeout 600s;
fastcgi_temp_file_write_size 256k;
I also did some tests with ab, from both my machine, and from within the server, and everything is just perfect ( calling example.com/sd.php ):
tio ~ $ ab -n 5000 -c 100 https://example.com/sd.php
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking example.com (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software: pp7-2
Server Hostname: example.com
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key: X25519 253 bits
TLS Server Name: example.com
Document Path: /sd.php
Document Length: 5 bytes
Concurrency Level: 100
Time taken for tests: 9.910 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 1940000 bytes
HTML transferred: 25000 bytes
Requests per second: 504.54 [#/sec] (mean)
Time per request: 198.200 [ms] (mean)
Time per request: 1.982 [ms] (mean, across all concurrent requests)
Transfer rate: 191.17 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 113 136 23.5 133 452
Processing: 41 57 20.3 52 425
Waiting: 41 56 19.5 52 424
Total: 161 192 34.0 186 594
Percentage of the requests served within a certain time (ms)
50% 186
66% 192
75% 196
80% 199
90% 208
95% 219
98% 261
99% 334
100% 594 (longest request)
I've also checked dmesg and /var/log/messages just in case, but nothing there.
Another example ( something is wrong here ), checked this link: https://groups.google.com/g/highload-php-en/c/qGu3Eaifj9s, and someone added this php for a simple use case:
<?php
function connect($host, $port, $timeout = 1) {
$conn_str = "tcp://{$host}:{$port}";
$opts = STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT | STREAM_CLIENT_PERSISTENT;
$sock = stream_socket_client($conn_str, $errno, $errstr, $timeout, $opts);
return $sock;
}
$sock = connect("google.com", 80);
$req = "GET / HTTP/1.0\r\nHost: www.google.com\r\nAccept: */*\r\n\r\n";
$len = fwrite($sock, $req);
$data = stream_get_contents($sock);
echo $data;
The same thing happens again, after a few refreshes to the page:
2023/07/29 21:52:34 [error] 27138#27138: *35680 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:53:12 [info] 27138#27138: *35925 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too (104: Connection reset by peer) while sending request to upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:54:14 [error] 27138#27138: *36093 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
And the php-fpm log that goes with it:
[29-Jul-2023 21:53:54.440363] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 2 active children, 27 spare children, 29 running children. Spawning rate 1
[29-Jul-2023 21:53:54.510612] WARNING: pid 15827, fpm_request_check_timed_out(), line 278: [pool www] child 16562, script '/var/www/html/vh2//public/sd2.php' (request: "GET /sd2.php") execution timed out (42.699082 sec), terminating
[29-Jul-2023 21:53:54.512034] DEBUG: pid 15827, fpm_got_signal(), line 75: received SIGCHLD
[29-Jul-2023 21:53:54.512087] WARNING: pid 15827, fpm_children_bury(), line 256: [pool www] child 16562 exited on signal 15 (SIGTERM) after 702.886545 seconds from start
[29-Jul-2023 21:53:54.512861] NOTICE: pid 15827, fpm_children_make(), line 425: [pool www] child 17005 started
[29-Jul-2023 21:53:54.512895] DEBUG: pid 15827, fpm_event_loop(), line 423: event module triggered 1 events
[29-Jul-2023 21:53:55.441977] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 1 active children, 28 spare children, 29 running children. Spawning rate 1
I don't understand what is happening with this, almost feels like a strange bug between curl + php-fpm + nginx. Thank you all!