Score:0

Nginx and PHP-FPM 504 gateway timeout with multiple curl requests

in flag
Tio

So I've been going crazy with this for the last 5 hours or something like that.

We have been having a problem in one of our servers that suddenly started giving 504 gateway timeouts.

So first versions:

  • Debian 10 running on Proxmox 7
  • Nginx 1.14
  • PHP 7.2.34 (39+0~20230609.84+debian10~1.gbpf63844)
  • curl 7.64.0

So basically I have the following file on one virtual host, let's call it VH 1 ( example.com/sd.php ):

<?php
echo 'Response';

And on another virtual host ( VH 2 example1.com/sd.php ):

$host= 'https://example.com/sd.php';

for ( $i = 0; $i<2; $i++) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $host);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $season_data = curl_exec($ch);

    if (curl_errno($ch)) {
        print_r(curl_error($ch));
        print "Error: " . curl_error($ch);
        exit();
    }

    curl_close($ch);
    echo "Responded" . $i."<br />";
}

They are both on the same machine.

Now if I go to the browser and put example1.com/sd.php and keep hitting the refresh button every [5-9] times I get a 504 gateway timeout.

And this is what I get on the logs of VH1:

2023/07/29 21:16:04 [error] 27138#27138: *851 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example.com, request: "GET /sd.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "example.com"

PHP-FPM is completely ok and never goes over ( what I thought was the initial problem ):

[29-Jul-2023 21:26:32.662531] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:33.663691] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:34.664941] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1

Configs of note for php-fpm:

listen = /run/php/php7.2-fpm.sock
request_terminate_timeout = 30s

Previously it was using a TCP socket, but I changed it this afternoon and nothing.

I enabled slow log, and he basically complains about the sd.php file:

[29-Jul-2023 21:23:24.315874]  [pool www] pid 15840
script_filename = /var/www/html/vh2/public/sd.php
[0x00007fb8fd21e110] curl_exec() (...)/vh1/public/sd.php:18

On the VH2 on nginx logs I have absolutely nothing, error log is enabled with debug and nothing is recorded when I get the 504 gateway timeout.

On nginx this is what I have configured:

client_body_timeout   10;
client_header_timeout 10;
keepalive_timeout     30;
send_timeout          10;

proxy_connect_timeout  600s;
proxy_send_timeout  600s;
proxy_read_timeout  600s;
fastcgi_send_timeout 600s;
fastcgi_read_timeout 600s;
fastcgi_temp_file_write_size 256k;

I also did some tests with ab, from both my machine, and from within the server, and everything is just perfect ( calling example.com/sd.php ):

tio ~  $ ab -n 5000 -c 100 https://example.com/sd.php          
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking example.com (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        pp7-2
Server Hostname:        example.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        example.com

Document Path:          /sd.php
Document Length:        5 bytes

Concurrency Level:      100
Time taken for tests:   9.910 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1940000 bytes
HTML transferred:       25000 bytes
Requests per second:    504.54 [#/sec] (mean)
Time per request:       198.200 [ms] (mean)
Time per request:       1.982 [ms] (mean, across all concurrent requests)
Transfer rate:          191.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      113  136  23.5    133     452
Processing:    41   57  20.3     52     425
Waiting:       41   56  19.5     52     424
Total:        161  192  34.0    186     594

Percentage of the requests served within a certain time (ms)
  50%    186
  66%    192
  75%    196
  80%    199
  90%    208
  95%    219
  98%    261
  99%    334
 100%    594 (longest request)

I've also checked dmesg and /var/log/messages just in case, but nothing there.

Another example ( something is wrong here ), checked this link: https://groups.google.com/g/highload-php-en/c/qGu3Eaifj9s, and someone added this php for a simple use case:

<?php

function connect($host, $port, $timeout = 1) {
    $conn_str = "tcp://{$host}:{$port}";
    $opts = STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT | STREAM_CLIENT_PERSISTENT;
    $sock = stream_socket_client($conn_str, $errno, $errstr, $timeout, $opts);
    return $sock;
}

$sock = connect("google.com", 80);

$req = "GET / HTTP/1.0\r\nHost: www.google.com\r\nAccept: */*\r\n\r\n";

$len = fwrite($sock, $req);

$data = stream_get_contents($sock);

echo $data;

The same thing happens again, after a few refreshes to the page:

2023/07/29 21:52:34 [error] 27138#27138: *35680 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:53:12 [info] 27138#27138: *35925 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too (104: Connection reset by peer) while sending request to upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:54:14 [error] 27138#27138: *36093 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"

And the php-fpm log that goes with it:

[29-Jul-2023 21:53:54.440363] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 2 active children, 27 spare children, 29 running children. Spawning rate 1
[29-Jul-2023 21:53:54.510612] WARNING: pid 15827, fpm_request_check_timed_out(), line 278: [pool www] child 16562, script '/var/www/html/vh2//public/sd2.php' (request: "GET /sd2.php") execution timed out (42.699082 sec), terminating
[29-Jul-2023 21:53:54.512034] DEBUG: pid 15827, fpm_got_signal(), line 75: received SIGCHLD
[29-Jul-2023 21:53:54.512087] WARNING: pid 15827, fpm_children_bury(), line 256: [pool www] child 16562 exited on signal 15 (SIGTERM) after 702.886545 seconds from start
[29-Jul-2023 21:53:54.512861] NOTICE: pid 15827, fpm_children_make(), line 425: [pool www] child 17005 started
[29-Jul-2023 21:53:54.512895] DEBUG: pid 15827, fpm_event_loop(), line 423: event module triggered 1 events
[29-Jul-2023 21:53:55.441977] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 1 active children, 28 spare children, 29 running children. Spawning rate 1

I don't understand what is happening with this, almost feels like a strange bug between curl + php-fpm + nginx. Thank you all!

Score:0
in flag
Tio

Eventually I found out the problem. And it was DNS resolution.

I'm using Symfony, and symfony uses curl_multi_exec, now for some reason this tries to resolve IPV6 and IPV4, and in our case the resolution for IPV6, was what was causing the problem.

There are two ways to solve it, first and the easiest is to disable IPV6 resolution in curl, which is as simple as doing this:

curl_setopt($c, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);

The other way is to change the DNS servers ( and this servers can't resolve IPV6, and return immediately ).

I used the second version, since with the current version of Symfony that I'm using, I can't pass options to curl yet.

I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.