Ok, so I checked the Apache configs on the server where I can get websites running and on the server where Varnish keeps returning 503 and 500, and they are the same. The only difference I can find is php-fpm, but I can't think of why that would cause the errors.
[root@webdev01 ~]# sudo netstat -plnt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.2:80 0.0.0.0:* LISTEN 1679/varnishd
tcp 0 0 172.31.23.5:80 0.0.0.0:* LISTEN 1644/nginx
tcp 0 0 127.0.0.1:80 0.0.0.0:* LISTEN 1620/httpd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1177/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1439/master
tcp 0 0 172.31.23.5:443 0.0.0.0:* LISTEN 1644/nginx
tcp 0 0 127.0.0.1:443 0.0.0.0:* LISTEN 1620/httpd
tcp 0 0 127.0.0.1:6082 0.0.0.0:* LISTEN 1678/varnishd
tcp 0 0 127.0.0.1:11211 0.0.0.0:* LISTEN 1155/memcached
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 1072/redis-server 1
tcp 0 0 :::22 :::* LISTEN 1177/sshd
tcp 0 0 :::3306 :::* LISTEN 1315/mysqld
[root@webdev01 ~]#
This is the server where it's working, and php-fpm doesn't appear in the list.
[centos@staging script]$ sudo /usr/sbin/php-fpm
[28-Oct-2021 15:17:31] ERROR: An another FPM instance seems to already listen on /var/run/php-fpm/php5-fcgi-staging01.sock
[28-Oct-2021 15:17:31] ERROR: FPM initialization failed
So it's listening on a socket? But for some reason I don't see it listening on a port. Are those two different things?
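For context, `netstat -plnt` restricts its output to TCP listeners (the `-t` flag), so a php-fpm pool bound to a Unix socket path does not appear in it. A minimal sketch for telling the two kinds of `listen` value apart; `listen_kind` is a hypothetical helper name, not something php-fpm provides:

```shell
# Classify a php-fpm "listen" value.
# A value starting with "/" is a Unix socket path; anything else
# (ip:port or a bare port) is treated as a TCP listener.
# listen_kind is a hypothetical helper used only for illustration.
listen_kind() {
    case "$1" in
        /*) echo unix ;;
        *)  echo tcp ;;
    esac
}

# Unix sockets need -x instead of -t to show up in netstat, e.g.:
#   sudo netstat -plnx | grep php-fpm
```

For example, `listen_kind /var/run/php-fpm/php5-fcgi-staging01.sock` prints `unix`, while `listen_kind 127.0.0.1:9000` prints `tcp`.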
[root@webdev01 ~]# sudo service php-fpm status
php-fpm (pid 1455) is running...
So it's running.
On the server where I can't get sites running, netstat shows:
[centos@staging03 script]$ sudo netstat -plnt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.2:80 0.0.0.0:* LISTEN 2624/varnishd
tcp 0 0 127.0.0.1:80 0.0.0.0:* LISTEN 2580/httpd
tcp 0 0 172.31.22.60:80 0.0.0.0:* LISTEN 1582/nginx
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1290/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1544/master
tcp 0 0 127.0.0.1:443 0.0.0.0:* LISTEN 2580/httpd
tcp 0 0 127.0.0.1:6082 0.0.0.0:* LISTEN 2623/varnishd
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 3397/php-fpm
tcp 0 0 127.0.0.1:11211 0.0.0.0:* LISTEN 1268/memcached
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 1061/redis-server 1
tcp 0 0 :::22 :::* LISTEN 1290/sshd
tcp 0 0 :::3306 :::* LISTEN 1422/mysqld
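The two netstat listings can be compared mechanically rather than by eye. A rough sketch, assuming each listing comes from `netstat -plnt`; `listeners` is a hypothetical helper, and the file names in the usage note are placeholders:

```shell
# Reduce `netstat -plnt` output (read from stdin) to sorted, unique
# "port program" pairs so two servers can be compared with diff.
# The bind address is dropped on purpose, since it differs per host.
listeners() {
    awk '$6 == "LISTEN" {
        n = split($4, addr, ":")   # the port is the last colon-separated part
        split($7, prog, "/")       # "PID/name" -> name
        print addr[n], prog[2]
    }' | sort -u
}
```

Running `sudo netstat -plnt | listeners > webdev01.txt` on one server and the equivalent on the other, then `diff webdev01.txt staging03.txt`, leaves only the real differences. On the two listings in this post that would be `443 nginx` (only on webdev01) and `9000 php-fpm` (only on staging03).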
I looked inside /etc/php-fpm.d on the working server and found this file:
[php5-fcgi-elvis]
listen = /var/run/php-fpm/php5-fcgi-elvis.sock
listen.allowed_clients = 127.0.0.1
user = elvis
;group = elvis
pm = dynamic
pm.max_children = 50
pm.start_servers = 14
pm.min_spare_servers = 14
pm.max_spare_servers = 25
pm.max_requests = 500
catch_workers_output = yes
request_slowlog_timeout = 8
slowlog = /var/log/php-fpm/www-slow.log
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
php_value[session.save_handler] = files
php_value[session.save_path] = /var/lib/php/session
listen.owner = apache
listen.group = apache
listen.mode = 0666
And it's almost the same as the one on the faulty server:
[php5-fcgi-staging03]
listen = /var/run/php-fpm/php5-fcgi-staging03.sock
listen.allowed_clients = 127.0.0.1
user = staging03
;group = staging03
pm = dynamic
pm.max_children = 13
pm.start_servers = 4
pm.min_spare_servers = 4
pm.max_spare_servers = 7
pm.max_requests = 500
catch_workers_output = yes
request_slowlog_timeout = 8
slowlog = /var/log/php-fpm/www-slow.log
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
php_value[session.save_handler] = files
php_value[session.save_path] = /var/lib/php/session
listen.owner = apache
listen.group = apache
listen.mode = 0666
However, I also found this www.conf file, whose listen address matches the 127.0.0.1:9000 php-fpm entry in the netstat output above:
[www]
group = apache
listen = 127.0.0.1:9000
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
php_value[session.save_handler] = files
php_value[session.save_path] = /var/lib/php/session
php_value[soap.wsdl_cache_dir] = /var/lib/php/wsdlcache
So would deleting this www.conf file solve the problem? I suspect there are additional steps, but I don't have the full picture: I don't know what I should be checking or what else might be wrong.
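One check that fits here is listing every pool php-fpm will load and where each one listens, since the per-site pool file and www.conf sit in the same directory. A sketch, assuming the pool files are unindented `.conf` files under /etc/php-fpm.d; `list_pools` is a hypothetical helper:

```shell
# Print "pool listen-value file" for every pool file in a directory.
# Assumes "[pool]" headers and "listen = ..." lines start at column 1,
# as in the files shown in this post. list_pools is a hypothetical helper.
list_pools() {
    dir=${1:-/etc/php-fpm.d}
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue    # directory has no .conf files
        awk -v file="$f" '
            # "[name]" header: remember the pool name
            /^\[/ { pool = substr($0, 2, index($0, "]") - 2); next }
            # "listen = value" only; listen.owner and friends do not match
            /^listen *=/ { sub(/^listen *= */, ""); print pool, $0, file }
        ' "$f"
    done
}
```

A pool with a path in the second column listens on a Unix socket, and one with host:port listens on TCP. Which of the two Apache is actually configured to talk to is a separate check that this sketch does not cover.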