I have an EC2 instance that runs a Django app via gunicorn, with Caddy sitting in front of it as a reverse proxy. The domain is hosted in Route 53 with an A record pointing to the instance's IP address.
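Since the A record is static, one quick sanity check during an outage is whether DNS still resolves and whether the instance answers when DNS is bypassed entirely (the domain and IP below are placeholders):
$ dig +short example.com A
$ curl -sv -o /dev/null --resolve example.com:443:203.0.113.10 https://example.com/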
Here's what I currently have:
# gunicorn.service
[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/app_repo
Restart=always
ExecStart=/opt/app_repo/venv/bin/gunicorn \
    --access-logfile /opt/app_repo/gunicorn.access.log \
    --error-logfile /opt/app_repo/gunicorn.error.log \
    --timeout 600 \
    --workers 5 \
    --bind unix:/run/gunicorn.sock \
    --log-level DEBUG \
    --capture-output \
    app_repo.wsgi:application

[Install]
WantedBy=multi-user.target
# gunicorn.socket
[Unit]
Description=gunicorn socket

[Socket]
ListenStream=/run/gunicorn.sock

[Install]
WantedBy=sockets.target
# Caddyfile
CADDY_SERVER_NAME {
    @notStatic {
        not {
            path /staticfiles/*
        }
    }

    handle_path /staticfiles/* {
        root * /opt/app_repo/static/
        file_server
    }

    reverse_proxy @notStatic unix//run/gunicorn.sock {
        header_up Host {host}
    }

    log {
        output file /opt/app_repo/caddy.access.log {
            roll_size 1gb
            roll_keep 5
            roll_keep_for 720h
        }
    }
}
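For completeness, each hop can also be exercised independently from the box itself: gunicorn directly over its unix socket, and Caddy over loopback (the Host value is a placeholder; even a 400 from Django's ALLOWED_HOSTS check would at least prove gunicorn is alive):
$ sudo curl -s -o /dev/null -w "%{http_code}\n" --unix-socket /run/gunicorn.sock http://localhost/
$ curl -s -o /dev/null -w "%{http_code}\n" -H "Host: example.com" http://127.0.0.1/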
The problem is that the site is reported as unreachable by our monitoring tool (and confirmed by some clients as well) for 5-10 minutes every day, with no apparent pattern. Whenever I SSH back onto the server, the gunicorn and caddy services are up and running (checked via systemctl status). Checking journalctl doesn't yield any helpful details:
$ journalctl -u gunicorn --boot
Feb 14 18:27:50 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 1h 15min 13.075s CPU time.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:16:52 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 39.035s CPU time.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
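The "Stopping gunicorn daemon..." entries are clean, deliberate stop/start cycles (Restart=always doesn't produce those), so they look like a deploy or manual restart rather than a crash. Next time the monitor fires, I can widen the query from the single unit to the whole journal around that window, and check whether systemd has been auto-restarting the service (the timestamps below are placeholders for an actual outage window):
$ journalctl --since "2023-02-15 13:00" --until "2023-02-15 13:20"
$ systemctl show gunicorn -p NRestarts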
$ journalctl -u caddy --boot | grep "Feb 16" | grep "error"
Feb 16 03:10:09 ip-172-31-3-73 caddy[5328]: {"level":"error","ts":1676517009.8251915,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http2: stream closed"}
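Since Caddy writes structured JSON to the journal, I can also dump every error-level entry around a window instead of grepping for fixed strings (jq assumed to be installed; the timestamps are placeholders):
$ journalctl -u caddy --since "2023-02-16 03:00" --until "2023-02-16 03:30" -o cat | jq -R 'fromjson? | select(.level == "error")'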
grep-ing dmesg for gunicorn and caddy doesn't turn up anything useful either, as far as I can tell:
$ dmesg | grep caddy
$ dmesg | grep gunicorn
[ 2.972213] systemd[1]: Configuration file /etc/systemd/system/gunicorn.socket is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
[ 2.984758] systemd[1]: Configuration file /etc/systemd/system/gunicorn.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
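The world-writable warning is easy to clear up, and while I'm in dmesg it seems worth ruling out the OOM killer, since memory pressure could explain short, recurring blackouts while the services still show as running afterwards:
$ sudo chmod 644 /etc/systemd/system/gunicorn.service /etc/systemd/system/gunicorn.socket
$ sudo systemctl daemon-reload
$ dmesg -T | grep -iE "out of memory|oom|killed process"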
What file/log/service should I be looking at?