EC2 with a Caddy + Gunicorn setup sporadically unreachable

I have an EC2 instance that runs a Django app via Gunicorn, with Caddy sitting in front of it as a reverse proxy. The domain is hosted in Route 53, with an A record pointing to the instance's IP address.

Here's what I currently have:

# gunicorn.service
[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/app_repo
Restart=always
ExecStart=/opt/app_repo/venv/bin/gunicorn \
          --access-logfile /opt/app_repo/gunicorn.access.log \
          --error-logfile /opt/app_repo/gunicorn.error.log \
          --timeout 600 \
          --workers 5 \
          --bind unix:/run/gunicorn.sock \
          --log-level DEBUG \
          --capture-output \
          app_repo.wsgi:application

[Install]
WantedBy=multi-user.target

# gunicorn.socket
[Unit]
Description=gunicorn socket

[Socket]
ListenStream=/run/gunicorn.sock

[Install]
WantedBy=sockets.target

# Caddyfile
CADDY_SERVER_NAME {
    @notStatic {
        not {
            path /staticfiles/*
        }
    }

    handle_path /staticfiles/* {
        file_server
        root * /opt/app_repo/static/
    }

    reverse_proxy @notStatic unix//run/gunicorn.sock {
        header_up Host {host}
    }

    log {
        output file /opt/app_repo/caddy.access.log {
            roll_size 1gb
            roll_keep 5
            roll_keep_for 720h
        }
    }
}

The problem is that our monitoring tool reports the site as unreachable (confirmed by some clients as well) for 5-10 minutes every day, with no apparent pattern. Whenever I SSH into the server, the gunicorn and caddy services are up and running (checked via systemctl status). Checking journalctl doesn't yield any helpful details:

$ journalctl -u gunicorn --boot
Feb 14 18:27:50 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 1h 15min 13.075s CPU time.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:16:52 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 39.035s CPU time.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
$ journalctl -u caddy --boot | grep "Feb 16" | grep "error"
Feb 16 03:10:09 ip-172-31-3-73 caddy[5328]: {"level":"error","ts":1676517009.8251915,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http2: stream closed"}
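
To line the gunicorn restarts in the journal output above up against the monitoring alerts, a small filter along these lines might help. This is only a sketch: `list_gunicorn_stops` is a hypothetical helper name, and it assumes the journal lines keep the default `Mon DD HH:MM:SS host ...` layout shown above.

```shell
# Print the timestamp (month, day, time) of every gunicorn stop event
# found in journal text read from stdin.
# Assumption: input uses the default short journalctl format, as in the output above.
list_gunicorn_stops() {
    awk '/Stopping gunicorn daemon/ { print $1, $2, $3 }'
}

# Hypothetical usage on the server:
#   journalctl -u gunicorn --boot | list_gunicorn_stops
```

If the printed times do not overlap with the reported outage windows, the restarts are probably unrelated to the unreachability and the search can move elsewhere.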

Grepping dmesg for gunicorn and caddy doesn't yield anything useful either, as far as I can tell:

$ dmesg | grep caddy
$ dmesg | grep gunicorn
[    2.972213] systemd[1]: Configuration file /etc/systemd/system/gunicorn.socket is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
[    2.984758] systemd[1]: Configuration file /etc/systemd/system/gunicorn.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.

What file/log/service should I be looking at?
