I have an EC2 instance that runs a Django app via gunicorn, with Caddy sitting in front of it as a reverse proxy. The domain is hosted in Route 53 with an A record pointing to the instance's IP address.
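Since the A record is static, one quick sanity check during an outage is whether DNS still resolves and whether the instance answers when DNS is bypassed entirely (the domain and IP below are placeholders):
$ dig +short example.com A
$ curl -sv -o /dev/null --resolve example.com:443:203.0.113.10 https://example.com/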
Here's what I currently have:
# gunicorn.service
[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/app_repo
Restart=always
ExecStart=/opt/app_repo/venv/bin/gunicorn \
    --access-logfile /opt/app_repo/gunicorn.access.log \
    --error-logfile /opt/app_repo/gunicorn.error.log \
    --timeout 600 \
    --workers 5 \
    --bind unix:/run/gunicorn.sock \
    --log-level DEBUG \
    --capture-output \
    app_repo.wsgi:application

[Install]
WantedBy=multi-user.target
# gunicorn.socket
[Unit]
Description=gunicorn socket

[Socket]
ListenStream=/run/gunicorn.sock

[Install]
WantedBy=sockets.target
# Caddyfile
CADDY_SERVER_NAME {
    @notStatic {
        not {
            path /staticfiles/*
        }
    }

    handle_path /staticfiles/* {
        root * /opt/app_repo/static/
        file_server
    }

    reverse_proxy @notStatic unix//run/gunicorn.sock {
        header_up Host {host}
    }

    log {
        output file /opt/app_repo/caddy.access.log {
            roll_size 1gb
            roll_keep 5
            roll_keep_for 720h
        }
    }
}
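For completeness, each hop can also be exercised independently from the box itself: gunicorn directly over its unix socket, and Caddy over loopback (the Host value is a placeholder; even a 400 from Django's ALLOWED_HOSTS check would at least prove gunicorn is alive):
$ sudo curl -s -o /dev/null -w "%{http_code}\n" --unix-socket /run/gunicorn.sock http://localhost/
$ curl -s -o /dev/null -w "%{http_code}\n" -H "Host: example.com" http://127.0.0.1/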
The problem is that the site is reported as unreachable by our monitoring tool (and confirmed by some clients as well) for 5-10 minutes every day, with no apparent pattern. Whenever I SSH back onto the server, the gunicorn and caddy services are up and running (checked via systemctl status). Checking journalctl doesn't yield any helpful details:
$ journalctl -u gunicorn --boot
Feb 14 18:27:50 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 1h 15min 13.075s CPU time.
Feb 15 13:02:26 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
Feb 15 13:16:52 ip-172-31-3-73 systemd[1]: Stopping gunicorn daemon...
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Deactivated successfully.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Stopped gunicorn daemon.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: gunicorn.service: Consumed 39.035s CPU time.
Feb 15 13:16:53 ip-172-31-3-73 systemd[1]: Started gunicorn daemon.
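The "Stopping gunicorn daemon..." entries are clean, deliberate stop/start cycles (Restart=always doesn't produce those), so they look like a deploy or manual restart rather than a crash. Next time the monitor fires, I can widen the query from the single unit to the whole journal around that window, and check whether systemd has been auto-restarting the service (the timestamps below are placeholders for an actual outage window):
$ journalctl --since "2023-02-15 13:00" --until "2023-02-15 13:20"
$ systemctl show gunicorn -p NRestarts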
$ journalctl -u caddy --boot | grep "Feb 16" | grep "error"
Feb 16 03:10:09 ip-172-31-3-73 caddy[5328]: {"level":"error","ts":1676517009.8251915,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http2: stream closed"}
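Since Caddy writes structured JSON to the journal, I can also dump every error-level entry around a window instead of grepping for fixed strings (jq assumed to be installed; the timestamps are placeholders):
$ journalctl -u caddy --since "2023-02-16 03:00" --until "2023-02-16 03:30" -o cat | jq -R 'fromjson? | select(.level == "error")'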
grep-ing dmesg for gunicorn and caddy doesn't turn up anything useful either, as far as I can tell:
$ dmesg | grep caddy
$ dmesg | grep gunicorn
[ 2.972213] systemd[1]: Configuration file /etc/systemd/system/gunicorn.socket is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
[ 2.984758] systemd[1]: Configuration file /etc/systemd/system/gunicorn.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
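The world-writable warning is easy to clear up, and while I'm in dmesg it seems worth ruling out the OOM killer, since memory pressure could explain short, recurring blackouts while the services still show as running afterwards:
$ sudo chmod 644 /etc/systemd/system/gunicorn.service /etc/systemd/system/gunicorn.socket
$ sudo systemctl daemon-reload
$ dmesg -T | grep -iE "out of memory|oom|killed process"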
What file/log/service should I be looking at?