I have a bizarre connection problem with a single server and no idea how to diagnose it. I'm setting up a self-hosted email service using iRedMail; everything works fine for a while, and then:
- all HTTP, HTTPS and IMAP connections stop working (the error is either a timeout or "connection refused")
- when it fails from one client machine, it starts failing for all other machines on the same local network (which is remote from the server)
- when in the failed state, all other sites work fine
- if I reboot the local router (which also re-establishes the connection to the ISP and gets a new IP address), everything starts working again
- if I switch my phone from wifi (the failed network) to 4G, it starts working on the phone; switch back to wifi and it fails again
- wget from the remote mail server to itself works
- wget from a different remote server to the mail server works
- rebooting the server machine doesn't fix the problem
- if I don't reset the local router, after about 8 hours it starts working again
- I've reinstalled the server OS and software multiple times
- I've also seen it fail over 4G, but it started working again after about an hour
- When in the failed state, the server's nginx access and error logs show no activity
- Connecting to the server from Outlook for iOS seems to break it almost immediately
- The problem doesn't affect SSH or ping
- DNS lookups are working fine
- The server is a 4GB shared Linode running Ubuntu 20.04 LTS and the latest iRedMail.
I'd contact my ISP, but as mentioned I've also seen it (just once) over 4G (i.e. a different ISP). It's as if the server returns some malformed data that breaks something along the path between here and the server.
I'm stumped, so any clues would be greatly appreciated. How do I diagnose this? What else can I try?
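One way to narrow this down would be to record exactly which ports fail, and how, while the network is in the failed state. A timeout means packets are being silently dropped somewhere, while "connection refused" means something (the server's own firewall or any device along the path) answered with a TCP reset. Below is a minimal sketch in Python that does a plain TCP connect to each port and classifies the result. The port list (22 SSH, 80 HTTP, 443 HTTPS, 143 IMAP, 993 IMAPS) is my own assumption about a default iRedMail setup, and the script name `probe.py` is hypothetical.

```python
import socket
import sys
import time

# Assumed ports for a default iRedMail install; adjust to match the server.
PORTS = {22: "ssh", 80: "http", 443: "https", 143: "imap", 993: "imaps"}


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect and classify the outcome.

    Returns "ok" if the handshake completes, "refused" if a TCP reset came
    back, "timeout" if nothing answered within `timeout` seconds, or
    "error: <reason>" for anything else (DNS failure, unreachable, ...).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def main(argv: list[str]) -> None:
    if len(argv) < 2:
        print(f"usage: {argv[0]} HOST")
        return
    host = argv[1]
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    # One line per run, so repeated runs build a timeline of failures.
    results = ", ".join(f"{name}={probe(host, port)}" for port, name in PORTS.items())
    print(f"{stamp} {host}: {results}")


if __name__ == "__main__":
    main(sys.argv)
```

Running it as `python3 probe.py mail.example.com` (placeholder hostname) every few minutes from a machine on the wifi network, e.g. from cron, and again over 4G when things break, would show whether SSH keeps answering while 80/443/993 fail, whether the failures are timeouts or resets, and roughly when the 8-hour recovery happens.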
Update: I've since re-created the exact same Linode in a different data center (Sydney, the same city I'm in), and all these issues seem to have vanished. Weird.