I'm a little curious and confused about this situation. We set up a monitoring instance that scrapes an exposed endpoint on 2 different instances. Both are in the same VPC, with the same security group, the same route table and the same network ACL. Both instances also use the same AMI. For some reason, TCP communication on port 5001 doesn't work on the machine in subnet 10.0.1.0, but it works on the one in subnet 10.0.0.0. The instance that works is also in the same AZ as the monitoring machine (us-east-1a); the one that doesn't work is in us-east-1b.
After a lot of tcpdump captures and troubleshooting (other ports such as 80, 443 and 4001 work fine), I decided to create an AMI of the instance in 10.0.0.0 and deploy a new machine from it in the same subnet and AZ. Surprisingly, that worked. Now I have 3 machines: the 2 in 10.0.0.0 send metrics over 5001, and the one in 10.0.1.0 still times out.
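One way to reproduce the port comparison from the monitoring instance is a small TCP probe that tells a refused connection apart from a silent timeout. This is a minimal sketch, assuming Python 3 is available on the monitoring host; the `probe` function and the command-line usage are illustrative, not what was actually run.

```python
import socket
import sys


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome.

    "open"    -> the three-way handshake completed
    "refused" -> the host answered with a RST (reachable, nothing listening)
    "timeout" -> no answer at all (the SYN or its reply was dropped)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # e.g. "Network is unreachable" or "No route to host"
        return f"error: {exc}"


if __name__ == "__main__":
    # Usage (hypothetical): python3 probe.py <ip-in-subnetA> <ip-in-subnetB>
    for host in sys.argv[1:]:
        for port in (80, 443, 4001, 5001):
            print(f"{host}:{port} -> {probe(host, port)}")
```

A "timeout" on 5001 (as opposed to "refused") means the packets are being dropped or left unanswered somewhere along the path, for example by a security group, a network ACL or a host firewall, rather than reaching a host where nothing is listening.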
Is this something related to public IPs? An account limitation?
Thanks, I'm a little worried about this.
Edit:
I did what Tim suggested in his reply: I created an AMI of the working instance in 10.0.0.0, deployed it in 10.0.1.0, and it worked. To be clear:
The AMIs from both machines work in both subnets. I'll call 10.0.0.0 subnetA and 10.0.1.0 subnetB. The AMI from B was deployed in A and it worked; the AMI from A was deployed in B and it worked as well. I'm a little confused.
BTW: those machines were created by Terraform a long time ago, and we now use Pulumi. Maybe something happened during a terraform apply and no one noticed.