I set up a 4-node Hadoop cluster (1 master, 3 workers) on both AWS and GCP. However, I am seeing quite high network egress on both platforms.
AWS cluster apps: Hadoop, YARN
GCP cluster apps: Hadoop, YARN, Hive
AWS: egress came to 244.027 GB ($21.96). The charge was 'pardoned' after I explained the situation to AWS support. However, they gave no information about the traffic that would help prevent it from happening again. Since I have no credits on AWS, I had to shut the cluster down.
GCP: same issue, but at least the credit limit caps the cost.
Probably related: I have received 'potential violation of service' notices from both AWS and GCP, citing DDoS attacks. The most recent one came from GCP while I was setting up Kerberos on the cluster.
What I have done so far:
- Configured the nodes to talk to each other using internal IPs (previously they used external IPs).
- Restricted firewall rules to the relevant ports only.
- Closed all browser tabs for the app UIs (Hive, HDFS, YARN) when not in use.
- Asked AWS support for best practices and information on the traffic. I received many links to AWS material, mostly on setting up billing alerts rather than on configuration or troubleshooting.
- GCP support has been very helpful, and GCP billing is straightforward. I requested technical support via chat; that request is still pending.
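To double-check the first item, something like the following sketch could be run against each node's hosts file. It assumes the nodes resolve each other through `/etc/hosts` (my Hadoop configs may reference hostnames differently), and the function name `public_host_entries` is just illustrative. It flags any entry whose address is not in a private range.

```python
import ipaddress


def public_host_entries(hosts_text):
    """Return (hostname, ip) pairs from hosts-file text whose IP is not private."""
    flagged = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # skip malformed lines instead of failing
        if not addr.is_private:
            flagged.extend((name, ip) for name in names)
    return flagged


def check_hosts_file(path="/etc/hosts"):
    """Print every hosts entry that points at a non-private address."""
    with open(path) as f:
        for name, ip in public_host_entries(f.read()):
            print(f"{name} -> {ip} (not an internal address)")
```

Calling `check_hosts_file()` on a node should print nothing if every entry uses an internal or loopback address.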
Any help on how to track where the traffic is coming from would be appreciated.
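As a starting point, here is a rough sketch of the kind of check I have in mind: it reads Linux's `/proc/net/tcp` on a node and counts established IPv4 connections whose remote address is outside the private ranges. The function names are illustrative, a little-endian machine is assumed, and it only counts connections at one moment in time, not bytes, so it would not account for the egress volume by itself.

```python
import ipaddress
import socket
import struct
from collections import Counter

ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp


def decode_ipv4(hex_addr):
    """Convert '0100007F:0035' from /proc/net/tcp to ('127.0.0.1', 53)."""
    ip_hex, port_hex = hex_addr.split(":")
    # The kernel prints the address as a native-endian 32-bit word;
    # a little-endian host is assumed here.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def external_peers(proc_net_tcp_text):
    """Count established connections per non-private remote IPv4 address."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != ESTABLISHED:
            continue
        remote_ip, _ = decode_ipv4(fields[2])
        if not ipaddress.ip_address(remote_ip).is_private:
            counts[remote_ip] += 1
    return counts


def report(path="/proc/net/tcp"):
    """Print external peers on this node, most connections first."""
    with open(path) as f:
        for ip, n in external_peers(f.read()).most_common():
            print(f"{ip}\t{n} established connection(s)")
```

Running `report()` on each node would at least show which outside addresses the node is connected to at that moment.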
Update:
While working on only two of the nodes to set up Kerberos, it seems I used up $100 of my remaining credits (on egress again), and I cannot access my project unless I upgrade to a full account.