Score:0

Hadoop datanodes using "{hostname}/{IP address}:9000" to try to connect to namenode

ki flag

I have a cluster of Pis that I'm using to experiment with Hadoop. masternode is set to .190, p1 to .191 ... p4 to .194. All nodes are up and running. start-dfs.sh, stop-all.sh, etc. from the master successfully start and stop the datanodes. However, on start, the datanodes cannot connect back to the master node. The datanodes are using "hostname/ip_address:9000" when they try to reconnect.

hadoop-hduser-datanode-p1.log reports:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.1.190:9000. Already tried 8 time(s);

masternode is set to 192.168.1.190 via a DNS reservation by MAC address on my router. The same goes for the other nodes.

/etc/hosts is empty on the datanodes; adding entries there doesn't change the behavior.

All the .xml files (like core-site.xml) use "hdfs://masternode:port". None of them use "masternode/ip address:port", so I'm not sure where the IP address is coming from.

    <property>
        <name>fs.default.name</name>
        <value>hdfs://masternode:9000/</value>
    </property>
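
Just to confirm what the daemons actually load, here is a rough sketch that dumps the configured URI (it assumes the Hadoop client jars and core-site.xml are on the classpath):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ShowDefaultFs {
        public static void main(String[] args) {
            // Configuration picks up core-default.xml and core-site.xml from the classpath.
            Configuration conf = new Configuration();

            // fs.default.name is the deprecated alias of fs.defaultFS, so either key works.
            System.out.println(conf.get("fs.defaultFS"));

            // The URI the daemons connect to -- just a hostname and port, no IP address.
            URI uri = FileSystem.getDefaultUri(conf);
            System.out.println(uri.getHost() + ":" + uri.getPort());
        }
    }

It only prints back what is configured, i.e. the hostname, so the IP has to be getting added somewhere else.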

The workers file is just the names of the datanode servers:

workers" 4L, 12C                                                     1,1       All

p1
p2
p3
p4

Any ideas what is appending the IP address to the hostname?

diya avatar
la flag
I *think* this is a red herring: the log message simply includes the IP address that the hostname resolves to, i.e. it shows `hostname/ip-address-of-hostname:port`, and there is no configuration error in that regard. Your problem is probably something else.
Snap E Tom avatar
ki flag
You are correct, thank you. I changed the URL in core-site.xml on the datanodes, and indeed the lookups failed and the IP address was no longer there. Your comment helped me track down the root cause of the real issue.
Score:0
ki flag

After diya pointed out that the IP address was merely a diagnostic, it was clear there was some sort of connection issue between the datanodes and the namenode on port 9000.
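
For anyone else wondering where the IP in the log comes from: as far as I can tell, that line is just printing a resolved java.net.InetSocketAddress, whose toString() format is hostname/ip:port. A minimal standalone sketch (no Hadoop needed) that shows the same behaviour:

    import java.net.InetSocketAddress;

    public class AddressFormatDemo {
        public static void main(String[] args) {
            // A hostname that resolves prints as "hostname/ip:port",
            // e.g. masternode/192.168.1.190:9000 -- the same shape as the datanode log.
            InetSocketAddress resolved = new InetSocketAddress("masternode", 9000);
            System.out.println(resolved);

            // A name that cannot be resolved prints without the IP part, which matches
            // what I saw after deliberately breaking the URL in core-site.xml.
            InetSocketAddress unresolved = InetSocketAddress.createUnresolved("masternode", 9000);
            System.out.println(unresolved);
        }
    }

So the hostname/IP pair is just a resolution diagnostic, not a misconfigured URL.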

I could ssh into the master from the datanodes, but nc -zv masternode 9000 confirmed that the datanodes could not connect to the masternode over port 9000. netstat -lnt on the masternode confirmed that 9000 was bound only to 127.0.0.1. This led me to this answer: https://stackoverflow.com/a/64611530/213017. I checked /etc/hosts on the masternode, and there was indeed a 127.0.0.1 masternode entry. Removing it let the datanodes connect.
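
For completeness, both checks can be reproduced from the JVM if nc and netstat aren't handy. This is only a sketch with my hostname and port hard-coded; the resolution part is most telling on the masternode (which would have returned 127.0.0.1 before the /etc/hosts fix), and the connect part from a datanode is the rough equivalent of nc -zv masternode 9000:

    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class PortCheck {
        public static void main(String[] args) throws Exception {
            // What does this host think "masternode" resolves to? On the masternode,
            // the bad /etc/hosts entry made this 127.0.0.1, which is why the namenode
            // bound its RPC port to loopback only.
            System.out.println(InetAddress.getByName("masternode"));

            // Rough equivalent of "nc -zv masternode 9000": open a TCP connection
            // with a 5 second timeout, then close it.
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress("masternode", 9000), 5000);
                System.out.println("connected to " + s.getRemoteSocketAddress());
            }
        }
    }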

Yuri Dolotkazin avatar
kr flag
Unfortunately, I'm facing the same issue, but in my case the namenode does listen on all interfaces. More than that, I'm able to connect from the second host to the namenode via telnet. netstat shows an established connection, and if I type "abc" it responds with "org.apache.hadoop.ipc.RPC$VersionMismatch*>Server IPC version 9 cannot communicate with client version 100:@Connection closed by foreign host." So there is no network issue here, but the datanode still can't connect.