I have the following setup for a local Hive server with Hadoop:
version: "3"
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
    container_name: namenode
    restart: always
    ports:
      - 9870:9870
      - 9000:9000
    volumes:
      - ./hdfs/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode
    restart: always
    volumes:
      - ./hdfs/datanode:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    env_file:
      - ./hadoop.env
  hive-server:
    image: bde2020/hive:2.3.2-postgresql-metastore
    container_name: hive-server
    volumes:
      - ./employee:/employee
    env_file:
      - ./hadoop-hive.env
    environment:
      HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
      SERVICE_PRECONDITION: "hive-metastore:9083"
    ports:
      - "10000:10000"
  hive-metastore:
    image: bde2020/hive:2.3.2-postgresql-metastore
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 hive-metastore-postgresql:5432"
    ports:
      - "9083:9083"
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.3.0
  presto-coordinator:
    image: shawnzhu/prestodb:0.181
    ports:
      - "8080:8080"
I start everything with docker-compose and it works fine. I enter the hive-server container:

docker exec -it hive-server /bin/bash

Then I run hive -f employee_table.sql to create a schema in Hive.
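The contents of employee_table.sql are not shown here; for context, a schema script of this kind typically looks like the sketch below. The database name (testdb) and table name (employee) match the HDFS path used in the next step, but the column list is purely illustrative:

```shell
# Hypothetical sketch of employee_table.sql -- the real script is not shown
# above. Columns are illustrative; ROW FORMAT matches a comma-separated .csv.
cat > employee_table.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS testdb;
USE testdb;
CREATE TABLE IF NOT EXISTS employee (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
SQL

# Then, inside the hive-server container:
# hive -f employee_table.sql
```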
Then I store a small .csv file in Hadoop:
hadoop fs -put employee.csv hdfs://namenode:9000/user/hive/warehouse/testdb.db/employee
This also works. However, after I run docker-compose down and restart the services, all the data I inserted before is gone. I don't really understand this, since I can still see the files in the following subdirectory on the host:
hdfs\datanode\current\BP-267128047-172.27.0.7-1633966854402\current\finalized\subdir0\subdir0
What am I doing wrong here? Is something wrong with my volumes or ports?