Reference sites
Single node
- http://scalefreeus.tistory.com/21
Cluster
- Installing Hadoop Cluster & Spark - 1. Server environment setup
- Installing Hadoop Cluster & Spark - 2. Installing Hadoop
- Installing Hadoop Cluster & Spark - 3. Installing Spark
- Installing Hadoop 2.7.3 (Fully Distributed Mode)
- Hadoop 2.7.3 install guide you can follow with no prior knowledge
- How to configure Hadoop 2.X
Hadoop commands
Hadoop server environment setup
#OS version
ubuntu 14.04
#Accounts and login info
enleaf:root
맥국!@1
hduser:hadoop
하둡!@1
#hostname & hosts
127.0.0.1 localhost
192.168.0.80 hdmaster
192.168.0.81 hdslave1
192.168.0.82 hdslave2
192.168.0.83 hdslave3
[remaining content removed]
----------------------------------------------------------------------
3. Install Java
sudo wget --header "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jdk-8u111-linux-x64.tar.gz"
/opt/jdk/jdk1.8.0_111   # extract the JDK here (matches JAVA_HOME below)
# sudo vi ~/.bashrc
#JAVA
export JAVA_HOME=/opt/jdk/jdk1.8.0_111
export PATH=$JAVA_HOME/bin:$PATH
export CLASS_PATH=$JAVA_HOME/lib:$CLASS_PATH
#source ~/.bashrc
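A quick sanity check after sourcing ~/.bashrc, assuming the JDK path above:
echo $JAVA_HOME   # should print /opt/jdk/jdk1.8.0_111
java -version     # should report version 1.8.0_111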
After SSH RSA key exchange (set up in the next section), push .bashrc to the slaves:
scp -r ~/.bashrc hdslave1:~/.bashrc
scp -r ~/.bashrc hdslave2:~/.bashrc
scp -r ~/.bashrc hdslave3:~/.bashrc
----------------------------------------------------------------------
@SSH
Internal : 192.168.0.80:22
External : enleaf.iptime.org:13322
#Generate an RSA key on the master and copy it to each slave
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub hdslave1
ssh-copy-id -i ~/.ssh/id_rsa.pub hdslave2
ssh-copy-id -i ~/.ssh/id_rsa.pub hdslave3
#Generate an RSA key on each slave and copy it to the master
ssh-keygen -t rsa -P ""
ssh-copy-id -i ~/.ssh/id_rsa.pub hdmaster
The SSH RSA key must be generated not only on the master but also on each slave server and copied back to the master; only then is trust established in both directions between master and slaves.
*Doing it on one side only allows logins in one direction.
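A simple way to confirm the trust works both ways (hostnames as registered in /etc/hosts above); each command should return without asking for a password:
ssh hdslave1 hostname    # run on hdmaster, repeat for hdslave2/3
ssh hdmaster hostname    # run on each slave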
----------------------------------------------------------------------
@Hadoop 2.7.3
wget http://www.us.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -zxf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop
/home/hduser/hadoop
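Optional sanity check on the unpacked distribution before the environment variables are set (path assumes the move above):
/home/hduser/hadoop/bin/hadoop version    # should print Hadoop 2.7.3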
----------------------------------------------------------------------
$sudo vi ~/.bashrc
#HADOOP
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
#On a 64-bit OS the native subdirectory must be used
#32bit:$HADOOP_HOME/lib
#64bit:$HADOOP_HOME/lib/native
$ source ~/.bashrc
#SPARK
export SPARK_HOME=/usr/local/spark
export SPARK_SUBMIT=$SPARK_HOME/bin/spark-submit
export PATH=$PATH:$SPARK_HOME/bin
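The steps above never actually install Spark, so SPARK_HOME=/usr/local/spark assumes a prebuilt Spark for Hadoop 2.7 was unpacked there beforehand. A rough sketch of that step (Spark 2.1.0 is only an example version):
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
tar -zxf spark-2.1.0-bin-hadoop2.7.tgz
sudo mv spark-2.1.0-bin-hadoop2.7 /usr/local/spark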
scp -r /home/hduser/.bashrc hduser@hdslave1:/home/hduser/.bashrc
scp -r /home/hduser/hadoop hduser@hdslave1:/home/hduser/hadoop
scp -r /home/hduser/hadoop hduser@hdslave2:/home/hduser/hadoop
scp -r /home/hduser/hadoop hduser@hdslave3:/home/hduser/hadoop
#To resync only the configuration directory later:
rsync -av /home/hduser/hadoop/etc/hadoop hdslave1:/home/hduser/hadoop/etc
rsync -av /home/hduser/hadoop/etc/hadoop hdslave2:/home/hduser/hadoop/etc
rsync -av /home/hduser/hadoop/etc/hadoop hdslave3:/home/hduser/hadoop/etc
------------------------------------------------------------------
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This warning reportedly occurs when 32-bit native Hadoop libraries are used on a 64-bit Linux system.
It can be fixed by adding the following to hadoop-env.sh or .bashrc (either works).
Change the part that originally points to $HADOOP_HOME/lib so that it points to $HADOOP_HOME/lib/native.
That is, change export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_PREFIX/lib" to export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_PREFIX/lib/native".
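To check whether the native library is actually picked up after this change, Hadoop has a built-in check:
hadoop checknative -a    # "hadoop: true" followed by a libhadoop.so path means it loaded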
-------------------------------------------------------------
$ cd ~/hadoop/etc/hadoop/
### Configuration for the Master
$ vi core-site.xml
----------------------------------------------------------------------------------
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdmaster:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/hdfs/tmp</value>
</property>
</configuration>
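hadoop.tmp.dir is a local path, so it does no harm to create it on every node up front (value from the property above):
mkdir -p /home/hduser/hadoop/hdfs/tmp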
### hdfs configuration (dfs.replication is left at the default of 3; lower it for clusters with fewer datanodes)
$ vi hdfs-site.xml
----------------------------------------------------------------------------------
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hduser/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hduser/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hdslave1:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>hdslave1:50091</value>
</property>
</configuration>
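Likewise the name and data directories can be pre-created to match the properties above (name dir on the master, data dir on each slave):
mkdir -p /home/hduser/hadoop/hdfs/name    # on hdmaster
mkdir -p /home/hduser/hadoop/hdfs/data    # on each hdslave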
### yarn configuration (set the memory and core counts to values appropriate for your servers)
$ vi yarn-site.xml
----------------------------------------------------------------------------------
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdmaster:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdmaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hdmaster:8040</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdmaster:8088</value>
</property>
</configuration>
<!-- The block below was not applied -->
<configuration>
<!-- YARN master hostname -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ubuntu0</value>
</property>
<!-- YARN settings for lower and upper resource limits -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<!-- Log aggregation settings -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
<description>How long to keep aggregation logs. Used by History Server.</description>
</property>
</configuration>
<!-- End of the block that was not applied -->
### mapreduce configuration (set the memory values to whatever suits your servers)
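The Hadoop 2.7.3 tarball only ships mapred-site.xml.template, so create the file first:
cp mapred-site.xml.template mapred-site.xml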
$ vi mapred-site.xml
----------------------------------------------------------------------------------
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- History Server settings -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdmaster:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdmaster:19888</value>
</property>
</configuration>
<!-- The block below was not applied -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce ApplicationMaster properties -->
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1536</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx1536m</value>
</property>
<!-- Mappers and Reducers settings -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.job.reduces</name>
<value>2</value>
</property>
<!-- History Server settings -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>ubuntu0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ubuntu0:19888</value>
</property>
</configuration>
<!-- End of the block that was not applied -->
### Register the slave servers on the master only.
$ vi slaves
----------------------------------------------------------------------------------
hdslave1
hdslave2
hdslave3
Distribute to each slave server
scp -r /home/hduser/hadoop hdslave1:~
scp -r /home/hduser/hadoop hdslave2:~
scp -r /home/hduser/hadoop hdslave3:~
rsync -av /home/hduser/hadoop/etc/hadoop hdslave1:/home/hduser/hadoop/etc
rsync -av /home/hduser/hadoop/etc/hadoop hdslave2:/home/hduser/hadoop/etc
rsync -av /home/hduser/hadoop/etc/hadoop hdslave3:/home/hduser/hadoop/etc
hadoop namenode -format
start-dfs.sh
stop-dfs.sh
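YARN is started separately with start-yarn.sh from the same sbin directory. After start-dfs.sh and start-yarn.sh, a rough health check (expected daemons assume the role layout above, with the SecondaryNameNode on hdslave1):
jps                      # master: NameNode, ResourceManager / slaves: DataNode, NodeManager
hdfs dfsadmin -report    # should list 3 live datanodes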
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.library.path=$HADOOP_PREFIX/lib/native"
2017-02-09 20:21:13,027 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hdmaster/192.168.0.80:9000
2017-02-09 20:21:19,029 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdmaster/192.168.0.80:9000
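These retries mean the datanodes cannot reach the namenode RPC port 9000. Two things worth checking on hdmaster (a common Ubuntu pitfall is a "127.0.1.1 hdmaster" line in /etc/hosts, which makes the namenode bind to loopback only):
netstat -tlnp | grep 9000    # the namenode should listen on 192.168.0.80 or 0.0.0.0, not 127.0.x.x
cat /etc/hosts               # remove any 127.0.1.1 mapping for hdmaster, then restart the daemons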