DOCKER – UBUNTU – HADOOP 2.9.1 – R Installation
# marks a command run as root
$ marks a command run as hduser
DOCKER
Installation
- OS: Windows 7
- Check whether the machine supports hardware virtualization with havdetectiontool.exe
- Install with DockerToolbox.exe
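- A quick sanity check after the install (run in the Docker Quickstart Terminal; `default` is the machine name Docker Toolbox usually creates, adjust if yours differs):
```
docker-machine status default   # should print "Running"
docker version                  # client and server both respond
docker run --rm hello-world     # pulls and runs a small test image
```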
UBUNTU Installation
Install from Docker Hub
- docker pull ubuntu:latest
- To install a different Ubuntu version, specify its tag:
- docker pull ubuntu:16.04
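- A quick check that the image actually downloaded:
```
docker images ubuntu   # lists local ubuntu images and their tags
```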
Entering UBUNTU and Checking the Container
- docker run -it --name ubuntu_hadoop ubuntu:latest
- docker ps -a
- Check the Ubuntu version: # cat /etc/issue
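- The container stops when its shell exits; a sketch of getting back in, using the container name chosen above:
```
docker start ubuntu_hadoop           # restart the stopped container
docker exec -it ubuntu_hadoop bash   # open a new shell inside it
```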
Installing Required Packages
Installing Java 8 (OpenJDK)
- Java 8 is reportedly not included by default in the Ubuntu 14 PPA repositories
- apt-get install openjdk-8-jdk
- apt-get install default-jdk (on Ubuntu 18.04 this installs Java 11)
- update-alternatives --config java
- cd /usr/lib/jvm && ln -s java-1.8.0-openjdk-amd64 java8
- Register the environment variables in /etc/profile
- vi /etc/profile
```
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
```
- source /etc/profile
- java -version
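- Putting the Java steps together, a quick check that the symlink and JAVA_HOME line up (assuming the amd64 OpenJDK 8 package):
```
ls -l /usr/lib/jvm/java8   # should point at java-1.8.0-openjdk-amd64
echo $JAVA_HOME            # /usr/lib/jvm/java8
java -version              # expect: openjdk version "1.8.0_..."
```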
##### Hadoop Account Setup
- apt-get install sudo
- addgroup hadoop
- adduser --ingroup hadoop hduser
- adduser hduser sudo
- groups hduser
##### Installing and Configuring SSH
- # apt-get install ssh
- # apt-get install openssh-server
- # which ssh sshd
- # su hduser
- $ ssh-keygen -t rsa
- $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- $ sudo service ssh start
- $ ssh localhost
- answer yes when prompted
- $ exit
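- Hadoop's start scripts log in to localhost over SSH, so key-based login must work without a password prompt; a quick check (the chmod guards against sshd silently ignoring a key file with loose permissions):
```
chmod 600 ~/.ssh/authorized_keys
ssh -o BatchMode=yes localhost echo ok   # prints "ok" only if key auth works
```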
##### Installing Protocol Buffers
- apt-get install autoconf automake libtool curl make g++ unzip
- # wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
- # tar xvfz protobuf-2.5.0.tar.gz
- # cd protobuf-2.5.0
- # ./configure
- # make
- # make install
- # ldconfig
- # protoc --version
##### Downloading and Extracting Hadoop 2
- $ cd ~
- $ wget "http://mirror.apache-kr.org/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz"
- $ sudo mkdir /usr/local/hadoop
- $ sudo mv hadoop-2.9.1.tar.gz /usr/local/hadoop
- $ sudo chown -R hduser:hadoop /usr/local/hadoop
- $ cd /usr/local/hadoop
- $ tar xvfz hadoop-2.9.1.tar.gz
- $ ln -s hadoop-2.9.1 hadoop
##### Editing the Hadoop Configuration Files
- ./etc/hadoop/hadoop-env.sh (paths below are relative to the Hadoop home, /usr/local/hadoop/hadoop)
```
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
export HADOOP_PID_DIR=/usr/local/hadoop/hadoop/pids
```
- vim etc/hadoop/masters
```
localhost
```
- vim etc/hadoop/slaves
```
localhost
```
- core-site.xml
```
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9010</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
```
- hdfs-site.xml
```
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/usr/local/hadoop/data/dfs/namesecondary</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/data/dfs/datanode</value>
</property>
<property>
<name>dfs.http.address</name>
<value>localhost:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>localhost:50090</value>
</property>
</configuration>
```
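- The paths named in core-site.xml and hdfs-site.xml should exist and be writable before the first format; a sketch creating them up front (run as hduser, who owns /usr/local/hadoop):
```
mkdir -p /usr/local/hadoop/tmp
mkdir -p /usr/local/hadoop/data/dfs/{namenode,namesecondary,datanode}
```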
- mapred-site.xml
```
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
```
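- Stock Hadoop 2.x ships only a template for mapred-site.xml, so the file may need to be created before editing:
```
cd /usr/local/hadoop/hadoop
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
```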
- yarn-env.sh
- no extra setting needed if JAVA_HOME is already set in /etc/profile or ~/.bashrc
- yarn-site.xml
```
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/usr/local/hadoop/data/yarn/nm-local-dir</value>
</property>
<property>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value>/usr/local/hadoop/data/yarn/system/rmstore</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>0.0.0.0:8089</value>
</property>
</configuration>
```
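- As with the HDFS paths, the YARN directories referenced above can be created ahead of time:
```
mkdir -p /usr/local/hadoop/data/yarn/nm-local-dir
mkdir -p /usr/local/hadoop/data/yarn/system/rmstore
```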
- Initialize: ./bin/hdfs namenode -format
- Run: ./sbin/start-dfs.sh
- Run: ./sbin/start-yarn.sh
- Check in a browser: apt-get install w3m
- w3m "http://localhost:50070"
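- jps (bundled with the JDK) is a quicker way to confirm the daemons came up:
```
jps   # expect: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
```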
##### Running a Hadoop Example
- ./bin/hdfs dfs -mkdir /user
- ./bin/hdfs dfs -mkdir /user/hadoop
- ./bin/hdfs dfs -mkdir /user/hadoop/conf
- Upload a file to HDFS: ./bin/hdfs dfs -put etc/hadoop/hadoop-env.sh /user/hadoop/conf/
- Run the example jar: ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount /user/hadoop/conf/ output
- Check the output stored in HDFS: ./bin/hdfs dfs -cat output/part-r-00000 | tail -5
##### Installing RHIPE (Connecting R and Hadoop)
- Consolidated environment settings
- /etc/profile
```
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
export PKG_CONFIG_PATH=/usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib
export HADOOP_LIBS=`hdfs classpath | tr -d '*'`
```
- /root/.bashrc
```
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
```
- /home/hduser/.bashrc
```
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/usr/local/hadoop/hadoop
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_LIBS=`hdfs classpath | tr -d '*'`
```
- It seems like setting everything in /etc/profile would be enough..
- Add environment variables to /etc/R/Renviron
```
HADOOP_HOME=/usr/local/hadoop/hadoop
HADOOP_BIN=/usr/local/hadoop/hadoop/bin
HADOOP_CONF_DIR=/usr/local/hadoop/hadoop/etc/hadoop
```
- Install R
- # apt-get install r-base
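- Once R is installed, a quick check that it picks up the Renviron variables set above:
```
Rscript -e 'Sys.getenv(c("HADOOP_HOME", "HADOOP_BIN", "HADOOP_CONF_DIR"))'
```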
##### Working in R
- R CMD javareconf (re-detects the Java paths for R)
- update.packages()
- install.package("rJava") 설치
- http://ririsdata.blogspot.com/2016/10/rubuntu-java-rjava.html
- install.package("testthat")
- wget http://ml.stat.purdue.edu/rhipebin/Rhipe_0.75.2_hadoop-2.tar.gz
- apt-get install pkg-config
- R CMD INSTALL Rhipe_0.75.2_hadoop-2.tar.gz
- library(Rhipe)
- rhinit()
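- A minimal smoke test from the shell, run as hduser with HDFS up; rhls() lists an HDFS path through RHIPE, so an error here usually points at HADOOP_LIBS or the Renviron variables:
```
Rscript -e 'library(Rhipe); rhinit(); rhls("/")'
```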