DOCKER – UBUNTU – HADOOP 2.9.1 – R Installation
- Started 2018-07-03
- Lines marked # are run as root
- Lines marked $ are run as the hduser user
DOCKER
Installation (Windows 7)
- Check whether the machine supports hardware virtualization with havdetectiontool.exe
- Install with DockerToolbox.exe
UBUNTU
Installation
Install from Docker
- docker pull ubuntu:latest
- To install a different Ubuntu version, specify the tag:
- docker pull ubuntu:16.04
Connect to UBUNTU and check the container
- docker run -it --name ubuntu_hadoop ubuntu:latest
- docker ps -a
- Check the Ubuntu version: # cat /etc/issue
Install required packages
- apt-get update (refresh the package lists)
- Install editors: apt-get install vim nano
- Install wget: apt-get install wget
Server encoding setup: ko_KR.UTF-8
- locale
- apt-get install language-pack-ko
- export LANGUAGE=ko_KR.UTF-8
- export LANG=ko_KR.UTF-8
- locale-gen ko_KR ko_KR.UTF-8
- update-locale LANG=ko_KR.UTF-8
- dpkg-reconfigure locales
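The export steps above can be collected into one sketch; the apt-get/locale-gen lines are commented out so the export-and-verify part can be tried anywhere (ko_KR.UTF-8 is the target locale from this guide):

```shell
# Locale setup sketch (run as root for the commented lines).
# apt-get install language-pack-ko
# locale-gen ko_KR ko_KR.UTF-8
# update-locale LANG=ko_KR.UTF-8
export LANGUAGE=ko_KR.UTF-8
export LANG=ko_KR.UTF-8
echo "$LANG"   # should print ko_KR.UTF-8
```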
Install Java 8 (OpenJDK)
- The default PPA repository on Ubuntu 14.04 reportedly does not include Java 8
- apt-get install openjdk-8-jdk
- apt-get install default-jdk (on 18.04 this installs Java 11)
- update-alternatives --config java
- ln -s java-1.8.0-openjdk-amd64 java8 (run in /usr/lib/jvm)
- Register the environment variables in /etc/profile
- vi /etc/profile
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
- source /etc/profile
- java -version
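The /etc/profile entries above can be sanity-checked with a small sketch (the JAVA_HOME path is the java8 symlink created earlier in this guide):

```shell
# Verify that JAVA_HOME/bin ended up on PATH (sketch; path from this guide).
export JAVA_HOME=/usr/lib/jvm/java8
export PATH="$PATH:$JAVA_HOME/bin"
case ":$PATH:" in
  *":/usr/lib/jvm/java8/bin:"*) echo "PATH ok" ;;
  *) echo "PATH missing JAVA_HOME/bin" ;;
esac
```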
Set up the Hadoop account
- apt-get install sudo
- addgroup hadoop
- adduser --ingroup hadoop hduser
- adduser hduser sudo
- groups hduser
Install and configure SSH
- # apt-get install ssh
- # apt-get install openssh-server
- # which ssh sshd
- # su hduser
- $ ssh-keygen -t rsa
- $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- $ sudo service ssh start
- $ ssh localhost
- answer yes
- $ exit
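The key-generation steps above, sketched with a scratch directory standing in for ~/.ssh so it can be tried safely (the empty passphrase matches the passwordless-login goal; ssh-keygen must be installed):

```shell
# Passwordless-SSH key setup sketch; writes to a temp dir instead of ~/.ssh.
d=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$d/id_rsa" -q    # no passphrase
cat "$d/id_rsa.pub" >> "$d/authorized_keys"  # authorize our own key
chmod 600 "$d/authorized_keys"
[ -s "$d/authorized_keys" ] && echo "key installed"
```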
Install Protocol Buffers
- apt-get install autoconf automake libtool curl make g++ unzip
- # wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
- # tar xvfz protobuf-2.5.0.tar.gz && cd protobuf-2.5.0
- # ./configure
- # make
- # make install
- # ldconfig
- # protoc --version
Download and extract Hadoop 2
- $ cd ~
- $ wget "http://mirror.apache-kr.org/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz"
- $ sudo mkdir /usr/local/hadoop
- $ sudo mv hadoop* /usr/local/hadoop
- $ sudo chown -R hduser:hadoop /usr/local/hadoop
- $ cd /usr/local/hadoop
- $ tar xvfz hadoop-2.9.1.tar.gz
- $ ln -s hadoop-2.9.1 hadoop
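The versioned-directory-plus-symlink layout above can be sketched with a scratch directory standing in for /usr/local/hadoop:

```shell
# Layout sketch: versioned directory plus a stable "hadoop" symlink,
# using a temp dir in place of /usr/local/hadoop.
base=$(mktemp -d)
mkdir "$base/hadoop-2.9.1"
ln -s hadoop-2.9.1 "$base/hadoop"   # relative symlink, as in the guide
readlink "$base/hadoop"             # → hadoop-2.9.1
```

Upgrading later only requires repointing the symlink, so paths such as $HADOOP_HOME stay stable.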
Edit the Hadoop configuration files
- ./etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
export HADOOP_PID_DIR=/usr/local/hadoop/hadoop/pids
- vim masters
localhost
- vim slaves
localhost
- core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9010</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
- hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/usr/local/hadoop/data/dfs/namesecondary</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/data/dfs/datanode</value>
</property>
<property>
<name>dfs.http.address</name>
<value>localhost:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>localhost:50090</value>
</property>
</configuration>
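The storage directories named in hdfs-site.xml should exist and be writable by hduser before formatting; a sketch using a scratch base directory (an assumption standing in for /usr/local/hadoop):

```shell
# Create the HDFS storage directories from hdfs-site.xml
# (scratch base dir for illustration; use /usr/local/hadoop in practice).
base=$(mktemp -d)
mkdir -p "$base/data/dfs/namenode" \
         "$base/data/dfs/namesecondary" \
         "$base/data/dfs/datanode"
ls "$base/data/dfs"
```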
- mapred-site.xml (create from mapred-site.xml.template if absent)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- yarn-env.sh
No changes needed if JAVA_HOME is already set in /etc/profile or ~/.bashrc
- yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/usr/local/hadoop/data/yarn/nm-local-dir</value>
</property>
<property>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value>/usr/local/hadoop/data/yarn/system/rmstore</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>0.0.0.0:8089</value>
</property>
</configuration>
- Initialize: ./bin/hdfs namenode -format
- Start: ./sbin/start-dfs.sh
- Start: ./sbin/start-yarn.sh
- Check in a browser: apt-get install w3m
- w3m "http://localhost:50070"
Run a Hadoop example
- ./bin/hdfs dfs -mkdir /user
- ./bin/hdfs dfs -mkdir /user/hadoop
- ./bin/hdfs dfs -mkdir /user/hadoop/conf
- Upload a file to HDFS: ./bin/hdfs dfs -put etc/hadoop/hadoop-env.sh /user/hadoop/conf/
- Run the jar: ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount /user/hadoop/conf/ output
- Check the output stored in HDFS: ./bin/hdfs dfs -cat output/part-r-00000 | tail -5
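What the wordcount example computes can be illustrated locally with plain shell on a small file, no HDFS involved:

```shell
# Local illustration of wordcount: count occurrences of each word.
printf 'hadoop yarn hadoop\nyarn hadoop\n' > /tmp/wc_demo.txt
tr ' ' '\n' < /tmp/wc_demo.txt | sort | uniq -c | sort -rn
# prints counts, highest first: 3 hadoop, 2 yarn
```

The MapReduce job produces the same word/count pairs, only written to HDFS as part-r-00000.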
Install RHIPE (connects R and Hadoop)
Environment variable summary
/etc/profile
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
export PKG_CONFIG_PATH=/usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib
export HADOOP_LIBS=`hdfs classpath | tr -d '*'`
/root/.bashrc
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
/home/hduser/.bashrc
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/usr/local/hadoop/hadoop
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_LIBS=`hdfs classpath | tr -d '*'`
- It would probably be enough to set all of these in /etc/profile..
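The HADOOP_LIBS line strips the '*' wildcards from the `hdfs classpath` output; the transformation can be seen with a dummy classpath string (a stand-in, since `hdfs classpath` needs a working install):

```shell
# What `tr -d '*'` does to a classpath string (dummy value instead of `hdfs classpath`).
cp_demo='/usr/local/hadoop/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/hadoop/share/hadoop/common/*'
echo "$cp_demo" | tr -d '*'
# prints the same string with every * removed
```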
Add environment variables to /etc/R/Renviron
- HADOOP_HOME=/usr/local/hadoop/hadoop
- HADOOP_BIN=/usr/local/hadoop/hadoop/bin
- HADOOP_CONF_DIR=/usr/local/hadoop/hadoop/etc/hadoop
Install R
- # apt-get install r-base
Work in R
- R CMD javareconf (reconfigure the Java paths for R)
- update.packages()
- install.packages("rJava")
- http://ririsdata.blogspot.com/2016/10/rubuntu-java-rjava.html
- install.packages("testthat")
- wget http://ml.stat.purdue.edu/rhipebin/Rhipe_0.75.2_hadoop-2.tar.gz
- apt-get install pkg-config
- R CMD INSTALL Rhipe_0.75.2_hadoop-2.tar.gz
- library(Rhipe)
- rhinit()