Hadoop 2.0: Setting Up and Configuring a Distributed Environment
Cluster environment:
1 NameNode (physical host):
Linux yan-Server 3.4.36-gentoo #3 SMP Mon Apr 1 14:09:12 CST 2013 x86_64 AMD Athlon(tm) X4 750K Quad Core Processor AuthenticAMD GNU/Linux
2 DataNode1 (virtual machine):
Linux node1 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
3 DataNode2 (virtual machine):
Linux node2 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
4 DataNode3 (virtual machine):
Linux node3 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
1. Install VirtualBox
On Gentoo, build and install it directly from the command line, or download the binary package from the official site and install that:
emerge -av virtualbox
2. Install Ubuntu 12.04 LTS in the virtual machines
Install one VM from the Ubuntu image, then clone it to create the other two virtual hosts. (The clones boot with the same hostname and MAC address as the original, which causes conflicts on the LAN.)
To change the hostname, edit the file:
/etc/hostname
To change the MAC address, first delete the file:
/etc/udev/rules.d/70-persistent-net.rules
then set the VM's MAC address in VirtualBox before booting it.
After boot, the deleted file is regenerated automatically with the NIC's new MAC address.
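A minimal sketch of these fix-ups (the VM name node2 is only an example; repeat for each clone):
- VBoxManage modifyvm "node2" --macaddress1 auto   # on the host, before booting the clone
- # inside the booted clone: set a unique hostname and drop the cached udev rule
- echo node2 > /etc/hostname
- rm /etc/udev/rules.d/70-persistent-net.rules
- reboot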
To make file sharing between the hosts easier, you can run an NFS server on yan-Server and add the mount command to /etc/rc.local on each guest so the clients mount the NFS directory automatically at boot.
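A minimal sketch, assuming /home/share is the exported directory and that nfs-utils provides an nfs init script on the Gentoo host (both are assumptions; adjust paths and service names to your setup). On yan-Server:
- echo '/home/share 192.168.137.0/24(rw,sync,no_root_squash)' >> /etc/exports
- /etc/init.d/nfs start && exportfs -ra
On each Ubuntu guest (requires the nfs-common package), append to /etc/rc.local:
- mount -t nfs 192.168.137.100:/home/share /mnt/share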
Remove NetworkManager from each virtual machine and configure a static IP address by hand; commands to apply the change follow the example. The /etc/network/interfaces file on node2, for instance, looks like this:
- auto lo
- iface lo inet loopback
- auto eth0
- iface eth0 inet static
- address 192.168.137.202
- gateway 192.168.137.1
- netmask 255.255.255.0
- network 192.168.137.0
- broadcast 192.168.137.255
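To apply the change on an Ubuntu guest, a sketch (simply rebooting the guest also works):
- apt-get remove --purge network-manager
- ifdown eth0 && ifup eth0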
With the basic host environment in place, the hosts and their IP addresses are:
Type | Hostname | IP
NameNode | yan-Server | 192.168.137.100
DataNode | node1 | 192.168.137.201
DataNode | node2 | 192.168.137.202
DataNode | node3 | 192.168.137.203
To save resources, you can have the virtual machines boot to a text console by default and log in over SSH from a terminal on the host. (The SSH server is already installed and accepts remote logins; its installation is not covered here.)
To set this up, edit /etc/default/grub and uncomment the following line:
GRUB_TERMINAL=console
then run update-grub.
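For example, assuming the line is already present but commented out in the stock Ubuntu file:
- sed -i 's/^#GRUB_TERMINAL=console/GRUB_TERMINAL=console/' /etc/default/grub
- update-grub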
3. Configuring the Hadoop environment
3.1 Configure the JDK (this was done earlier and is not repeated here).
3.2 Download Hadoop from the official site and extract it under /opt/ (hadoop-2.0.4-alpha is used here).
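A sketch of the extraction step, assuming the release tarball has already been downloaded to the current directory:
- tar -xzf hadoop-2.0.4-alpha.tar.gz -C /opt/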
Then change into /opt/hadoop-2.0.4-alpha/etc/hadoop and edit the Hadoop configuration files.
Edit hadoop-env.sh:
- export HADOOP_PREFIX=/opt/hadoop-2.0.4-alpha
- export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
- export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
- export PATH=$PATH:$HADOOP_PREFIX/bin
- export PATH=$PATH:$HADOOP_PREFIX/sbin
- export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
- export YARN_HOME=${HADOOP_PREFIX}
- export HADOOP_CONF_HOME=${HADOOP_PREFIX}/etc/hadoop
- export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
- export JAVA_HOME=/opt/jdk1.7.0_21
Edit hdfs-site.xml:
- <configuration>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>file:/opt/hadoop-2.0.4-alpha/workspace/name</value>
- <description>Determines where on the local filesystem the DFS name node should store the
- name table. If this is a comma-delimited list of directories, then the name table is
- replicated in all of the directories, for redundancy.</description>
- <final>true</final>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>file:/opt/hadoop-2.0.4-alpha/workspace/data</value>
- <description>Determines where on the local filesystem a DFS data node should
- store its blocks. If this is a comma-delimited list of directories, then data will
- be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
- </description>
- <final>true</final>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.permissions.enabled</name>
- <value>false</value>
- </property>
- </configuration>
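The local directories referenced above live under the Hadoop install; creating them up front is a simple way to avoid surprises (the NameNode format step in 4.1 also creates the name directory):
- mkdir -p /opt/hadoop-2.0.4-alpha/workspace/name
- mkdir -p /opt/hadoop-2.0.4-alpha/workspace/data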
Edit mapred-site.xml:
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <property>
- <name>mapreduce.job.tracker</name>
- <value>hdfs://yan-Server:9001</value>
- <final>true</final>
- </property>
- <property>
- <name>mapreduce.map.memory.mb</name>
- <value>1536</value>
- </property>
- <property>
- <name>mapreduce.map.java.opts</name>
- <value>-Xmx1024M</value>
- </property>
- <property>
- <name>mapreduce.reduce.memory.mb</name>
- <value>3072</value>
- </property>
- <property>
- <name>mapreduce.reduce.java.opts</name>
- <value>-Xmx2560M</value>
- </property>
- <property>
- <name>mapreduce.task.io.sort.mb</name>
- <value>512</value>
- </property>
- <property>
- <name>mapreduce.task.io.sort.factor</name>
- <value>100</value>
- </property>
- <property>
- <name>mapreduce.reduce.shuffle.parallelcopies</name>
- <value>50</value>
- </property>
- <property>
- <name>mapred.system.dir</name>
- <value>file:/opt/hadoop-2.0.4-alpha/workspace/systemdir</value>
- <final>true</final>
- </property>
- <property>
- <name>mapred.local.dir</name>
- <value>file:/opt/hadoop-2.0.4-alpha/workspace/localdir</value>
- <final>true</final>
- </property>
- </configuration>
Edit yarn-env.sh:
- export HADOOP_PREFIX=/opt/hadoop-2.0.4-alpha
- export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
- export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
- export PATH=$PATH:$HADOOP_PREFIX/bin
- export PATH=$PATH:$HADOOP_PREFIX/sbin
- export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
- export YARN_HOME=${HADOOP_PREFIX}
- export HADOOP_CONF_HOME=${HADOOP_PREFIX}/etc/hadoop
- export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
- export JAVA_HOME=/opt/jdk1.7.0_21
Edit yarn-site.xml:
- <configuration>
- <property>
- <name>yarn.resourcemanager.address</name>
- <value>yan-Server:8080</value>
- </property>
- <property>
- <name>yarn.resourcemanager.scheduler.address</name>
- <value>yan-Server:8081</value>
- </property>
- <property>
- <name>yarn.resourcemanager.resource-tracker.address</name>
- <value>yan-Server:8082</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce.shuffle</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
- </property>
- </configuration>
Copy the configured Hadoop tree to each DataNode. (The JDK setup on the DataNodes matches the host's, so the JDK configuration does not need to change.)
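One way to do this from yan-Server (hostnames from the table above; you will be prompted for the root password until passwordless SSH is set up in 3.5):
- for h in node1 node2 node3; do
-     scp -r /opt/hadoop-2.0.4-alpha root@${h}:/opt/
- done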
3.3 Edit /etc/hosts on the host machine and add the cluster hosts:
192.168.137.100 yan-Server
192.168.137.201 node1
192.168.137.202 node2
192.168.137.203 node3
3.4 Edit the /etc/hosts file on each DataNode and add the same entries:
192.168.137.100 yan-Server
192.168.137.201 node1
192.168.137.202 node2
192.168.137.203 node3
3.5 Configure passwordless SSH login (all hosts log in as root).
Run the following on the host machine:
ssh-keygen -t rsa
Press Enter through all the prompts, then copy .ssh/id_rsa.pub to the .ssh/authorized_keys file under root's home directory on each DataNode.
Then log in remotely from the host once:
ssh root@node1
The first login may prompt for a password; after that it should not. (Log in to each of the other DataNodes once as well to confirm that passwordless login works.)
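The whole key-distribution flow, as a sketch (ssh-copy-id appends the public key to root's .ssh/authorized_keys on each target; copying the file by hand as described above works just as well):
- ssh-keygen -t rsa
- for h in node1 node2 node3; do
-     ssh-copy-id root@${h}
- done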
4. Starting Hadoop
For convenience, set the Hadoop environment variables in /etc/profile on the host, as follows:
- export HADOOP_PREFIX="/opt/hadoop-2.0.4-alpha"
- export PATH=$PATH:$HADOOP_PREFIX/bin
- export PATH=$PATH:$HADOOP_PREFIX/sbin
- export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
- export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
- export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
- export YARN_HOME=${HADOOP_PREFIX}
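Reload the profile in the current shell so the new variables take effect:
source /etc/profile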
4.1 Format the NameNode
hdfs namenode -format
4.2 Start all daemons
start-all.sh
Check the cluster in a browser via the NameNode web UI (it listens on port 50070 by default) and confirm that all DataNodes have started normally.
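A quick command-line sanity check with jps (which ships with the JDK); the exact process list depends on the configuration, but roughly:
- jps                  # on yan-Server: expect NameNode and ResourceManager, among others
- ssh root@node1 jps   # on each DataNode: expect DataNode and NodeManager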
4.3 Stop all daemons
stop-all.sh
At this point, the Hadoop environment setup is essentially complete.
Original article: http://cloud.riaos.com/?p=8001977