Install a Hadoop Multinode Cluster Using CDH4 on RHEL/CentOS 6.5


Hadoop is an open-source programming framework developed by Apache to process big data. It uses HDFS (Hadoop Distributed File System) to store data across all the datanodes in the cluster in a distributed manner, and the MapReduce model to process that data.

NameNode (NN) is the master daemon that controls HDFS, and JobTracker (JT) is the master daemon for the MapReduce engine.

In this tutorial, I am using two CentOS 6.3 VMs, 'master' and 'node' (master and node are my hostnames). The 'master' IP is 172.21.17.175 and the 'node' IP is 172.21.17.188. The following instructions also work on other RHEL/CentOS 6.x versions.

## on Master ##

 hostname
master
 ifconfig|grep 'inet addr'|head -1
inet addr:172.21.17.175  Bcast:172.21.19.255  Mask:255.255.252.0

## on Node ##

 hostname
node
 ifconfig|grep 'inet addr'|head -1
inet addr:172.21.17.188  Bcast:172.21.19.255  Mask:255.255.252.0

First, make sure that all the cluster hosts are listed in the '/etc/hosts' file (on each node) if you do not have DNS set up.

 cat /etc/hosts

172.21.17.175 master
172.21.17.188 node
 cat /etc/hosts

172.21.17.197 qabox
172.21.17.176 ansible-ground
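
As a quick sanity check before going further, a small helper can report which cluster hostnames are missing from a hosts file. This is only a sketch: the function name `missing_hosts` is hypothetical, and it assumes the usual /etc/hosts layout of an IP address followed by one or more names.

```shell
#!/bin/sh
# missing_hosts FILE NAME...
# Prints every NAME that does not appear as a hostname or alias in FILE.
# (Hypothetical helper, not part of Hadoop or CDH.)
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then compare each field after the IP address.
        if ! awk -v want="$name" '
                /^[[:space:]]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
                END { exit found ? 0 : 1 }
             ' "$hosts_file"; then
            printf '%s\n' "$name"
        fi
    done
}

# Check the two hostnames used in this tutorial; no output means both are present.
missing_hosts /etc/hosts master node
```

Running it on every node and getting no output suggests the name resolution needed by the cluster is in place.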

Installing the Hadoop Multinode Cluster on CentOS

We use the official CDH repository to install CDH4 on all the hosts (Master and Node) in the cluster.

Go to the official CDH download page and grab the CDH4 (i.e. 4.6) version, or use the following wget commands to download the repository package and install it.

## on 32-bit System ##

# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/i386/cloudera-cdh-4-0.i386.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.i386.rpm

## on 64-bit System ##

# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm

Before installing the Hadoop Multinode Cluster, add the Cloudera public GPG key to your repository by running one of the following commands, according to your system architecture.

## on 32-bit System ##

# rpm --import http://archive.cloudera.com/cdh4/redhat/6/i386/cdh/RPM-GPG-KEY-cloudera
## on 64-bit System ##

# rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
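
The choice between the 32-bit and 64-bit URLs can be derived from `uname -m`. The sketch below only prints the matching repository package and GPG key URLs taken from the commands above; the function name `cdh4_urls` is a hypothetical helper, and nothing is downloaded or installed.

```shell
#!/bin/sh
# cdh4_urls MACHINE
# MACHINE is the output of `uname -m`. Prints the repository RPM URL and the
# GPG key URL for that architecture. (Hypothetical helper name.)
cdh4_urls() {
    case $1 in
        x86_64) arch=x86_64 ;;
        i?86)   arch=i386 ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
    base=http://archive.cloudera.com/cdh4
    echo "$base/one-click-install/redhat/6/$arch/cloudera-cdh-4-0.$arch.rpm"
    echo "$base/redhat/6/$arch/cdh/RPM-GPG-KEY-cloudera"
}

# Print the URLs for the machine this runs on; warn without aborting otherwise.
cdh4_urls "$(uname -m)" || echo "pick the URLs manually" >&2
```

The first printed URL is the one to pass to wget, and the second is the one to pass to `rpm --import`.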

Next, run the following commands to install and set up the JobTracker and NameNode on the Master server.

 yum clean all 
 yum install hadoop-0.20-mapreduce-jobtracker
 yum clean all
 yum install hadoop-hdfs-namenode

Next, run the following commands on the Master server to set up the secondary NameNode.

 yum clean all 
 yum install hadoop-hdfs-secondarynamenode

Next, set up the TaskTracker and DataNode on every cluster host (Node) other than the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts (on the node, in this case).

 yum clean all
 yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

You can install the Hadoop client on a separate machine (in this case I installed it on the datanode, but you can install it on any machine).

 yum install hadoop-client
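
To keep track of which CDH4 packages belong on which host, the mapping can be written down as a small lookup. This is a sketch only: the role names `master`, `worker`, and `client` and the function name `packages_for_role` are assumptions made for illustration, and the loop prints the yum commands instead of running them.

```shell
#!/bin/sh
# packages_for_role ROLE
# Prints the CDH4 packages this tutorial installs for a given role.
# Role names are hypothetical labels, not CDH terminology.
packages_for_role() {
    case $1 in
        master) echo "hadoop-0.20-mapreduce-jobtracker hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode" ;;
        worker) echo "hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode" ;;
        client) echo "hadoop-client" ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Dry run: show the yum command for each role without executing it.
for role in master worker client; do
    echo "yum install $(packages_for_role "$role")"
done
```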

Now that the steps above are done, let's move on to deploying HDFS (to be done on all the nodes).

Copy the default configuration to a new directory under /etc/hadoop (on each node in the cluster).

 cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster

Use the alternatives command to set your custom configuration directory, as follows (on each node in the cluster).

 alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
reading /var/lib/alternatives/hadoop-conf

 alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
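
One way to confirm that the alternatives steps took effect is to follow the /etc/hadoop/conf symlink to its final target. The sketch below assumes GNU `readlink -f` (as shipped with RHEL/CentOS); the function name `active_conf_dir` is hypothetical.

```shell
#!/bin/sh
# active_conf_dir [LINK]
# Follows the symlink chain created by alternatives and prints the
# directory that LINK (default: /etc/hadoop/conf) finally points to.
active_conf_dir() {
    readlink -f "${1:-/etc/hadoop/conf}"
}

# After the steps above this is expected to print /etc/hadoop/conf.my_cluster.
active_conf_dir || echo "/etc/hadoop/conf could not be resolved" >&2
```

`alternatives --display hadoop-conf` reports the same link along with its priority.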

Now open the 'core-site.xml' file and update fs.defaultFS on each node in the cluster.

 cat /etc/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>fs.defaultFS</name>
 <value>hdfs://master/</value>
</property>
</configuration>
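
Because the same core-site.xml is needed on every node, it can help to generate it from the NameNode hostname instead of editing it by hand. The following is a minimal sketch that writes the file shown above to standard output; `render_core_site` is a hypothetical helper name.

```shell
#!/bin/sh
# render_core_site NAMENODE_HOST
# Writes a minimal core-site.xml to standard output.
render_core_site() {
    cat <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>fs.defaultFS</name>
 <value>hdfs://$1/</value>
</property>
</configuration>
EOF
}

# Preview the file for the NameNode host used in this tutorial.
render_core_site master
```

Once the preview looks right, the output can be redirected to /etc/hadoop/conf/core-site.xml on each node.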

Next, update dfs.permissions.superusergroup in hdfs-site.xml on each node in the cluster.

 cat /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
     <name>dfs.name.dir</name>
     <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
     <name>dfs.permissions.superusergroup</name>
     <value>hadoop</value>
  </property>
</configuration>

Note: Please make sure that the above configuration is present on all the nodes (set it up on one node, then use scp to copy it to the rest).
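
To check that a given property really has the same value on every node, its value can be pulled out of the XML file and compared. This is a rough sketch that assumes the layout used in this tutorial, with each name and value tag on its own line; `get_property` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# get_property FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop XML config.
# Assumes one tag per line, as in the files shown in this tutorial.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            value = $0
            sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

conf=/etc/hadoop/conf/hdfs-site.xml
if [ -r "$conf" ]; then
    get_property "$conf" dfs.permissions.superusergroup
fi
```

Running it on each node and comparing the output is one way to confirm that the copies match.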

Update dfs.name.dir (or dfs.namenode.name.dir) in 'hdfs-site.xml' on the NameNode (Master), and dfs.datanode.data.dir on the DataNode (Node). Please change the values to match your own directories.

## on Master ##

 cat /etc/hadoop/conf/hdfs-site.xml
<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:///data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>

## on Node ##

 cat /etc/hadoop/conf/hdfs-site.xml
<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>

Execute the commands below to create the directory structure and set ownership and permissions on the NameNode (Master) and DataNode (Node) machines.

## on Master ##

 mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
 chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn
 chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn

## on Node ##

 mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
 chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
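
Before starting the daemons, it is worth confirming that every directory named in a comma-separated setting such as dfs.datanode.data.dir actually exists on the host. The sketch below is one way to do that; `missing_dirs` is a hypothetical helper, and it only strips a leading `file://` scheme from each entry.

```shell
#!/bin/sh
# missing_dirs LIST
# LIST is a comma-separated directory list as used in hdfs-site.xml.
# Prints each entry that is not an existing directory on this host.
missing_dirs() (
    # Runs in a subshell so the IFS and globbing changes do not leak out.
    set -f
    IFS=,
    for entry in $1; do
        dir=${entry#file://}
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
)

# Check the DataNode directories configured in this tutorial.
missing_dirs "file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn"
```

No output means every listed directory is present; any path that is printed still needs to be created and handed to the hdfs user.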

Format the NameNode (on Master) by issuing the following command.

 sudo -u hdfs hdfs namenode -format

Add the following property to the hdfs-site.xml file on Master, replacing the value as shown.

<property>
  <name>dfs.namenode.http-address</name>
  <value>172.21.17.175:50070</value>
  <description>
    The address and port on which the NameNode UI will listen.
  </description>
</property>

Note: In our case, the value should be the IP address of the master VM.

Now let's deploy MRv1 (MapReduce version 1). Open the 'mapred-site.xml' file and set the following values as shown.

 cp hdfs-site.xml mapred-site.xml
 vi mapred-site.xml
 cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
 <name>mapred.job.tracker</name>
 <value>master:8021</value>
</property>
</configuration>

Next, copy the 'mapred-site.xml' file to the node machine using the following scp command.

 scp /etc/hadoop/conf/mapred-site.xml node:/etc/hadoop/conf/
mapred-site.xml                                                                      100%  200     0.2KB/s   00:00

Now configure the local storage directories used by the MRv1 daemons. Open the 'mapred-site.xml' file again and make the changes shown below for each TaskTracker.

<property>
 <name>mapred.local.dir</name>
 <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>

After specifying these directories in the 'mapred-site.xml' file, you must create the directories and assign the correct file permissions to them on each node in your cluster.

mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local

Now run the following command to start HDFS on every node in the cluster.

 for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
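
The loop above builds its service list by parsing the output of ls. A glob-based variant, sketched below, does the same thing and can be reused to check status after starting. The function name `hdfs_services` is hypothetical, and the loop here only prints what it would run.

```shell
#!/bin/sh
# hdfs_services [DIR]
# Prints the name of every hadoop-hdfs-* init script found in DIR
# (default: /etc/init.d), one per line.
hdfs_services() {
    dir=${1:-/etc/init.d}
    for script in "$dir"/hadoop-hdfs-*; do
        [ -e "$script" ] || continue   # the glob matched nothing
        basename "$script"
    done
}

# Dry run: show the status checks that could be run on this host.
for svc in $(hdfs_services); do
    echo "service $svc status"
done
```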

You need to create /tmp in HDFS with the proper permissions, as shown below.

 sudo -u hdfs hadoop fs -mkdir /tmp
 sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
 sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
 sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
 sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred

Now verify the HDFS file structure.

 sudo -u hdfs hadoop fs -ls -R /

drwxrwxrwt   - hdfs   hadoop          0 2014-05-29 09:58 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
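
The permission string of a single path can be picked out of such a listing with a short filter, which makes it easier to confirm that /tmp and the staging directory carry the sticky bit (the trailing `t`). This is a sketch; `hdfs_mode` is a hypothetical helper that reads the listing on standard input.

```shell
#!/bin/sh
# hdfs_mode PATH
# Reads `hadoop fs -ls -R` output on standard input and prints the
# permission string of the entry whose last field equals PATH.
hdfs_mode() {
    awk -v path="$1" '$NF == path { print $1; exit }'
}

# Example using one line of the listing above; prints drwxrwxrwt.
echo 'drwxrwxrwt   - hdfs hadoop          0 2014-05-29 09:58 /tmp' | hdfs_mode /tmp
```

On a live cluster the input would come from `sudo -u hdfs hadoop fs -ls -R /` instead of the echo.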

After you start HDFS and create '/tmp', but before you start the JobTracker, create the HDFS directory specified by the 'mapred.system.dir' parameter (by default ${hadoop.tmp.dir}/mapred/system) and change its owner to mapred.

 sudo -u hdfs hadoop fs -mkdir -p /tmp/mapred/system
 sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system

To start MapReduce, start the TaskTracker (TT) and JobTracker (JT) services.

## on Node ##

 service hadoop-0.20-mapreduce-tasktracker start
Starting Tasktracker:                               [  OK  ]
starting tasktracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-node.out

## on Master ##

 service hadoop-0.20-mapreduce-jobtracker start
Starting Jobtracker:                                [  OK  ]
starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-master.out

Next, create a home directory for each Hadoop user. It is recommended that you do this on the NameNode, for example:

 sudo -u hdfs hadoop fs -mkdir  /user/<user>
 sudo -u hdfs hadoop fs -chown <user> /user/<user>

Note: <user> is the Linux username of each user.

Alternatively, you can create the home directory as follows.

 sudo -u hdfs hadoop fs -mkdir /user/$USER
 sudo -u hdfs hadoop fs -chown $USER /user/$USER
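
When several users need home directories, the two commands can be generated in a loop. The sketch below only prints the commands so they can be reviewed first; the function name `home_dir_commands` and the usernames `alice` and `bob` are placeholders, not accounts from this tutorial.

```shell
#!/bin/sh
# home_dir_commands USER...
# Prints the HDFS commands that would create and hand over /user/USER
# for each USER given. Nothing is executed.
home_dir_commands() {
    for user in "$@"; do
        echo "sudo -u hdfs hadoop fs -mkdir /user/$user"
        echo "sudo -u hdfs hadoop fs -chown $user /user/$user"
    done
}

# Placeholder usernames for illustration.
home_dir_commands alice bob
```

If the printed commands look right, the output can be piped to `sh` on the NameNode.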

Open your browser and enter the URL http://ip_address_of_namenode:50070 to access the NameNode.

Open another tab in your browser and enter the URL http://ip_address_of_jobtracker:50030 to access the JobTracker.

This procedure has been successfully tested on RHEL/CentOS 5.X/6.X. Please comment below if you face any issues with the installation, and I will help you find a solution.