Install Hadoop on Ubuntu 10.04 LTS

This guide installs Hadoop on an Ubuntu 10.04 LTS server installation.
Install Ubuntu 10.04 LTS first.
Update, upgrade and install ssh
#apt-get update
#apt-get upgrade
#apt-get install ssh

Install the Sun Java 6 JDK
# See https://launchpad.net/~ferramroberto/

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:ferramroberto/java

# Update the source list
$ sudo apt-get update

# Install Sun Java 6 JDK
$ sudo apt-get install sun-java6-jdk

$ sudo update-java-alternatives -s java-6-sun

Create a new group hadoop and a new user hduser
$ sudo addgroup hadoop 
$ sudo adduser --ingroup hadoop hduser

Generate an ssh key for password-less login (Hadoop uses ssh to manage its nodes)
#su - hduser
#ssh-keygen -t rsa -P ""
#cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys 
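
Running the cat command above twice appends the same key twice. The following is a minimal sketch of an idempotent variant, not part of the original steps: authorize_key is a hypothetical helper name, and both paths are passed as arguments so it can be tried on scratch files first. It also tightens the permissions that sshd normally expects.

```shell
# Hypothetical helper: append a public key to authorized_keys only once.
authorize_key() (
  pub=$1     # public key file, e.g. $HOME/.ssh/id_rsa.pub
  auth=$2    # authorized_keys file, e.g. $HOME/.ssh/authorized_keys
  mkdir -p "$(dirname "$auth")"
  chmod 700 "$(dirname "$auth")"
  touch "$auth"
  # -x matches whole lines, -F treats the key as a fixed string, not a regex
  grep -qxF -e "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
  chmod 600 "$auth"
)

# Usage (not run here):
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
#   ssh localhost    # should now log in without asking for a password
```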

Disable IPv6 and reboot
#vim /etc/modprobe.d/blacklist.conf
Add a new line to it:
blacklist ipv6
Or add these lines to /etc/sysctl.conf:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
You have to reboot your machine for the changes to take effect.
After the reboot you can check whether IPv6 is enabled on your machine:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A value of 0 means IPv6 is enabled; a value of 1 means it is disabled (that’s what we want).
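
The check above can be wrapped so that it prints a word instead of a bare number. This is only a sketch: ipv6_state is a hypothetical helper name, and the optional argument exists so the function can be tried against a scratch file.

```shell
# Hypothetical helper: translate the kernel flag into a readable word.
ipv6_state() (
  # $1 = file to read; defaults to the kernel's all-interfaces flag
  flag_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
  case "$(cat "$flag_file" 2>/dev/null)" in
    1) echo disabled ;;
    0) echo enabled ;;
    *) echo unknown ;;   # file missing or unexpected content
  esac
)

# Usage (not run here):
#   ipv6_state
```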

Hadoop Distributed File System (HDFS)
[Image from javacodegeeks.com]

Download hadoop-1.0.3.tar.gz from a mirror site, put it in /usr/local and unpack it
$ cd /usr/local 
$ sudo tar xzf hadoop-1.0.3.tar.gz 
$ sudo mv hadoop-1.0.3 hadoop 
$ sudo chown -R hduser:hadoop hadoop
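
The commands above assume the tarball is already in /usr/local. A minimal sketch of the download step follows. The archive.apache.org path layout is an assumption that was not in the original post, so check it in a browser or pick a mirror before relying on it. hadoop_tarball_url is a hypothetical helper name.

```shell
# Hypothetical helper: build a download URL for a given Hadoop version.
# ASSUMPTION: old releases are kept under archive.apache.org/dist/hadoop/common/.
hadoop_tarball_url() {
  # $1 = Hadoop version, e.g. 1.0.3
  echo "https://archive.apache.org/dist/hadoop/common/hadoop-$1/hadoop-$1.tar.gz"
}

# Usage (not run here):
#   cd /usr/local
#   sudo wget "$(hadoop_tarball_url 1.0.3)"
```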

Confirm java home folder
#readlink -f `which javac`

Modify hadoop-env.sh in the hadoop conf folder
#vim hadoop/conf/hadoop-env.sh
Uncomment export JAVA_HOME and set it to the JDK folder (no spaces around =)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
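
If you are not sure which folder to use for JAVA_HOME, it can be derived from the javac binary. This is a sketch under the assumption that javac lives in a bin folder directly below the JDK folder. java_home_from is a hypothetical helper name.

```shell
# Hypothetical helper: derive the JDK folder from a javac path.
java_home_from() (
  # $1 = path to a javac binary or symlink (e.g. the output of `which javac`)
  real=$(readlink -f "$1")       # follow symlinks such as /etc/alternatives
  dirname "$(dirname "$real")"   # strip the trailing /bin/javac
)

# Usage (not run here):
#   java_home_from "$(which javac)"
```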

Configure core-site.xml
#vim hadoop/conf/core-site.xml
Add these lines inside the <configuration> element (hduser must be able to write to the hadoop.tmp.dir folder)
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp/dir/hadoop-hadoop</value>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost</value>
</property>
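
For reference, here is a sketch of how the complete file could be generated from the shell, with the two properties wrapped in the <configuration> element. write_core_site is a hypothetical helper name. It overwrites any existing core-site.xml in the given folder, so try it on a scratch folder first.

```shell
# Hypothetical helper: write a complete core-site.xml with the two properties.
write_core_site() (
  # $1 = conf folder, $2 = hadoop.tmp.dir value, $3 = fs.default.name value
  cat > "$1/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$2</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>$3</value>
  </property>
</configuration>
EOF
)

# Usage (not run here), with the values from this article:
#   write_core_site hadoop/conf /home/hadoop/hadoop/tmp/dir/hadoop-hadoop hdfs://localhost
```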

Configure hdfs-site.xml
#vim hadoop/conf/hdfs-site.xml
Add these lines inside the <configuration> element
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

Configure mapred-site.xml
#vim hadoop/conf/mapred-site.xml
Add these lines inside the <configuration> element
<property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
</property>
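
Before formatting the namenode it can help to read the values back from the three files. The sketch below assumes each <name> and <value> sits on its own line, as in the snippets above. It is plain text matching, not a real XML parser, and get_prop is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one property from a *-site.xml file.
get_prop() {
  # $1 = config file, $2 = property name
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}

# Usage (not run here):
#   get_prop hadoop/conf/core-site.xml fs.default.name
#   get_prop hadoop/conf/hdfs-site.xml dfs.replication
#   get_prop hadoop/conf/mapred-site.xml mapred.job.tracker
```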

Format the namenode
#hadoop/bin/hadoop namenode -format

Start cluster
#hadoop/bin/start-all.sh

Check hadoop processes (jps ships with the JDK, not with hadoop)
#jps
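
On a single-node Hadoop 1.x setup, start-all.sh normally brings up five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The following sketch reads jps output and prints any of them that is missing. missing_daemons is a hypothetical helper name.

```shell
# Hypothetical helper: list expected Hadoop daemons absent from `jps` output.
missing_daemons() (
  out=$(cat)    # jps output arrives on stdin, one "PID Name" pair per line
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # compare the second column exactly, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$out" |
      awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' ||
      echo "$daemon"
  done
)

# Usage (not run here); no output means all five daemons are running:
#   jps | missing_daemons
```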

Use netstat to check that all services are listening
#netstat -plten | grep java
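
The netstat output can be long. The sketch below looks only for the ports used in this article (the three web UI ports and the JobTracker port 54311) and prints the ones that have no listener. It assumes the usual netstat column layout, with the local address in column 4 and the state in column 6. missing_ports is a hypothetical helper name.

```shell
# Hypothetical helper: list expected ports that have no listener.
missing_ports() (
  out=$(cat)    # `netstat -plten` output arrives on stdin
  for port in 50070 50030 50060 54311; do
    printf '%s\n' "$out" |
      awk -v p="$port" '
        # the port is the last colon-separated field of the local address
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) ok = 1 }
        END { exit !ok }' ||
      echo "$port"
  done
)

# Usage (not run here); no output means every port is listening:
#   netstat -plten | missing_ports
```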

Stop cluster
#hadoop/bin/stop-all.sh

Start cluster again
#hadoop/bin/start-all.sh

Make a folder for gutenberg and create three text files with some content
#mkdir /tmp/gutenberg
#cd /tmp/gutenberg
#vim 1.txt
#vim 2.txt
#vim 3.txt
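
If you prefer not to type the files in vim, they can also be created from the shell. This is only a sketch: make_samples is a hypothetical helper name and the sentences are placeholder text, not the content used in the original post.

```shell
# Hypothetical helper: create three small text files for the wordcount test.
make_samples() (
  # $1 = target folder (this article uses /tmp/gutenberg)
  mkdir -p "$1"
  printf 'hello hadoop\n'          > "$1/1.txt"   # placeholder content
  printf 'hello hdfs\n'            > "$1/2.txt"   # placeholder content
  printf 'hello mapreduce hello\n' > "$1/3.txt"   # placeholder content
)

# Usage (not run here):
#   make_samples /tmp/gutenberg
```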

Use hadoop fs -copyFromLocal to copy the files to an HDFS folder
#hadoop/bin/hadoop fs -copyFromLocal /tmp/gutenberg gutenberg

Check hdfs folder content
#hadoop/bin/hadoop fs -ls 
#hadoop/bin/hadoop fs -ls gutenberg

Use the java wordcount example to count the words (the examples jar ships in the hadoop folder)
#hadoop/bin/hadoop jar hadoop/hadoop-examples-1.0.3.jar wordcount gutenberg gutenberg-output
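
To cross-check the job result, the same count can be done locally on the input files. The sketch below assumes the wordcount example splits words on whitespace and writes one "word<TAB>count" line per word. local_wordcount is a hypothetical helper name, and the part-* file naming in the usage note is also an assumption.

```shell
# Hypothetical helper: count words in local files, one "word<TAB>count" per line.
local_wordcount() {
  # $@ = local text files
  cat "$@" |
    tr -s ' \t\r\f' '\n' |           # one word per line, split on whitespace
    grep -v '^$' |                   # drop empty lines
    LC_ALL=C sort |                  # byte order, so uniq sees equal words together
    uniq -c |                        # prefix each word with its count
    awk '{ print $2 "\t" $1 }'       # reorder to word<TAB>count
}

# Usage (not run here):
#   local_wordcount /tmp/gutenberg/*.txt
#   hadoop/bin/hadoop fs -cat 'gutenberg-output/part-*'
```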


Hadoop Web Interfaces

    http://localhost:50070/ – web UI of the NameNode daemon
    http://localhost:50030/ – web UI of the JobTracker daemon
    http://localhost:50060/ – web UI of the TaskTracker daemon



Vishal Vyas
