Wednesday, January 26, 2022

HADOOP (3.3.0) INSTALLATION

Hadoop(3.3.0) Installation in Ubuntu Environment
Author : Bottu Gurunadha Rao
Created: 26-Jan-2022
Updated: 30-Jan-2022
Release  : 1.0.1
Purpose: Hadoop (3.3.0) single-node (pseudo-distributed) installation steps in Ubuntu
History: N/A

 **It's a cakewalk to install Hadoop if you follow these steps.

 1. Update the Ubuntu OS
    command : sudo apt update (or) sudo apt-get update, then
              sudo apt upgrade (or) sudo apt-get upgrade

2. Install the latest Java software
   sudo apt-get install default-jdk

3. After installation, check the Java version
  java -version   
  javac -version  

4. If you are not root, switch to root as follows.
Syntax: sudo su
It then asks for your password; provide it.

5. Do this from root. Create a new user for Hadoop; you may use any username. Here we use hduser.
Note: it is not mandatory to use sudo before a command when you are root; it is optional.
sudo adduser hduser
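
To confirm the account was created, you can check it with id (a quick, optional verification):

# Show the new user's UID, GID, and group membership
id hduser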

6. Now switch from the current user to the new user, hduser.
Syntax to change user: su <username>
su hduser
It then asks for the new user's password; provide it.


7. Register the new user (here hduser) in the sudoers file. The sudoers file is in the /etc path.
The sudoers file controls who can run which commands as which users on which machines, and can also control special things such as whether a password is needed for particular commands.

cd /etc
sudo nano sudoers

**In the sudoers file, add hduser as follows:
# User privilege specification
     root    ALL=(ALL:ALL) ALL
     hduser ALL=(ALL:ALL) ALL
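
To confirm the sudoers entry took effect, you can list the privileges granted to hduser (an optional check; it must be run as root or another sudoer):

# List the sudo privileges of hduser
sudo -l -U hduser
# The output should include: (ALL : ALL) ALL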

8. Install the OpenSSH server and client
  sudo apt install openssh-server openssh-client -y

9. Generate an SSH key pair using ssh-keygen
  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

10. Append the generated public key to authorized_keys in the .ssh directory, using the cat command.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

11. Set the permissions for your user (here hduser)
  chmod 0600 ~/.ssh/authorized_keys

12. Verify everything is set up correctly by using hduser to SSH to localhost
hduser@ubuntu:$ ssh localhost
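
If you want a non-interactive check that key-based login works, BatchMode disables password prompts so the command fails fast when key authentication is broken (an optional sketch):

# Should print the message without asking for a password
ssh -o BatchMode=yes localhost 'echo passwordless SSH is working'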

13. Download and install Hadoop
Method 1:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
With this method, the tar file is downloaded to the current location.
Method 2:
https://hadoop.apache.org/releases.html
With this method, the hadoop-3.3.0 binary is downloaded to the Downloads folder; you then need to un-tar it and copy the hadoop folder to the required location.
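
Whichever method you use, it is good practice to verify the tarball against the SHA-512 checksum that Apache publishes alongside it (a sketch, assuming your sha512sum accepts the published checksum-file format, as recent coreutils versions do):

# Download the published checksum and verify the tarball
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz.sha512
sha512sum -c hadoop-3.3.0.tar.gz.sha512
# Expected output: hadoop-3.3.0.tar.gz: OK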

14. Extract the files
tar xzvf hadoop-3.3.0.tar.gz

15. Single-node Hadoop deployment.
The following files need to be configured:
.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml,
mapred-site.xml, yarn-site.xml

16. Configure Hadoop environment variables (in the .bashrc file)
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano ~/.bashrc

17. Define the Hadoop environment variables: add the following variables at the end of the file
#Hadoop environment variables
export HADOOP_HOME=/home/hduser/hadoop-3.3.0
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

18. To apply the above changes to the current session, do either of the following
hduser@ubuntu:~/hadoop-3.3.0$ source ~/.bashrc
(or)
hduser@ubuntu:~/hadoop-3.3.0$ exec bash
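
To confirm the variables were picked up and the Hadoop binaries are on the PATH, a quick check:

# Both should print the paths set in .bashrc
echo $HADOOP_HOME
echo $JAVA_HOME
# Should report Hadoop 3.3.0, proving the bin directory is on the PATH
hadoop version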

19. Edit the hadoop-env.sh file
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
   
Note: To know what Java path should be added, do the following.
To find the Java installation path:
command: which java
output: /usr/bin/java

To resolve the symlink to the actual installation:
command: readlink -f /usr/bin/java
output: /usr/lib/jvm/java-11-openjdk-amd64/bin/java

In the above output, JAVA_HOME is "/usr/lib/jvm/java-11-openjdk-amd64";
use it to set the environment variable.
   
20. Uncomment the JAVA_HOME export in the "hadoop-env.sh" file and set it as follows
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
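
If you prefer not to hard-code the path, the same value can be derived from the symlink chain shown in step 19 (a sketch; it simply strips the trailing /bin/java from the resolved path):

# Resolve the java symlink and strip /bin/java to get JAVA_HOME
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
echo $JAVA_HOME   # should print /usr/lib/jvm/java-11-openjdk-amd64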

21. Edit core-site.xml File - core-site.xml defines HDFS and Hadoop core properties
Note: fs.defaultFS is the current name for the deprecated fs.default.name property; the configuration below uses the current name.
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmpdata</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
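
To avoid permission or missing-directory problems later, create the directory referenced by hadoop.tmp.dir up front (as hduser):

# Create the temp directory used by hadoop.tmp.dir
mkdir -p /home/hduser/tmpdata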
   
22. Edit hdfs-site.xml File - configure the file by defining the NameNode and DataNode storage directories.
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/hdfs-site.xml
   
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/dfsdata/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/dfsdata/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
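
Likewise, create the NameNode and DataNode storage directories referenced above before formatting HDFS:

# Create both HDFS storage directories
mkdir -p /home/hduser/dfsdata/namenode /home/hduser/dfsdata/datanode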

23. Edit mapred-site.xml File - define the MapReduce values:
hduser@ubuntu:~$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

24. Edit yarn-site.xml File - configurations for the NodeManager, ResourceManager, Containers, and ApplicationMaster.
hduser@ubuntu:~$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/yarn-site.xml
   
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

25. Format the NameNode before starting the services for the first time, from this location "/home/hduser/hadoop-3.3.0/sbin"
 command : hdfs namenode -format
 Note: the older "hadoop namenode -format" still works but is deprecated.

26. Start the DFS services, from the sbin directory of HADOOP_HOME (/home/hduser/hadoop-3.3.0/sbin)
 Note: it starts the NameNode, DataNode, and SecondaryNameNode services
command : ./start-dfs.sh

27. Start the YARN services, from the sbin directory of HADOOP_HOME (/home/hduser/hadoop-3.3.0/sbin)
Note: it starts the ResourceManager and NodeManager
command : ./start-yarn.sh

28. Instead of using steps 26 & 27, we can start all services with a single command.
command : ./start-all.sh    [deprecated in Hadoop 3; not recommended]

29. To check whether all the services are up and running successfully, type jps in the terminal.
command: jps

output:
8161 Jps
6482 DataNode
6754 SecondaryNameNode
6308 NameNode
7029 ResourceManager
7211 NodeManager

Note:
If all six services listed above are visible, your Hadoop installation was successful.
If any name is missing from the output, the installation was not successful - go through all the steps again.
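
Beyond jps, you can also confirm the daemons respond over HTTP and run a small end-to-end test job (a sketch, assuming the default Hadoop 3.x web UI ports; on some setups the example job additionally needs mapreduce.application.classpath set in mapred-site.xml):

# NameNode web UI listens on port 9870 in Hadoop 3.x
curl -s http://localhost:9870 >/dev/null && echo "NameNode UI up"
# ResourceManager web UI listens on port 8088
curl -s http://localhost:8088 >/dev/null && echo "ResourceManager UI up"
# Run the bundled pi example as a smoke test
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 2 5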
 
30. To safely stop all Hadoop services, use the following command (from this location: /home/hduser/hadoop-3.3.0/sbin)
command : ./stop-all.sh
Note: like start-all.sh, stop-all.sh is deprecated; running ./stop-yarn.sh followed by ./stop-dfs.sh is the recommended way.
