HADOOP COMMANDS
OS : Ubuntu Environment
Author : Bottu Gurunadha Rao
Created: 31-Jan-2022
Updated: 31-Jan-2022
Release : 1.0.1
Purpose: To learn Hadoop distributed file system commands.
History: N/A
Note : To execute Hadoop commands, every command must be preceded by the hadoop fs or hdfs dfs combination.
The older hadoop dfs form is deprecated, so we recommend using the hdfs dfs combination.
1. To check whether you are the root user.
id -u
If you get zero, you are root, since a UID of 0 (zero) means "root".
If you get any other number, you are not root.
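Note (illustrative; the UID shown is just an example, yours will differ):
command: id -u
output: 1001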
2. To check the current Ubuntu version
lsb_release -a
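Note (illustrative): sample output; your release details will differ.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04 LTS
Release:        20.04
Codename:       focal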
3. Creating / Adding a new user
Syntax: sudo adduser <username>
sudo adduser guru
Note : It asks for a password and a confirmation; provide them. Keep pressing the Enter key for the remaining questions, but for the last one type Y (yes) and press Enter.
4. Deleting an existing user.
Syntax : sudo deluser <username>
sudo deluser guru
Note : The user can no longer log in, but its home directory is still visible under /home. To remove it permanently, use the rm command.
5. Deleting a leftover home directory using rm.
Syntax: sudo rm -r /home/<username>
sudo rm -r /home/guru
Note : Alternatively, sudo deluser --remove-home <username> deletes the home directory in the same step.
6. List all the available users in /home.
ls /home
7. Display contents of the local filesystem.
ls
8. Display contents of the distributed (Hadoop/HDFS) filesystem.
hadoop fs -ls /
or
hdfs dfs -ls /
9. To check the health of the Hadoop file system.
hadoop fsck /
or
hdfs fsck /
Checking the health of the files in a directory:
hadoop fsck /d1 -files (here d1 is the directory)
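Note (illustrative): fsck prints a summary report ending in a one-line verdict; sample output (the counters will differ on your cluster):
 Status: HEALTHY
 Total blocks (validated): ...
The filesystem under path '/' is HEALTHY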
10. To know the Hadoop version
hadoop version or hdfs version
11. To list the Hadoop filesystem root
hadoop fs -ls /
or
hdfs dfs -ls /
12. Creating a directory in the Hadoop filesystem
hdfs dfs -mkdir -p /tst
Note : In the above command tst is the directory, and -p is the option that also creates any missing parent directories along the path (and does not fail if the directory already exists).
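Note (illustrative): thanks to -p, an entire nested path can be created in one step:
hdfs dfs -mkdir -p /tst/a/b
hdfs dfs -ls -R /tst (recursively lists the tree to verify)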
13. Creating an empty file in the Hadoop filesystem root
hdfs dfs -touchz /n1.txt
14. Creating an empty file in a Hadoop filesystem sub-directory
hdfs dfs -touchz /d2/t2.txt
Note: In the above command d2 is the directory and t2.txt is the empty file; the directory /d2 must already exist.
15. Copying a file from the local file system to the Hadoop file system.
Method 1: Using the appendToFile command (the "-" argument reads from standard input)
hdfs dfs -appendToFile - /d1/gurus.txt
Note1: Once the above command is issued, the cursor waits for you to enter some text; after entering it, save by pressing CTRL+D.
Note2: To check whether the file was created or not, use the following command.
hdfs dfs -cat /d1/gurus.txt
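Method 2 (a common alternative; localfile.txt is a hypothetical file in the current local directory): to copy an existing local file, use the put or copyFromLocal command.
hdfs dfs -put localfile.txt /d1/
or
hdfs dfs -copyFromLocal localfile.txt /d1/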
HADOOP (3.3.0) INSTALLATION
Hadoop(3.3.0) Installation in Ubuntu Environment
Author : Bottu Gurunadha Rao
Created: 26-Jan-2022
Updated: 30-Jan-2022
Release : 1.0.1
Purpose: Hadoop(3.3.0) distributed Installation steps in Ubuntu
History: N/A
**It's a cakewalk to install Hadoop if you follow these steps.
1. Update the Ubuntu OS
command : sudo apt update (or) sudo apt-get update, then
sudo apt upgrade (or) sudo apt-get upgrade
2. Install the latest Java software
sudo apt-get install default-jdk
3. After installation, check the Java version
java -version
javac -version
4. If you are not root, switch to root. Do the following to become root.
Syntax: sudo su
It then asks for your (sudo) password; provide it.
5. Do this from root. Create a new user for Hadoop; you may use any username. Here we used hduser.
Note: sudo before a command is optional when you are already root.
sudo adduser hduser
6. Now change from the current user to the new user, here "hduser".
Syntax to change user: su <username>
su hduser
It then asks for the new user's password; provide it.
7. Register the new user's (here hduser's) details in the sudoers file. The sudoers file is in the /etc path.
The sudoers file controls who can run what commands as what users on what machines,
and can also control special things such as whether you need a password for particular commands.
cd /etc
sudo nano sudoers
Note: sudo visudo is a safer way to edit this file, since it validates the syntax before saving.
**In the sudoers file add hduser as follows
# User privilege specification
root ALL=(ALL:ALL) ALL
hduser ALL=(ALL:ALL) ALL
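Note (optional check): you can confirm the entry took effect with:
sudo -l -U hduser
It should list "(ALL : ALL) ALL" for hduser.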
8. Install the OpenSSH server and client
sudo apt install openssh-server openssh-client -y
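Note (optional check): on Ubuntu the SSH service usually starts automatically after installation; you can confirm with:
sudo systemctl status ssh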
9. Generate an SSH key pair using ssh-keygen
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
10. Append the generated public key to authorized_keys in the .ssh directory, using the cat command.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
11. Set the permissions for your user (here hduser)
chmod 0600 ~/.ssh/authorized_keys
12. Verify everything is set up correctly by using hduser to SSH to localhost
hduser@ubuntu:$ ssh localhost
13. Download and Install Hadoop
Method1:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
In this method the tar file is downloaded to the current location.
Method2:
https://hadoop.apache.org/releases.html
In this method the hadoop-3.3.0 binary is downloaded into the Downloads folder;
you then need to un-tar it and copy the hadoop folder to the required location.
14. Extract the files
tar xzvf hadoop-3.3.0.tar.gz
15. Single Node Hadoop Deployment.
The following files need to be configured:
.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml,
mapred-site.xml, yarn-site.xml
16. Configure the Hadoop environment variables (in the .bashrc file)
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano ~/.bashrc
17. Define the Hadoop environment variables: add the following variables at the end of the file
#Hadoop environment variables
export HADOOP_HOME=/home/hduser/hadoop-3.3.0
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
18. To apply the above changes to the current session, do the following
hduser@ubuntu:~/hadoop-3.3.0$ exec bash
hduser@ubuntu:~/hadoop-3.3.0$ source ~/.bashrc
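Note (quick check): confirm that the variables took effect:
echo $HADOOP_HOME (should print /home/hduser/hadoop-3.3.0)
hadoop version (should now run from the new PATH and report version 3.3.0)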
19. Edit the hadoop-env.sh file
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Note: To find the Java path that must be added, do the following.
To find the java binary location:
command: which java
output: /usr/bin/java
To resolve the symlink to the actual installation:
command: readlink -f /usr/bin/java
output: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
From the above output, JAVA_HOME is "/usr/lib/jvm/java-11-openjdk-amd64";
use it to set the environment variable.
20. Un-comment the JAVA_HOME line in the "hadoop-env.sh" file and add the following
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
21. Edit the core-site.xml file; core-site.xml defines the HDFS and Hadoop core properties.
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmpdata</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Note: fs.defaultFS is the current property name; the older fs.default.name still works but is deprecated.
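Note (an extra precaution; Hadoop can usually create this itself): create the temporary directory referenced above so that hduser owns it:
mkdir -p /home/hduser/tmpdata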
22. Edit the hdfs-site.xml file. Configure it by defining the NameNode and DataNode storage directories.
hduser@ubuntu:~/hadoop-3.3.0$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/dfsdata/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/dfsdata/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
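Note (an extra precaution, matching the paths above): create the NameNode and DataNode storage directories up front to avoid permission surprises:
mkdir -p /home/hduser/dfsdata/namenode /home/hduser/dfsdata/datanode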
23. Edit the mapred-site.xml file and define the MapReduce values:
hduser@ubuntu:~$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
24. Edit the yarn-site.xml file; it holds the configurations for the NodeManager, ResourceManager,
Containers, and Application Master.
hduser@ubuntu:~$ sudo nano /home/hduser/hadoop-3.3.0/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
25. Format the NameNode before starting the Hadoop services for the first time, from "/home/hduser/hadoop-3.3.0/sbin".
command : hdfs namenode -format
Note: the older form hadoop namenode -format still works but is deprecated.
26. Start the DFS services, from the sbin directory of the Hadoop home (/home/hduser/hadoop-3.3.0/sbin)
Note : It starts the NameNode, DataNode, and SecondaryNameNode services.
command : ./start-dfs.sh
27. Start the YARN services, from the sbin directory of the Hadoop home (/home/hduser/hadoop-3.3.0/sbin)
Note : It starts the ResourceManager and NodeManager.
command : ./start-yarn.sh
28. Instead of using steps 26 & 27, we can start all the services with a single command.
command : ./start-all.sh [It is not recommended by experts]
29. To check whether all the services are up and running successfully, type jps in the terminal.
command: jps
output:
8161 Jps
6482 DataNode
6754 SecondaryNameNode
6308 NameNode
7029 ResourceManager
7211 NodeManager
Note:
If all six of the processes listed above (Jps plus the five Hadoop daemons) are visible, your Hadoop installation is successful.
If any name is missing from the output, the installation was not successful; go through all the steps again.
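Note (assuming the default ports): you can also verify the setup through the web UIs that Hadoop 3.x exposes:
NameNode UI : http://localhost:9870
ResourceManager UI : http://localhost:8088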
30. To safely stop all the Hadoop services, use the following command (from this location: /home/hduser/hadoop-3.3.0/sbin).
command : ./stop-all.sh