Configuring Hadoop Using Ansible

Tejashwinikottha
Mar 21, 2021


Hello Guys..!!!

In this article, I am going to configure Hadoop and start the cluster using Ansible playbooks.

Requirements: target nodes (EC2 instances), a control node running RHEL 8, the Hadoop software file, and the JDK file.

Steps:

  1. Configure Ansible on the control node.
  2. Write two Ansible playbooks: one automates a target node to start the namenode, the other starts the datanode.
  3. These playbooks include tasks for:
     - copying the Hadoop software and JDK files
     - installing Hadoop and Java
     - creating the namenode and datanode directories
     - copying the core-site.xml and hdfs-site.xml files
     - starting the namenode and datanode
     - checking jps
  4. Run both Ansible playbooks.

Implementation:

First, install Ansible with the command "yum install ansible".

Now set up the inventory file with the IPs of the target nodes, and point to it in the "ansible.cfg" configuration file.
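For reference, a minimal inventory could look like the following. The IP addresses, user, and key path are placeholders for illustration, not the actual values from this setup:

[namenode]
13.233.x.x  ansible_user=ec2-user  ansible_ssh_private_key_file=/root/mykey.pem

[datanode]
65.0.x.x  ansible_user=ec2-user  ansible_ssh_private_key_file=/root/mykey.pem

And the relevant part of ansible.cfg, which points Ansible at that inventory and enables privilege escalation so the tasks can install packages as root:

[defaults]
inventory = /root/inventory.txt
host_key_checking = false

[privilege_escalation]
become = true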

Ansible is configured.
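As an optional quick check, you can verify that the target nodes are reachable before running the playbooks:

ansible all -m ping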

Now we can write a playbook that automates starting the namenode and checking jps.

- hosts: namenode
  tasks:
    - name: "copy__hadoop software file"
      copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root"

    - name: "copy__jdk file"
      copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root"

    - name: "Installing hadoop"
      shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
      register: hadoop
      ignore_errors: yes

    - name: "Installing java"
      shell: "rpm -ivh /root/jdk-8u171-linux-x64.rpm"
      register: java
      ignore_errors: yes

    - name: "Creating_Directory"
      file:
        state: directory
        path: "nn1"

    - name: "Copy_core-site.xml"
      copy:
        src: "core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "Copy_hdfs-site.xml"
      copy:
        src: "hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Format_namenode"
      shell: "echo Y | hadoop namenode -format"
      register: format

    - name: "Start_Namenode"
      shell: "hadoop-daemon.sh start namenode"
      ignore_errors: yes
      register: namenode_starts

    - name: "check_JPS"
      shell: "jps"
      register: jps
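The playbook above copies core-site.xml and hdfs-site.xml from the control node, but their contents are not shown here. As a rough sketch for Hadoop 1.x (the port number and the /nn1 path are assumptions; the directory value must match wherever the playbook's nn1 directory is actually created), the namenode's files could look like:

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn1</value>
  </property>
</configuration>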

Now run this playbook with the command:

ansible-playbook namenode.yml
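If you also want the registered jps output printed during the run, an optional debug task can be appended at the end of the playbook (an addition for convenience, not part of the original playbook):

    - name: "show_JPS_output"
      debug:
        var: jps.stdout_lines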

The output of the playbook is

The playbook ran successfully, and the namenode started.

Let us check whether the namenode has actually started.

As shown in the image above, Hadoop is installed, the JDK is installed, and the namenode is running.

Now create the datanode playbook as follows:

- hosts: datanode
  tasks:
    - name: "copy__hadoop software file"
      copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root"

    - name: "copy__jdk file"
      copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root"

    - name: "Installing hadoop"
      shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
      register: hadoop
      ignore_errors: yes

    - name: "Installing java"
      shell: "rpm -ivh /root/jdk-8u171-linux-x64.rpm"
      register: java
      ignore_errors: yes

    - name: "Creating_Directory"
      file:
        state: directory
        path: "dn1"

    - name: "Copy_core-site.xml"
      copy:
        src: "core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "Copy_hdfs-site.xml"
      copy:
        src: "hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Start_Datanode"
      shell: "hadoop-daemon.sh start datanode"
      ignore_errors: yes
      register: datanode_starts

    - name: "check_JPS"
      shell: "jps"
      register: jps
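On the datanode, the copied configuration files need to point back at the namenode rather than at the local machine. A minimal sketch, again with placeholder values (the namenode IP, port, and /dn1 path are assumptions, not the exact files used here):

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://&lt;namenode-ip&gt;:9001</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>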

Run this playbook with the command:

ansible-playbook datanode.yml

The output of the playbook is

The datanode playbook also ran successfully, and the datanode started.
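To confirm that the datanode has actually registered with the namenode, you can run the standard Hadoop 1.x report command on the namenode; it lists the connected datanodes and their capacity:

hadoop dfsadmin -report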

In this way, the Hadoop cluster configuration is automated using Ansible playbooks.

Thank you…!!!
