Configure Hadoop and start Hadoop cluster services using Ansible Playbook.

Yashraj Panda
Dec 1, 2020

In this article, we will configure Hadoop and start the Hadoop cluster services (namenode and datanode) using Ansible playbooks.

Prerequisites:

  1. Install Python
  2. Install Ansible

What is Ansible?

Ansible works by connecting to your nodes and pushing out small programs, called “Ansible modules”, to them. Ansible then executes these modules (over SSH by default) and removes them when finished. Your library of modules can reside on any machine, and no servers, daemons, or databases are required.

What is Ansible Playbook?

An Ansible playbook is a YAML file that lists, in order, the tasks to run on a group of managed servers. Ansible is a configuration management tool, and playbooks are how it automates the configuration of multiple servers in a repeatable way.
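
As a rough illustration of that structure, the snippet below writes a minimal playbook to disk. The file name hello.yml, the play name, and the single ping task are made up for this sketch; they are not the playbooks used later in this article.

```shell
# Write a minimal playbook to hello.yml (hypothetical file name).
# A playbook is a YAML list of plays; each play maps a set of hosts to tasks.
cat > hello.yml <<'EOF'
- name: Minimal example play
  hosts: all
  tasks:
    - name: Check that each host is reachable
      ping:
EOF
```

Running “ansible-playbook hello.yml” would then execute that one task on every host in the inventory.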

Everything about Hadoop:

You can check out the following link to learn more about Hadoop and its configuration:

https://www.linkedin.com/posts/yashrajpanda_big-data-storage-technology-used-by-it-giants-activity-6723496225448034304-waKP

Ansible configuration file:

A fresh installation of Ansible, like most software, typically ships with a default configuration file. This file governs how the control node behaves in every interaction with the managed hosts. Here we use it to tell Ansible where our inventory is.
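
A minimal sketch of such a configuration file follows. The inventory path ./hosts and the host_key_checking setting are assumptions for illustration; the exact settings used for this cluster may differ. Note that the command overwrites any ansible.cfg already present in the current directory.

```shell
# Write a project-local ansible.cfg. Ansible looks for ./ansible.cfg
# before ~/.ansible.cfg and /etc/ansible/ansible.cfg.
cat > ansible.cfg <<'EOF'
[defaults]
# Path to the inventory (hosts) file; placeholder path, adjust as needed.
inventory = ./hosts
# Skip the SSH host-key prompt on first connection (lab setups only).
host_key_checking = False
EOF
```

With this file in place, commands run from the same directory pick up the inventory without needing the “-i” option.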

Ansible hosts file:

The Ansible inventory file defines the hosts and groups of hosts upon which commands, modules, and tasks in a playbook operate. The file can be in one of many formats depending on your Ansible environment and plugins. The inventory file can list individual hosts or user-defined groups of hosts.
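
For a two-node Hadoop setup, an INI-style inventory might look like the sketch below. The file name hosts, the group names, the IP addresses (taken from the reserved documentation range), and the root login are all placeholders, and key-based SSH access is assumed.

```shell
# Write an INI-style inventory with one group per Hadoop role.
# All names and addresses here are hypothetical placeholders.
cat > hosts <<'EOF'
[namenode]
192.0.2.10 ansible_user=root

[datanode]
192.0.2.11 ansible_user=root
EOF
```

Running “ansible-inventory -i hosts --list” prints the parsed groups, which is a quick way to confirm that Ansible reads the file as intended.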

Now that we have set up the Ansible configuration file and the hosts file, we can ping the hosts with Ansible's ping module to check whether they are reachable.
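
One way to run that connectivity check is sketched below. The wrapper name check_hosts is made up for this example; the underlying call, “ansible all -m ping”, is the standard ad-hoc command.

```shell
# Run Ansible's ping module against an inventory pattern.
# Defaults to "all", Ansible's built-in group that matches every host.
# Note: this is an SSH login plus a Python check, not an ICMP ping.
check_hosts() {
    ansible "${1:-all}" -m ping
}
```

Calling “check_hosts” with no argument checks every host in the inventory, while passing a group or host name limits the check to that pattern. Each reachable host answers with “pong”.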

Namenode setup:

Run the following command to set up the Hadoop namenode using ansible-playbook:

ansible-playbook namenode.yml
(P.S.: the error in the task [installing jdk software] appears because the JDK is already installed on my system. The playbook ignores this error, so it won't affect the remaining tasks.)
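
An error like this is usually skipped with Ansible's ignore_errors keyword. The task below is only a sketch of that idea: the rpm command and the /root/jdk.rpm path are placeholders, and the real task is the one in the namenode.yml file linked at the end of this article.

```shell
# Write a hypothetical task file that demonstrates ignore_errors.
# The package path is a placeholder; this is not the task from namenode.yml.
cat > jdk-task-sketch.yml <<'EOF'
- name: installing jdk software
  # rpm exits with a non-zero status when the package is already installed
  command: rpm -i /root/jdk.rpm
  # keep running the remaining tasks even if this one fails
  ignore_errors: true
EOF
```

With ignore_errors set, Ansible still reports the failure in its output but moves on to the next task instead of stopping the play for that host.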

Run the “jps” command on the namenode host to check whether the Hadoop namenode has started. A running namenode appears in the output as a “NameNode” process.
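
The same check can be scripted. The helper below is a small sketch; the function name hadoop_daemon_running is made up for illustration, and it relies only on jps printing one “<pid> <main class>” pair per line.

```shell
# Succeed (exit status 0) when a JVM whose main class is exactly "$1",
# for example NameNode or DataNode, appears in the jps listing.
# Usage: hadoop_daemon_running NameNode && echo "NameNode is up"
hadoop_daemon_running() {
    jps | awk '{print $2}' | grep -qxF -- "$1"
}
```

The exact-line match means that a SecondaryNameNode process is not mistaken for the NameNode.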

Datanode setup:

Run the following command to set up the Hadoop datanode using ansible-playbook:

ansible-playbook datanode.yml
(P.S.: the error in the task [installing jdk software] appears because the JDK is already installed on my system. The playbook ignores this error, so it won't affect the remaining tasks.)

Run the “jps” command on the datanode host to check whether the Hadoop datanode has started. A running datanode appears in the output as a “DataNode” process.
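
Beyond jps, the cluster report shows whether the datanode has actually registered with the namenode. The sketch below assumes the Hadoop binaries are on the PATH; the function name hdfs_report is made up, and which command applies depends on the Hadoop version (Hadoop 2 and later provide “hdfs dfsadmin”, while Hadoop 1 uses “hadoop dfsadmin”).

```shell
# Print the HDFS cluster report, which lists the datanodes known to the
# namenode. Prefer the newer "hdfs" launcher when it is available and
# fall back to the Hadoop 1 style "hadoop" launcher otherwise.
hdfs_report() {
    if command -v hdfs >/dev/null 2>&1; then
        hdfs dfsadmin -report
    else
        hadoop dfsadmin -report
    fi
}
```

If the datanode is missing from the report even though jps shows a DataNode process, the usual suspects are the namenode address in the datanode's configuration or a firewall between the two hosts.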

See the GitHub repository linked below for the “namenode.yml” and “datanode.yml” files.

https://github.com/yashraj24/Hadoop-automation-using-ansible

Yashraj Panda

A B.Tech undergrad, enthusiastic about learning new technologies in the market and integrating them with each other.