Configure Hadoop and start Hadoop cluster services using Ansible Playbook.

Dec 1, 2020

In this article, we will configure Hadoop and start the Hadoop cluster services using an Ansible playbook.

Prerequisites:

Install Python
Install Ansible

What is Ansible?

Ansible works by connecting to your nodes and pushing out small programs, called “Ansible modules”, to them. Ansible then executes these modules (over SSH by default) and removes them when finished. Your library of modules can reside on any machine, and there are no servers, daemons, or databases required.

What is an Ansible Playbook?

An Ansible playbook is a YAML file that lists, in order, the tasks Ansible should run against a group of managed hosts. Ansible itself is a configuration management tool, and playbooks are how it automates the configuration of multiple servers in a repeatable way.
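To make the idea concrete, here is a minimal sketch of a playbook, written to disk from the shell. The directory name `hadoop-ansible` and the file name `check.yml` are assumptions made for illustration, not part of the original setup. The playbook only checks connectivity; it does not configure Hadoop. The fully qualified module names (`ansible.builtin.ping`, `ansible.builtin.debug`) assume Ansible 2.10 or newer; on older releases the short names `ping` and `debug` do the same job.

```shell
# Hypothetical project directory; any working directory will do.
mkdir -p hadoop-ansible

# Write a minimal playbook. The quoted 'EOF' stops the shell from
# interpreting anything inside the YAML.
cat > hadoop-ansible/check.yml <<'EOF'
---
- name: Check connectivity to the managed nodes
  hosts: all
  gather_facts: false
  tasks:
    - name: Confirm Ansible can reach the host
      ansible.builtin.ping:

    - name: Report the host that responded
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} is reachable"
EOF

# To run it (requires Ansible and an inventory of your own):
#   ansible-playbook -i <your-inventory-file> hadoop-ansible/check.yml
```

Each entry under `tasks` calls one module, and Ansible runs the tasks from top to bottom on every host matched by `hosts`.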

Everything about Hadoop:

You can check out the following link to learn more about Hadoop and its configuration:

https://www.linkedin.com/posts/yashrajpanda_big-data-storage-technology-used-by-it-giants-activity-6723496225448034304-waKP

Ansible configuration file:

Like most software, Ansible uses a configuration file, and a package-manager installation typically places a default one at /etc/ansible/ansible.cfg. This file governs the behavior of every interaction performed by the control node. Here we will use it to tell Ansible where our inventory is.
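As a rough sketch, a project-local configuration file could look like the one below. The directory `hadoop-ansible`, the inventory file name `hosts`, and the `root` login user are assumptions chosen for illustration; adjust them to your environment. Ansible picks up an `ansible.cfg` in the current working directory in preference to the one under `/etc/ansible`.

```shell
# Hypothetical project directory; any working directory will do.
mkdir -p hadoop-ansible

# Write a small ansible.cfg. Run ansible from inside this directory
# so that the config file is found and the relative inventory path resolves.
cat > hadoop-ansible/ansible.cfg <<'EOF'
[defaults]
# Path to the inventory (hosts) file.
inventory = ./hosts
# Skip the interactive SSH host-key prompt; handy in a lab, risky in production.
host_key_checking = False
# Assumed login user on the managed nodes.
remote_user = root
EOF
```

Running `ansible --version` from that directory prints a `config file = ...` line, which is a quick way to confirm which configuration file is actually in use.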

Ansible hosts file:

The Ansible inventory file defines the hosts and groups of hosts upon which commands, modules, and tasks in a playbook operate. The file can be in one of many formats depending on your Ansible environment and plugins. The inventory file can list individual hosts or user-defined groups of hosts.
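A minimal INI-style inventory for a small Hadoop cluster might look like the sketch below. The group names (`namenode`, `datanode`, `hadoop`) and the IP addresses are hypothetical; the addresses come from the 192.0.2.0/24 range reserved for documentation, so replace them with your own nodes.

```shell
# Hypothetical project directory; any working directory will do.
mkdir -p hadoop-ansible

# Write an INI-format inventory with one group per Hadoop role.
cat > hadoop-ansible/hosts <<'EOF'
# Placeholder addresses; replace with your real nodes.
[namenode]
192.0.2.10

[datanode]
192.0.2.11
192.0.2.12

# Parent group that contains both roles.
[hadoop:children]
namenode
datanode
EOF
```

With Ansible installed, `ansible-inventory -i hadoop-ansible/hosts --list` prints the parsed groups and hosts, which helps catch typos before running a playbook.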

Now that we have set up the Ansible configuration file and the hosts (inventory) file, we can ping the hosts to check whether Ansible can connect to them, as follows:
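A minimal sketch of that check, assuming it is run from the directory that holds your `ansible.cfg` so the inventory is found automatically. The `ping_status` variable is only an illustrative helper. Note that Ansible's `ping` module is not an ICMP ping: it logs in to each host and verifies that a usable Python is available.

```shell
# Run the built-in ping module against every host in the inventory.
# The guard keeps the snippet from failing on a machine without Ansible.
if ! command -v ansible >/dev/null 2>&1; then
    echo "ansible not found on PATH; install it first" >&2
    ping_status=127
elif ansible all -m ping; then
    ping_status=0
else
    ping_status=1
fi

echo "ping check exit status: ${ping_status}"
```

A host that responds correctly reports `"ping": "pong"` in its output; an unreachable host is marked `UNREACHABLE`, usually pointing to an SSH or credentials problem.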