Configure Hadoop and start Hadoop cluster services using Ansible Playbook.

In this article, we will configure Hadoop and start the Hadoop cluster services (a namenode and a datanode) using Ansible playbooks. Two things need to be in place on the control node first:

  1. Install Python
  2. Install Ansible

Ansible works by connecting to your nodes and pushing out small programs, called “Ansible modules”, to them. Ansible then executes these modules (over SSH by default) and removes them when finished. Your library of modules can reside on any machine, and there are no servers, daemons, or databases required.

Ansible is a configuration management tool that automates the configuration of multiple servers through playbooks. An Ansible playbook is a YAML file containing an ordered list of tasks that describe the configuration the managed servers should end up in.

You can check out the following link to learn more about Hadoop and its configuration:

https://www.linkedin.com/posts/yashrajpanda_big-data-storage-technology-used-by-it-giants-activity-6723496225448034304-waKP

Like most software, a fresh installation of Ansible ships with a default configuration file (usually /etc/ansible/ansible.cfg when Ansible is installed from a distribution package). This file is the brain and the heart of Ansible: it governs the behavior of all interactions performed by the control node. Here we will also set up our Ansible inventory.
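As a rough illustration of what such a configuration file can contain, the sketch below writes a minimal project-local ansible.cfg from the shell. The inventory file name inventory.ini and the decision to disable host key checking are assumptions made for this example, not settings taken from the original setup.

```shell
# Write a minimal ansible.cfg into the current project directory.
# Ansible reads ./ansible.cfg in preference to ~/.ansible.cfg and
# /etc/ansible/ansible.cfg.
cat > ansible.cfg <<'EOF'
[defaults]
# Assumed inventory file name; point this at your own inventory.
inventory = ./inventory.ini
# Convenient on a throwaway lab cluster only: skips SSH host key prompts.
host_key_checking = False
EOF
```

Keeping the file next to the playbooks means the settings travel with the project instead of changing the system-wide defaults.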

The Ansible inventory file defines the hosts and groups of hosts upon which commands, modules, and tasks in a playbook operate. The file can be in one of many formats depending on your Ansible environment and plugins. The inventory file can list individual hosts or user-defined groups of hosts.
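To make the idea of hosts and groups concrete, here is a minimal sketch of an INI-style inventory with one group per Hadoop role. The group names, the IP addresses (taken from the 192.0.2.0/24 documentation range), and the remote user are placeholders assumed for illustration; they are not the values used in the original setup.

```shell
# Write a small INI-style inventory with one group per Hadoop role.
# Group names, IP addresses and the remote user are placeholders;
# replace them with the details of your own nodes.
cat > inventory.ini <<'EOF'
[namenode]
192.0.2.10 ansible_user=root

[datanode]
192.0.2.11 ansible_user=root
EOF
```

A playbook can then target a group by name, for example with a hosts: namenode line, instead of listing IP addresses directly.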

Now that the Ansible configuration file and the inventory are in place, we can use Ansible's ping module to check whether the control node can connect to each host.
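A connectivity check of this kind is usually run as an ad-hoc command. The sketch below wraps it in a small shell function so it can be repeated after each change to the inventory; the function name ping_hosts and the default file name inventory.ini are assumptions for this example.

```shell
# ping_hosts: run Ansible's built-in ping module against every inventory host.
# The first argument is the inventory path; "inventory.ini" is an assumed default.
ping_hosts() {
    ansible all -i "${1:-inventory.ini}" -m ping
}
```

Calling ping_hosts inventory.ini runs the check. A host that Ansible can reach and log in to answers with "ping": "pong" in its result, while an unreachable host is reported as UNREACHABLE.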

Run the following command to set up the Hadoop namenode using ansible-playbook:

ansible-playbook namenode.yml
(P.S.: The error in the task [installing jdk software] appears because the JDK is already installed on my system. The error is ignored, so it won't affect the remaining tasks.)
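The ignored error can be reproduced with a single task. The sketch below is not the author's actual namenode.yml (that file is in the GitHub repository linked at the end of the article); it is a hypothetical one-task playbook showing how ignore_errors lets a play continue after a failed step. The RPM path and the output file name are assumptions.

```shell
# Write a one-task playbook that mirrors the behaviour described above.
# The RPM path is hypothetical. rpm -i exits non-zero when the package is
# already installed, and ignore_errors lets the play carry on regardless.
cat > jdk-task-sketch.yml <<'EOF'
- hosts: namenode
  tasks:
    - name: installing jdk software
      command: rpm -i /root/jdk.rpm
      ignore_errors: true
EOF

# Optional syntax check (requires Ansible on the control node):
# ansible-playbook --syntax-check jdk-task-sketch.yml
```

With ignore_errors set, Ansible still prints the failure in the task output but marks it as ignored and moves on to the next task, which matches the output described in the note above.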

Run the “jps” command on the namenode to check whether the Hadoop namenode has started. A running namenode shows up as a NameNode entry in the output.
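The same check can be scripted. The helper below is a minimal sketch, with check_daemon being a name chosen for this example: it searches the jps process list for a daemon name and prints whether it was found. It has to be run on the node itself, and it assumes jps (which ships with the JDK) is on the PATH.

```shell
# check_daemon: report whether a Hadoop daemon appears in the jps process list.
# Run it on the node that should be hosting the daemon.
check_daemon() {
    # -w matches whole words only, so "NameNode" does not match "SecondaryNameNode".
    if jps | grep -qw "$1"; then
        echo "$1 is running"
    else
        echo "$1 is not running"
    fi
}
```

On the namenode, check_daemon NameNode performs the check; the same helper works for the datanode check later on by passing DataNode instead.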

Run the following command to set up the Hadoop datanode using ansible-playbook:

ansible-playbook datanode.yml
(P.S.: As before, the error in the task [installing jdk software] appears because the JDK is already installed on my system. The error is ignored, so it won't affect the remaining tasks.)

Run the “jps” command on the datanode to check whether the Hadoop datanode has started. A running datanode shows up as a DataNode entry in the output.

The “namenode.yml” and “datanode.yml” files are available in the following GitHub repository:

https://github.com/yashraj24/Hadoop-automation-using-ansible

A B.Tech undergrad, enthusiastic about learning new technologies in the market and integrating them with each other.

Yashraj Panda