HADOOP Administration

BIG DATA and ANALYTICS

This training course is a comprehensive study of Big Data administration using Hadoop. The course topics include an introduction to Hadoop and its architecture, HDFS, MapReduce, and the MapReduce abstraction. It further covers best practices to configure, deploy, administer, maintain, monitor, and troubleshoot a Hadoop cluster.

HADOOP ADMINISTRATOR

Course Duration: 50 Hrs. (3-4 Months)                               Timings: Weekends/Custom/Flexible

Mode of Training: Regular/Fast Track                                 

 Course Topics: 

 Objective 1.1 – HDFS

 Understand HDFS architecture

 Understand how the NameNode maintains the file-system metadata

 Understand how data is stored in HDFS

 Understand the relationship between NameNodes and DataNodes

 Understand the relationship between NameNodes and namespaces in Hadoop 2.0

 Understand the WebHDFS commands

 Understand the various “hadoop fs” commands (a few example commands follow this list)
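
 For illustration, a few commands of the kind covered in this objective; the HDFS path, local file name, and the NameNode host/port in the WebHDFS call are placeholders:

     hadoop fs -mkdir -p /user/hdfs/demo         # create a directory in HDFS
     hadoop fs -put sample.txt /user/hdfs/demo   # copy a local file into HDFS
     hadoop fs -ls /user/hdfs/demo               # list the directory contents
     curl -i "http://namenode-host:50070/webhdfs/v1/user/hdfs/demo?op=LISTSTATUS"   # the same listing over WebHDFS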

 Objective 2.1 – Install and Configure HDP

 Understand the minimum hardware and software requirements

 Understand how to set up a local repository for HDP installation

 Understand how to install HDP using Apache Ambari (a rough install sketch follows this list)

 Understand differences between master and slave services

 Understand the complete deployment layout

 Understand how to configure and manage different services

 Understand different configuration parameters
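
 As a rough sketch of an Ambari-based install, assuming a CentOS/RHEL node on which the Ambari repository has already been set up (host names below are placeholders):

     yum install ambari-server   # install the Ambari server package from the configured repository
     ambari-server setup         # interactive setup: JDK, database, and service account
     ambari-server start         # then complete the HDP install via the wizard at http://ambari-host:8080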

 Objective 3.1 – Ensure Data Integrity

 Understand the block scanner report

 Run a file-system check (see the example commands after this list)

 Understand the replication factor and under- and over-replication

 Set up NFS Gateway to access HDFS data
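
 A few representative commands; the paths and the NFS Gateway host are placeholders:

     hdfs fsck / -files -blocks -locations    # file-system check: block locations and replication health
     hdfs dfs -setrep -w 3 /data/important    # set the replication factor of a path and wait for it to apply
     hdfs dfsadmin -report                    # cluster summary, including under-replicated blocks
     mount -t nfs -o vers=3,proto=tcp,nolock nfsgateway-host:/ /hdfs_nfs   # mount HDFS through the NFS Gateway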

 Objective 4.1 – YARN Architecture and MapReduce

 Understand the architecture of YARN

 Understand the components of the YARN ResourceManager

 Demonstrate the relationship between NodeManagers and ApplicationMasters

 Demonstrate the relationship between ResourceManagers and ApplicationMasters

 Explain the relationship between Containers and ApplicationMasters

 Explain how Container failure is handled for a YARN MapReduce job

 Understand the architecture of MapReduce

 Understand the various phases of a MapReduce job (a sample job submission follows this list)
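
 To see these pieces together, a sample MapReduce job can be submitted to YARN and inspected with the yarn CLI; the jar path is the usual HDP location, and <application_id> stands for whatever ID the ResourceManager assigns:

     yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 4 1000   # run the sample pi job
     yarn application -list                        # list running applications and their state
     yarn logs -applicationId <application_id>     # aggregated container logs once the job finishes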

 Objective 5.1 – Job Schedulers and Enterprise Data Movement

 Understand the concept of job scheduling

 Configure the capacity scheduler

 Understand the difference between capacity and fair scheduler

 Understand various data ingestion mechanisms for Hadoop

 Explain the differences between traditional and Hadoop-based ETL platforms

 Use the distcp command to move data between two clusters (see the example commands after this list)

 Understand Hive architecture

 Move data between a traditional database and Hadoop using Apache Sqoop

 Explain Hive on MapReduce vs. Hive on Tez

 Stream data using Apache Flume

 Configure workflows and deployment using Apache Oozie
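
 A few representative commands; the NameNode addresses, JDBC URL, user, and table names are placeholders:

     yarn rmadmin -refreshQueues                                # apply capacity-scheduler queue changes without a restart
     hadoop distcp hdfs://nn1:8020/data hdfs://nn2:8020/data   # copy data between two clusters
     sqoop import --connect jdbc:mysql://dbhost/sales --username etl -P --table orders --target-dir /data/orders   # import an RDBMS table into HDFS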

 Objective 6.1 – Monitor and Administer Clusters

 Monitor using the Ambari UI, Ganglia, and Nagios

 Commission and decommission nodes

 Back up and recover Hadoop data

 Use Hadoop snapshots

 Understand rack awareness and topology

 Understand NameNode high availability

 Use the “hdfs haadmin” commands (example commands follow this list)
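
 For illustration, where nn1 stands for a configured NameNode ID and the paths are placeholders:

     hdfs haadmin -getServiceState nn1             # check whether a NameNode is active or standby
     hdfs dfsadmin -refreshNodes                   # apply include/exclude lists when commissioning or decommissioning nodes
     hdfs dfsadmin -allowSnapshot /data/critical   # enable snapshots on a directory
     hdfs dfs -createSnapshot /data/critical before-upgrade   # take a named snapshot for backup and recovery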

 Objective 7.1 – Secure HDP

 Understand security concepts

 Configure Kerberos (a basic client-side check follows this list)

 Configure HDP authorization and authentication
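
 A minimal client-side check once Kerberos is in place; the principal and realm are placeholders:

     kinit hdfs-admin@EXAMPLE.COM   # obtain a Kerberos ticket for the admin principal
     klist                          # verify the ticket cache
     hadoop fs -ls /                # on a secured cluster, HDFS access now requires a valid ticket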
