Big Data and Hadoop Developer Training

Instructor-led classroom

Blended learning with instructor-led online classroom sessions and online self-learning.

Course completion certificate

A course completion certificate for all participants

Project on Big Data

Project on Big Data and Hadoop development

For Individuals

Sorry, no batches available.

For Business

  • Blended learning (live on-site or online training) that fits your tight schedule
  • Cost-efficient, tailored solutions from industry experts
  • Customized assessments to track training outcomes

Our Big Data and Hadoop training program is designed to ensure that you gain expertise in HDFS, YARN, MapReduce, HBase, Oozie, Flume and Sqoop. The lectures, real-time use cases and hands-on exercises will make it easy for you to manage a Hadoop 2.7 environment and perform data analytics using Pig and Hive. By the end of the course, you’ll have the confidence to build powerful data processing applications using Hadoop.

  •  Blended learning with instructor-led online classroom sessions and online self-learning.
  •  Hands-on Lab Exercises
  •  Industry Specific Projects
  •  Chapter Quizzes
  •  Big Data & Hadoop Simulation Exams
  •  Downloadable e-Book Included
  •  Java Essentials for Hadoop Included
  •  Hadoop Installation Procedure Included
  •  Hands-on Hadoop Training Certification
  •  Hadoop Deployment and Maintenance Tips
  •  Packed with the latest advanced modules such as YARN, Flume, Oozie, Mahout & Chukwa

By the end of this program, participants will have learnt to:

  • Master the concepts of the Hadoop Distributed File System and the MapReduce framework
  • Set up a Hadoop cluster
  • Understand data loading techniques using Sqoop and Flume
  • Program in MapReduce (both MRv1 and MRv2)
  • Write complex MapReduce programs
  • Program in YARN (MRv2)
  • Perform data analytics using Pig and Hive
  • Implement HBase, MapReduce integration, advanced usage and advanced indexing
  • Understand the ZooKeeper service
  • Understand the new features in Hadoop 2.0: YARN, HDFS Federation, NameNode High Availability
  • Implement best practices for Hadoop development and debugging

This course is designed for:

  • Software Professionals
  • Analytics Professionals
  • ETL Developers
  • Project Managers
  • Testing Professionals
  • Professionals who want to acquire a solid foundation in Hadoop architecture

Introduction to Big Data & Hadoop
 What is Big Data
 Learn about the history and rise of Big Data
 Why did Big Data suddenly become so prominent
 Limitations of traditional large scale systems
 Who are the main vendors in the space: Cloudera, Hortonworks
 Introduction to Hadoop
 History of Hadoop
 Companies using Hadoop
Hadoop Architecture / Introduction to HDFS
 Understanding Hadoop Master-Slave Architecture
 Understanding HDFS and MapReduce framework
 Regular file system vs HDFS
 Learn about NameNode, DataNode, Secondary NameNode
 Learn about JobTracker, TaskTracker
 Understand how data is written to and read from HDFS
Installing and setting up a Hadoop Cluster
 Understand the important configuration files in a Hadoop Cluster
 Deploy the Cloudera Hadoop distribution in a VM player
 Run HDFS and Linux commands
 Execute some examples to get a high level understanding
 Hadoop deployment - Single node, Multinode
 Learn how to set up and deploy a multinode Hadoop Cluster on AWS
Understanding Hadoop MapReduce Framework
 Overview of the MapReduce Framework
 Understand the concept of Mappers, Reducers, Partitioners, Combiners
 Understand different Input Formats
 Understand different Output Formats
 Custom Data Types
 Writing MapReduce Mappers and Reducers in Java using Eclipse
 Using the Writable interface
 JUnit and MRUnit Testing Frameworks
 Writing and running unit tests
Pig
 Introduction to Pig
 Setting up and running Pig
 Grunt
 Pig Latin
 Writing Pig Latin scripts
Cloudera Impala
 Introduction to Impala
 Installing and using Impala
 Create table using Impala
 Query the Impala table
 Impala SQL language reference
 Impala shell commands
Hive and HiveQL
 Understand the Hive architecture
 Why another data warehousing system is needed
 Installing, configuring and running Hive
 HiveQL - Importing data, sorting and aggregating, joins, map joins
 Writing join queries and inserting data back into Hive
 Understand how queries are converted into MapReduce jobs
 Hive Tables and storage formats
 UDF and UDAF
 Choosing between Pig, Hive and Impala
ZooKeeper
 Overview of ZooKeeper
 Uses of ZooKeeper
 ZooKeeper Service
 ZooKeeper Data Model
 Building applications with ZooKeeper
Sqoop
 Overview of Sqoop
 Where is Sqoop used - import/export structured data
 Using Sqoop to import data from RDBMS into HDFS
 Using Sqoop to import data from RDBMS into Hive
 Using Sqoop to import data from RDBMS into HBase
 Using Sqoop to export data from HDFS into RDBMS
 Sqoop connectors
Flume
 Overview of Flume
 Where is Flume used - import/export unstructured data
 Using Flume to load data into HDFS
 Using Flume to load data into HBase
 Using Flume to load data into Hive
HBase
 Introduction to HBase
 Why use HBase
 HBase Architecture - read and write paths
 HBase vs RDBMS
 Installation and configuration
 Schema design in HBase - column families, hotspotting
 Accessing data with the HBase API - reading, adding and updating data from the shell and the Java API
 SCAN and Advanced API
 Using ZooKeeper with HBase
Cassandra and MongoDB
 Introduction to NoSQL databases
 Advantages of NoSQL over traditional RDBMS
 Introduction to Apache Cassandra
 Overview of Cassandra - data model, reading/writing data, CQL
 Introduction to MongoDB
 MongoDB vs Cassandra
 Introduction to Mahout
Apache Oozie
 Introduction to Oozie
 Oozie workflow jobs
 Oozie coordinator jobs
 Creating Oozie Workflows
 Using HUE UI for Oozie
 Using CLI to run and track workflows
Hadoop 2.0, YARN, MRv2
 Understand new features in Hadoop 2.0
 Learn advanced Hadoop concepts
 Introduction to YARN
 YARN architecture
 Upgrading MRv1 to MRv2
 Developing applications using MapReduce version 2

Who are the Instructors?

All our instructors are working professionals and experts in Big Data and Hadoop development. They have real-world experience in Big Data and Hadoop.

How will the practicals be done?

All our instructors are working professionals and experts in Big Data and Hadoop development. They have real-world experience in Big Data and Hadoop.

Will I get a project to complete?

Yes, towards the end of the training, you will get a project to complete. Once you submit the project, the instructor will validate it and you will then receive the course completion certificate. The project will help you understand how the different components relate to each other and how data flows between them.

What are the system requirements to install the Hadoop environment?

Your system should have 4 GB of RAM and a processor better than a Core 2 Duo. If your system falls short of these requirements, we can provide you with remote access to our Hadoop cluster.

I have a Windows system. Can that be used to work on the Hadoop assignments?

Absolutely yes! You can always use Windows to work on Hadoop. Install Oracle VirtualBox on your Windows machine and then import the virtual machine that we will provide.

Can I install Hadoop on my Mac?

Yes, our virtual machine can also be installed on a Mac.

What internet speed is required to attend the LIVE classes?

An internet speed of 1 Mbps is preferable for attending the LIVE classes. However, we have seen people attend the classes on much slower connections.

What if I have queries after I complete this course?

Once you join the course, you will get lifetime support. Even after the course completion, you can get back to the support team for any queries that you may have.

Do you provide any Certification? If yes, what is the Certification process?

Yes, we provide our own certification. At the end of your course, you will work on a real-time project. You will receive a problem statement along with a data set to work on. Once you successfully complete the project (reviewed by an expert), you will be awarded a certificate with a performance-based grading.

I have around 8 years of experience in software development. What are the career prospects in Hadoop?

Hadoop is one of the hottest career options available today for software engineers. There are currently around 12,000 jobs for Hadoop developers in the U.S. alone, and demand for Hadoop developers far exceeds supply.