Hadoop Course Content - Keylabs training

Hadoop Course Content

I. Introduction to Big Data and Hadoop

What is Big Data?
What are the challenges for processing big data?
What technologies support big data?
The 3 V's of Big Data (Volume, Velocity, Variety) and its growth
What is Hadoop?
Why Hadoop, and its use cases
History of Hadoop
Components of the Hadoop ecosystem
Advantages and Disadvantages of Hadoop
Real Life Use Cases

II. HDFS (Hadoop Distributed File System)

HDFS architecture
Features of HDFS
Where does HDFS fit, and where doesn't it?
HDFS daemons and their functionality
NameNode and its functionality
DataNode and its functionality
Secondary NameNode and its functionality
Data Storage in HDFS
Introduction to Blocks
Data replication
Accessing HDFS
CLI (Command Line Interface) and admin commands
Java Based Approach
Hadoop Administration
Hadoop Configuration Files
Configuring Hadoop Domains
Precedence of Hadoop Configuration
Diving into Hadoop Configuration
Scheduler
Rack Awareness
Cluster Administration Utilities
Rebalancing HDFS data
Copying large amounts of data from HDFS
FSImage and edit log files, in theory and in practice

III. MAPREDUCE

Map Reduce architecture

JobTracker, TaskTracker and their functionality
Job execution flow
Configuring development environment using Eclipse
Map Reduce Programming Model
How to write basic Map Reduce jobs
Running the Map Reduce jobs in local mode and distributed mode
Different Data types in Map Reduce
How to use Input Formatters and Output Formatters in Map Reduce Jobs
Input formatters and their associated Record Readers with examples
Text Input Formatter
Key Value Text Input Formatter
Sequence File Input Formatter
How to write custom Input Formatters and their Record Readers
Output formatters and their associated Record Writers with examples
Text Output Formatter
Sequence File Output Formatter
How to write custom Output Formatters and their Record Writers
How to write and use Combiners and Partitioners
Importance of Distributed Cache
Importance of Counters and how to use them

Advanced MapReduce Programming

Joins - Map Side and Reduce Side

Use of Secondary Sorting
Importance of the Writable and WritableComparable APIs
How to write Map Reduce Keys and Values
Use of Compression techniques
Snappy, LZO and Zip
How to debug Map Reduce jobs in local and pseudo-distributed mode
Introduction to Map Reduce Streaming and Pipes with examples
Job Submission
Job Initialization
Task Assignment
Task Execution
Progress and status bar
Job Completion
Failures
Task Failure
TaskTracker failure
JobTracker failure
Job Scheduling
Shuffle & Sort in depth
Diving into Shuffle and Sort
Dive into Input Splits
Dive into Buffer Concepts
Dive into Configuration Tuning
Dive into Task Execution
The Task assignment Environment
Speculative Execution
Output Committers
Task JVM Reuse
Multiple Inputs & Multiple Outputs
Built-in Counters
Dive into Counters – Job Counters & User Defined Counters
SQL operations using Java MapReduce
Introduction to YARN (Next Generation Map Reduce)

IV. Apache HIVE

Hive Introduction
Hive architecture
Driver
Compiler
Semantic Analyzer
Hive Integration with Hadoop
Hive Query Language (HiveQL)
SQL vs. HiveQL
Hive Installation and Configuration
Hive, Map-Reduce and Local-Mode
Hive DDL and DML Operations
Hive Services
CLI
Schema Design
Views
Indexes
HiveServer

Metastore

Embedded metastore configuration
External metastore configuration
Transformations in Hive
UDFs in Hive
How to write simple Hive queries
Usage
Tuning
Hive with HBase Integration
Additional topics based on the instructor's own R&D

V. Apache PIG

Introduction to Apache Pig

Map Reduce vs. Apache Pig

SQL vs. Apache Pig
Different data types in Pig
Modes of Execution in Pig
Local Mode
Map Reduce Mode
Execution Mechanism
Grunt Shell
Script
Embedded
Transformations in Pig
How to write a simple Pig script
UDFs in Pig
Pig with HBase Integration
Additional topics based on the instructor's own R&D

VI. Apache SQOOP

Introduction to Sqoop
MySQL client and Server Installation
How to connect to Relational Database using Sqoop
Sqoop commands, with examples of Import and Export
Transferring an Entire Table
Specifying a Target Directory
Importing only a Subset of data
Protecting your password
Using a file format other than CSV
Compressing Imported Data
Speeding up Transfers
Overriding Type Mapping
Controlling Parallelism
Encoding Null Values
Importing all your tables
Incremental Import
Importing only new data
Incrementally Importing Mutable Data
Preserving the last imported value
Storing Password in the Metastore
Overriding arguments to a saved job
Sharing the Metastore between Sqoop clients
Importing data from two tables
Using Custom Boundary Queries
Renaming Sqoop Job instances
Importing Queries with duplicate columns
Transferring data from Hadoop
Inserting Data in Batches
Exporting with All or Nothing Semantics
Updating an Existing Data Set
Updating or Inserting at the same time
Using Stored Procedures
Exporting into a subset of columns
Encoding the Null Value
Encoding the Null Value Differently
Exporting Corrupted Data

VII. Apache FLUME

Introduction to Flume
Flume agent usage

VIII. Apache HBase

HBase introduction
HBase basics
Column families
Scans
HBase installation
HBase Architecture
Storage
Write-Ahead Log
Log-Structured Merge-Trees
MapReduce integration
MapReduce over HBase
HBase Usage
Key design
Bloom Filters
Versioning
Filters
HBase Clients
REST
Thrift
Hive
Web Based UI
HBase Admin
Schema definition
Basic CRUD operations

IX. Apache OOZIE

Introduction to Oozie
Executing workflow jobs

X. Hadoop installation on Linux, and installation of all other ecosystem components on Linux

XI. Cluster setup (200-node cluster): knowledge sharing with setup document

XII. Cloudera & Hortonworks