Hadoop Online Training

Hadoop Online Training Course Content

The Motivation for Hadoop
✔ Problems with traditional large-scale systems
✔ Data storage literature survey
✔Data processing literature survey
✔Network Constraints
✔Requirements for a new approach
✔Hadoop: Basic Concepts
✔What is Hadoop?
✔The Hadoop Distributed File System
✔How Hadoop MapReduce Works
✔Anatomy of a Hadoop Cluster
✔Hadoop daemons
✔Master Daemons
✔NameNode
✔JobTracker
✔Secondary NameNode
✔Slave Daemons
✔DataNode
✔TaskTracker
✔HDFS(Hadoop Distributed File System)
✔Blocks and Splits
✔Input Splits
✔HDFS Splits
✔Data Replication
✔Hadoop Rack Awareness
✔Data high availability
✔Cluster architecture and block placement
CASE STUDIES
Programming Practices & Performance Tuning
Developing MapReduce Programs in
✔Local Mode
Running without HDFS
✔Pseudo-distributed Mode
Running all daemons in a single node
✔Fully distributed mode
Running daemons on dedicated nodes
Hadoop Administration:
Set up Hadoop clusters using the Apache, Cloudera, Hortonworks, and Greenplum distributions
✔Make a fully distributed Hadoop cluster on a single laptop/desktop
✔Install and configure Apache Hadoop on a multi node cluster in lab.
✔Install and configure Cloudera Hadoop distribution in fully distributed mode
✔Install and configure Hortonworks Hadoop distribution in fully distributed mode
✔Install and configure Greenplum distribution in fully distributed mode
✔Monitoring the cluster
✔Getting used to the management consoles of Cloudera and Hortonworks
✔NameNode in safe mode
✔Metadata Backup
✔Ganglia and Nagios – Cluster monitoring
✔CASE STUDIES
Hadoop Development:
Writing a MapReduce Program
✔Examining a Sample MapReduce Program
✔With several examples
✔Basic API Concepts
✔The Driver Code
✔The Mapper
✔The Reducer
✔Hadoop’s Streaming API
Performing several Hadoop jobs
✔The configure and close Methods
✔Sequence Files
✔Record Reader
✔Record Writer
✔Role of Reporter
✔Output Collector
✔Counters
✔Directly Accessing HDFS
✔ToolRunner
✔Using The Distributed Cache
Several MapReduce jobs (in detail)
✔MOST EFFECTIVE SEARCH USING MAPREDUCE
✔GENERATING THE RECOMMENDATIONS USING MAPREDUCE
✔PROCESSING THE LOG FILES USING MAPREDUCE
✔Identity Mapper
✔Identity Reducer
✔Exploring well known problems using MapReduce applications
Debugging MapReduce Programs
✔Testing with MRUnit
✔Logging
✔Other Debugging Strategies.
Advanced MapReduce Programming
✔ The Secondary Sort
✔Customized Input Formats and Output Formats
✔Joins in MapReduce
Monitoring and debugging on a Production Cluster
✔Counters
✔Skipping Bad Records
✔Running in local mode
Tuning for Performance in MapReduce
✔Reducing network traffic with combiner
✔Partitioners
✔Reducing the amount of input data
✔Using Compression
✔Reusing the JVM
✔Running with speculative execution
✔Other Performance Aspects
✔CASE STUDIES
CDH4 Enhancements:
✔NameNode High Availability
✔NameNode Federation
✔Fencing
✔MapReduce Version 2
HADOOP ANALYST
Hive
✔Hive concepts
✔Hive architecture
✔Install and configure Hive on a cluster
✔Different types of tables in Hive
✔Hive library functions
✔Buckets
✔Partitions
✔Joins in Hive
✔Inner joins
✔Outer Joins
✔Hive UDF
PIG
✔Pig basics
✔Install and configure Pig on a cluster
✔Pig library functions
✔Pig Vs Hive
✔Write sample Pig Latin scripts
✔Modes of running Pig
✔Running in Grunt shell
✔Running as Java program
✔Pig UDFs
✔Pig Macros
✔Debugging Pig
IMPALA
✔Differences between Impala, Hive, and Pig
✔How Impala gives good performance
✔Exclusive features of Impala
✔Impala Challenges
✔Use cases of Impala
NOSQL
✔HBase
✔HBase concepts
✔HBase architecture
✔HBase basics
✔Region server architecture
✔File storage architecture
✔Column access
✔Scans
✔HBase use cases
✔Install and configure HBase on a multi node cluster
✔Create a database; develop and run sample applications
✔Access data stored in HBase using clients like Java, Python and Perl
✔MapReduce client to access the HBase data
✔HBase and Hive Integration
✔HBase admin tasks
✔Defining a schema and basic operations
✔Cassandra Basics
✔MongoDB Basics
Other EcoSystem Components
✔Sqoop
✔Install and configure Sqoop on a cluster
✔Connecting to RDBMS
✔Installing MySQL
✔Import data from Oracle/MySQL to Hive
✔Export data to Oracle/MySQL
✔Internal mechanism of import/export
Oozie
✔Oozie architecture
✔XML file specifications
✔Install and configure Oozie and Apache
✔Specifying workflows
✔Action nodes
✔Control nodes
✔Oozie job coordinator
Flume, Chukwa, Avro, Scribe, Thrift
✔Flume and Chukwa concepts
✔Use cases of Thrift, Avro and Scribe
✔Install and configure Flume on a cluster
✔Create a sample application to capture logs from Apache using Flume
Hadoop Challenges
✔Hadoop disaster recovery
✔Use cases suited to Hadoop