Big Data with Hadoop

Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.


Why Choose Jenrac?

  • Flexible instalment plans for all courses, tailored to your needs. (Click here to contact us for a free quote and consultation about study programmes.)
  • Free project experience. (Click here for more information.)
  • Highly experienced trainers (learn from tutors with years of industry and academic experience, with strong technical support during and after your training).
  • Free certification preparation material.
  • Free up-to-date course material.
  • Online/On-site/Classroom and customised one-to-one training.
  • Full flexibility regarding study timings.
  • Full training support from start to finish (CV review to industry standards, one-to-one advice and personal training).
  • Guaranteed success.
  • Job-focused approach.

Course Overview

This course introduces Big Data and the Hadoop ecosystem. It explains how data is stored in HDFS (Hadoop Distributed File System), how to create MapReduce functionality using programming languages, and the basic Hadoop commands used to process and analyse Big Data.

By the end of this training you will:
- Understand the types of tools in Big Data, and the architectural and functional view of Hadoop.
- Be able to apply the knowledge learned to progress in your career as a Big Data Developer/Consultant.

This course requires basic knowledge of Linux/Unix Bash commands or a programming language such as Java or Python. However, we also cover basic Linux commands, so people with no prior knowledge of Big Data can take this course without hurdles.

Classroom Training: Instructor-led training in our dynamic learning environment at our office in West London. The classroom is fitted with all the essential amenities to ensure a comfortable training experience, and you will have the opportunity to network with other learners, share experiences and develop social interaction.

Online: Unlike most organisations, our online training is tutor-led and similar to our classroom-based training in every aspect, making it more convenient for students in any location around the world, as well as cost-effective.

Onsite: This training is designed for corporate clients who wish to train their staff in different technologies. Clients can tailor the duration of the course to their requirements, and the training can be delivered in house, at a location of your choice, or online.

Customised one to one: A tailored course for students looking for the tutor's undivided attention at all times. The duration and contents of the course are customised to suit the student's requirements. In addition, the training timings can be arranged around the availability of both the tutor and the student.

3 Days

Course Preview

• What is Big Data?
• How it Evolved
• Four Dimensions (Four V's of big data)
• Use cases of big data
• Different Tools to process big data

• What is Hadoop?
• Components of the Hadoop ecosystem
• Why Hadoop?
• Industrial usage of the Hadoop ecosystem
• Installation and configuration of Hadoop.
• Types of Hadoop platforms.

• HDFS Introduction
• HDFS layout
• Importance of HDFS in Hadoop
• HDFS Features
• Storage aspects of HDFS
• Blocks in Hadoop
• Configuring block size
• Difference between Default and Configurable Block size
• Design Principles of Block Size
• HDFS Architecture
• HDFS Daemons and their Functionalities
• NameNode
• Secondary NameNode
• DataNode
• HDFS Use cases
• More detailed explanation of configuration files
• Metadata, FSImage, Edit Log, Secondary NameNode and Safe Mode
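The block-size topics above can be illustrated with a small sketch. The numbers below assume the Hadoop 2.x default block size of 128 MB (Hadoop 1.x used 64 MB, and the value is configurable via `dfs.blocksize`); the function itself is purely illustrative arithmetic, not part of any Hadoop API.

```python
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.x default

def hdfs_blocks(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Return the sizes of the HDFS blocks a file of the given size occupies.

    Every block except possibly the last is full; the last block holds the
    remainder and, unlike a fixed-size disk block, wastes no space on disk.
    """
    if file_size_bytes == 0:
        return []
    full, rest = divmod(file_size_bytes, block_size)
    return [block_size] * full + ([rest] if rest else [])

# A 300 MB file with a 128 MB block size occupies 3 blocks: 128 + 128 + 44 MB.
mb = 1024 * 1024
sizes = hdfs_blocks(300 * mb)
print([s // mb for s in sizes])  # [128, 128, 44]
```

This is also why HDFS favours large files: each block, full or not, costs the NameNode one metadata entry, so many small files create many blocks' worth of metadata.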

• What is Map Reduce?
• Map Reduce Use cases
• Map Reduce Functionalities
• Importance of Map Reduce in Hadoop
• Processing Daemons of Hadoop
» Job Tracker
» Task Tracker

• Input Split
» Role of Input Split in Map Reduce
» InputSplit Size Vs Block Size
» InputSplit Vs Mappers
• How to write a basic Map Reduce Program
» Driver Code
» Mapper Code
» Reducer Code
• Driver Code
- Importance of Driver Code in a Map Reduce program
- How to Identify the Driver Code in Map Reduce program
- Different sections of Driver code
• Mapper Code
- Importance of Mapper Phase in Map Reduce
- How to Write a Mapper Class?
- Methods in Mapper Class
• Reducer Code
- Importance of Reduce phase in Map Reduce
- How to Write Reducer Class?
- Methods in Reducer Class
• Input and Output Formats in Map Reduce
• Map Reduce API (Application Programming Interface)
- New API
- Deprecated API
• Combiner in Map Reduce
- Importance of combiner in Map Reduce
- How to use the combiner class in Map Reduce?
- Performance tradeoffs with respect to Combiner
• Partitioner in Map Reduce
- Importance of Partitioner class in Map Reduce
- How to use the Partitioner class in Map Reduce
- HashPartitioner functionality
- How to write a custom Partitioner
• Joins - in Map Reduce
- Map Side Join
- Reduce Side Join
- Performance Trade Off
• How to debug MapReduce Jobs in Local and Pseudo cluster Mode.
• Introduction to MapReduce Streaming
• Data localization in Map Reduce
• Secondary Sorting Using Map Reduce
• Job Scheduling
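The Driver/Mapper/Reducer structure covered above can be previewed with a word-count sketch in plain Python. This simulates the Map Reduce model only; real Hadoop jobs are written against the Java API (or via MapReduce Streaming), and the shuffle/sort step here stands in for work the framework does between the phases.

```python
from collections import defaultdict

# Mapper phase: emit a (word, 1) pair for every word in a line,
# as a WordCount Mapper class would.
def mapper(line):
    for word in line.lower().split():
        yield word, 1

# Shuffle/sort: group all values by key. In Hadoop the framework does this
# between the map and reduce phases; this is also where a Partitioner
# decides which reducer receives each key.
def shuffle(mapped):
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return sorted(groups.items())

# Reducer phase: sum the counts for each word. A Combiner would run the same
# logic on each mapper's local output first, to cut network traffic.
def reducer(key, values):
    return key, sum(values)

# Driver: wires the phases together, as the Driver code does for a real job.
def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped))

print(word_count(["big data big hadoop", "hadoop big"]))
# {'big': 3, 'data': 1, 'hadoop': 2}
```

The same three roles map directly onto the Driver, Mapper and Reducer sections of a Hadoop program, which is why they are taught as separate pieces of code.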

• Introduction to Pig
• Basic commands in Pig
• Installation
• Use cases
• Architecture and functionality

• Introduction to Hive/Hiveql
• Installation of Hive
• Difference between Hive and SQL
• Hive Architecture and Use cases
• Explanation of Data Types in Hive
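To preview the Hive/SQL comparison above: HiveQL borrows SQL syntax, but Hive compiles such statements into MapReduce jobs over files in HDFS rather than executing them in a database engine. The sketch below runs an aggregation of the kind HiveQL expresses, using SQLite purely as a stand-in; the `page_views` table and its rows are invented for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a Hive table here;
# the table name and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("home", "a"), ("home", "b"), ("about", "a")])

# This statement is also valid HiveQL: Hive would run it as a map step
# (emit each page) followed by a reduce step (count per page).
rows = conn.execute(
    "SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('about', 1), ('home', 2)]
```

The practical difference is operational, not syntactic: Hive trades the low latency of a database for the ability to query data at HDFS scale.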

• Introduction to Sqoop
• Installation
• Basic Commands in Sqoop
• Usage of Sqoop in Data Transfer
• Sqoop Functionality and Architecture
• Sqoop Export and Import Queries

• Introduction to different file systems in Big data
• Use cases
• Types of Data in Real time
• File structures and Size of Files

• Introduction and use cases for below tools
• Kafka
• Flume

Our Approach:

We give our students top priority and always ensure that every student receives the best possible training. To that end, all our training modes are interactive sessions. Students can choose whichever of the four training modes suits their requirements; different methods have been introduced for individuals as well as for corporates. Unlike most online trainings today, our online trainings are interactive sessions similar to our classroom trainings: the student connects to our live virtual classroom, where they can interact with the trainer.

We at Jenrac Technologies have a unique methodology and approach for our corporate clients. If you are a corporate client looking to train your team, you can contact us over the phone and talk to one of our expert customer service representatives, who are trained and qualified to answer all of your queries right away. You can also fill in the contact form on the side and we will arrange a meeting at your premises with one of our experts, who will explain our training programmes, structure and fees in depth.

We provide some of the best professional Big Data training in the industry. The courses are run by experts with ample industry experience in the subject matter, and are kept to professional standards with the latest industry updates. Contact our team at Jenrac Technologies for all your queries.