Big Data Analytics using the Hadoop Ecosystem

Start Date: 20th June 2016
Course Code: SS16-52
Full Fee:
Duration: 2 Days
Network Member Subsidised Fee: €860.00

Programme Overview
Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers.

It is a framework designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.
Course Objectives and Learning Outcomes
• Understand the concepts of Big Data processing
• Explore the components of the Apache Hadoop framework
• Learn the APIs for building services
• Produce and consume data in single and multi-node clusters
• Understand best practices and performance issues
Who should attend
This course is designed for staff who want to begin exploiting data analytic techniques and tools.
Course Content
Introduction to Big Data
What is big data? Big data analytics. Batch vs. real-time analytics. Using Hadoop for batch analytics.

Introducing Hadoop
Overview of the Hadoop infrastructure: Avro, Pig, Sqoop, ZooKeeper, etc.; case studies of Hadoop in use.

Functional Programming Primer
Concepts of functional programming: lambdas, modern programming language support, immutability and statelessness; introduction of the case study used for the remainder of the course.
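The functional concepts listed above can be sketched in a few lines of Python (the word data here is purely illustrative):

```python
from functools import reduce

# Pure, stateless transformations over an immutable tuple -- no shared mutable state.
words = ("big", "data", "analytics")

lengths = list(map(lambda w: len(w), words))            # map: apply a lambda to each element
long_words = list(filter(lambda w: len(w) > 3, words))  # filter: keep matching elements
total = reduce(lambda acc, w: acc + len(w), words, 0)   # reduce: fold the sequence to one value
```

Because each step is a pure function of its input, the same style translates directly to distributed processing, where work is split across machines.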

Introducing HDFS
HDFS concepts and architecture, HDFS daemons, file read/write operations.
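The file read/write operations covered here are typically driven through the `hdfs dfs` command; a minimal sketch (all paths are hypothetical):

```shell
hdfs dfs -mkdir -p /data/views        # create a directory in HDFS
hdfs dfs -put views.csv /data/views   # write: copy a local file into HDFS
hdfs dfs -ls /data/views              # list directory contents
hdfs dfs -cat /data/views/views.csv   # read a file back to stdout
```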

Introducing MapReduce
MapReduce and its relationship to functional programming theory; Java/Python map, reduce and combiner methods; building a series of components to assist in the analysis.
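The map / shuffle / reduce phases of a Hadoop job can be sketched in plain Python; this word-count example (with hypothetical input lines) mirrors what a Mapper and Reducer do:

```python
from collections import defaultdict

def map_phase(line):
    # Like a Hadoop Mapper: emit (key, value) pairs for each input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Like the framework's shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Like a Hadoop Reducer: combine all values for one key.
    return key, sum(values)

lines = ["Big data big analytics", "data data"]
pairs = [p for line in lines for p in map_phase(line)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

In Hadoop the same three roles run distributed across the cluster, with the shuffle handled by the framework between the map and reduce tasks.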

Job Scheduling
Scheduling linearly or as a directed acyclic graph (DAG). Scheduling sequential jobs using the MapReduce library.
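Scheduling jobs as a DAG boils down to running each job only after its dependencies complete; a minimal sketch in plain Python using the standard library's topological sorter (the job names and dependencies are hypothetical):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each job maps to the set of jobs it depends on.
deps = {
    "clean":   set(),
    "extract": {"clean"},
    "join":    {"extract"},
    "report":  {"join"},
}

# A valid execution order: every job appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())
```

A purely linear schedule is just the special case where the DAG is a single chain, as above.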

Introducing Oozie
Supporting complex and interdependent workflows with Oozie, Oozie workflow overview, defining workflows in XML.
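An Oozie workflow of the kind described here is defined in XML; a minimal sketch against the 0.5 workflow schema (the action name, paths and properties are hypothetical):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="daily-etl">
  <start to="count-views"/>
  <action name="count-views">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/data/views</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/data/counts</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Each action declares an `ok` and an `error` transition, which is how Oozie expresses interdependent workflows with explicit failure handling.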

Introducing Sqoop
Transferring data between Hadoop and relational databases, Sqoop tools and command aliases.
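The transfer in each direction is driven by a Sqoop tool; a minimal sketch (the connection string, credentials and table names are hypothetical):

```shell
# Import a relational table into HDFS with four parallel map tasks:
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username analyst -P \
  --table orders \
  --target-dir /data/orders \
  --num-mappers 4

# Export results from HDFS back into a relational table:
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username analyst -P \
  --table order_counts \
  --export-dir /data/counts
```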

Introducing Hive
The Hive data warehousing infrastructure, Hive data units, the Hive type system, built-in operators and functions.
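Hive exposes the data units and built-in functions through a SQL-like language; a minimal sketch (the `page_views` table and its columns are hypothetical):

```sql
-- Define a table over comma-delimited text files in HDFS.
CREATE TABLE page_views (
  user_id   STRING,
  url       STRING,
  view_time TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Built-in operators and aggregate functions work much like SQL;
-- Hive compiles this query into MapReduce jobs behind the scenes.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```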

Introducing Pig
The Pig architecture, execution modes, the Grunt shell.
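A typical interaction with the Grunt shell is a short Pig Latin script; a minimal sketch (the input file `views.csv` and its field names are hypothetical):

```pig
-- Load comma-delimited records and count views per URL.
views  = LOAD 'views.csv' USING PigStorage(',')
         AS (user_id:chararray, url:chararray);
by_url = GROUP views BY url;
counts = FOREACH by_url GENERATE group AS url, COUNT(views) AS n;
DUMP counts;
```

In local execution mode this runs against the local filesystem; in MapReduce mode the same script is compiled into jobs that run on the cluster.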

About the Trainer

Neueda has been delivering training solutions since 2002 and our experienced and passionate specialists love to share and teach in a hands-on, collaborative environment. Our instructors have the expertise and experience building large-scale software solutions using the technologies they write and speak about.

Deirdre Geary is an experienced Oracle, Microsoft SQL and .NET technical trainer who delivers customised training courses focusing on Oracle Application Express and the Microsoft platforms, covering new features at introductory, intermediate and administration levels. She also covers all levels of SQL and PL/SQL.

She is accredited by both Oracle and Microsoft, covering all of their SQL, PL/SQL and developer tuning courses. Deirdre also teaches Oracle's Big Data and NoSQL courses and seminars and specialises in Data Visualisation.