Course Length:
5 Days
Course Description:
This course combines the following courses into a single course:
          Hadoop for Administrators
          Introduction to Cloud Computing

The description of each course is listed below:

This course provides administrators with the fundamentals required to successfully implement and maintain Hadoop clusters. Students will get an overview of Hadoop and its capabilities and then examine best practices for deploying Hadoop clusters, determining hardware needs, and monitoring Hadoop clusters. Students will also learn how to handle failures of Hadoop components and how to add and remove those components from Hadoop clusters. In addition to exploring how to install Hadoop, students will learn to install other related technologies such as Hive, Pig, and Accumulo.

This course is an introduction to cloud computing. Cloud computing will be defined and students will be taught how to determine the suitability of in-house vs hosted solutions, including determining what the strategic, risk, and financial impact will be. Cloud technologies will be discussed and students will be guided through the steps to choose a solution, calculate costs, and develop deployment and training plans.
Who Should Attend:
This course combines several courses; the intended audience for each is listed below. The Hadoop course is for administrators interested in learning how to deploy and manage a Hadoop cluster. The cloud computing course is for students interested in learning about cloud computing.
Benefits of Attendance:
Upon completion of this course, students will be able to:
  • Set up Hadoop in a cluster and write data analytic programs
  • Present design patterns and practices of programming MapReduce
  • Grasp all the knobs and levers for running Hadoop
  • Write meaningful programs in a MapReduce framework
  • Understand basic concepts of MapReduce applications developed using Hadoop, including framework components
  • Use Hadoop for a variety of data analysis tasks
  • Define cloud computing
  • Describe the benefits of cloud computing
  • Understand the challenges of cloud computing
  • Understand how cloud components fit together
  • Determine the suitability of in-house vs hosted solutions
Prerequisites:
Students should be familiar with Java, since most code examples will be written in Java. Familiarity with basic statistical concepts will help the student with the more advanced data processing examples. Students should also have previous experience with UNIX or Linux. Students should have Network+ experience.
Course Outline:
  • Introducing Hadoop
    1. What is Hadoop
    2. Distributed Systems
    3. SQL Databases
    4. MapReduce
    5. History of Hadoop
  • Starting Hadoop
    1. Building Blocks
    2. Setting up SSH for a Hadoop cluster
    3. Running Hadoop
    4. Web-Based Cluster UI
  • Components of Hadoop
    1. The HDFS
    2. Anatomy of a MapReduce program
    3. Reading and Writing
  • Writing MapReduce Programs
    1. The patent data set
    2. Basic Template of a MapReduce Program
    3. Adapting for Hadoop's API changes
    4. Streaming in Hadoop
    5. Improving Performance with Combiners
  • Advanced MapReduce
    1. Chaining MapReduce jobs
    2. Joining data from different sources
    3. Creating a Bloom Filter
  • Programming Practices
    1. Developing MapReduce Programs
    2. Monitoring and Debugging on a production cluster
    3. Tuning for performance
  • Cookbook
    1. Passing Job-Specific parameters to your tasks
    2. Probing for Task-Specific information
    3. Partitioning into multiple Output files
    4. Inputting from and Outputting to a database
    5. Keeping Output in sorted order
  • Managing Hadoop
    1. Setting up parameter values for practical use
    2. Setting permissions
    3. Managing quotas
    4. Removing DataNodes
    5. Adding DataNodes
    6. Managing NameNodes
    7. Recovering from a failed NameNode
    8. Designing Network Layout and rack awareness
    9. Scheduling jobs from multiple users
  • Running Hadoop in the cloud
    1. Introducing Amazon Web Services
    2. Setting up AWS
    3. Setting up Hadoop on EC2
    4. Running MapReduce on EC2
    5. Managing your EC2 instances
    6. Amazon Elastic MapReduce and other AWS services
  • Programming with Pig
    1. Thinking like a Pig
    2. Installing Pig
    3. Running Pig
    4. Learning Pig Latin via Grunt
    5. Speaking Pig Latin
    6. Working with user defined functions
    7. Working with scripts
    8. Hive and the Hadoop herd
  • Hive
    1. Other Hadoop-related items
  • What is a Cloud?
  • Cloud Architecture
  • Infrastructure as a Service
  • Platform as a Service
  • Software as a Service
  • Benefits and Challenges
  • Strategic Impact
  • Risk Impact
  • Financial Impact
  • Requirements Analysis
  • Draft Architecture
  • Application Inventory
  • Service Components
  • User Profiles
  • End-to-end Design
  • Connectivity
  • Resilience
  • Security
  • Transition Management
  • Migration
  • Employee Changes
  • Service Management
  • Administration
  • Monitoring
  • Support
  • Compliance
  • Risk
  • Governance
  • Systematic Refinement
  • Future Trends
  • Reference Architecture
  • Use Cases
  • Platform Definition
  • Google App Engine, Microsoft Windows Azure, Amazon Web Services, and Other Platform Options
  • Browser Interface
  • Native Clients
  • Presentation Optimization
  • Real-Time Web
  • Authentication
  • Personalization
  • Network and Application Integration
  • Storage
  • Relational and Non-Relational Data
  • Reliability and Elasticity
  • Marketing
  • Payments
  • Development, Testing, and Staging
  • Production
  • Configuration
  • Administration and Troubleshooting
  • Refinement