Thursday, April 2, 2015
01:15 PM - 04:30 PM
Please note: This session continues until 11:45 AM on Friday.
In this course you will learn about Hadoop tools for Big Data analytics: Hive, Pig, HBase, and Spark. These four technologies are included in the major Hadoop distributions. They are practical tools with which data practitioners can quickly become productive. We present them side by side, so that their features can be compared and contrasted.
This course is oriented towards data professionals, so we do not cover Java programming for Hadoop or the mathematics of advanced data analysis. Our goal is to give data professionals a practical starting point for exploring data analytics tools with Hadoop. After the course, you can power up your favorite Hadoop distribution and start on Big Data analytics on your own.
Working with Hive
- What is Hive?
- Hive architecture
- Data warehouse using Hive
Working with Pig
- What is Pig?
- Analyzing data using Pig
- Using Pig Latin to build data analysis programs
Working with Spark
- What is Spark?
- Advantages of Spark over Hadoop
- Using Spark for Big Data analytics
"Cannot say enough good things about the entire class."
"Good presentation for Hadoop newbies!"
Dr. Vladimir Bacvanski has over two decades of engineering experience with mission-critical and distributed enterprise systems and data technologies. Vladimir has helped a number of organizations, including the US Treasury, the Federal Reserve Bank, the US Navy, IBM, Dell, Hewlett Packard, JP Morgan Chase, General Electric, BAE Systems, AMD, and others, to select, transition to, and apply new software and data technologies.
Vladimir is published worldwide and is a keynote speaker, session chair, and workshop organizer at leading industry events. As a founder of SciSpike, Vladimir focuses on Big Data technologies and highly scalable reactive software architectures with Node.js and Scala. Vladimir is the author of the O'Reilly course on Big Data and NoSQL.