Intro on Drill: Self-Service Data Exploration and Nested Data Analytics on Hadoop

SQL is one of the most widely used languages to access, analyze, and manipulate structured data. As Hadoop gains traction within enterprise data architectures across industries, the need for SQL for both structured and loosely-structured data on Hadoop is growing rapidly Apache Drill started off with the audacious goal of delivering consistent, millisecond ANSI SQL query capability across wide range of data formats.

At a high level, this translates to two key requirements – Schema Flexibility and Performance. Apache Drill provides the users the ability to interact with big data on Hadoop much faster and far more easily using the familiar SQL language. Users are no longer dependent on central IT teams and DBAs to produce schemas and then maintain them when the structure changes for a few records. Drill alleviates the pain associated with structuring unstructured data before one gains any insights by providing a simple mechanism to query any dataset on Hadoop - be it flat files, parquet or JSON files or tables within an HBase table.

This session will give you an overview of several different use cases that enterprises are testing Drill for.

Keys Botzum is Senior Principal Technologist with MapR Technologies, where he wears many hats. His primary responsibility is interacting with customers in the field, but he also teaches classes, contributes to documentation, and works with engineering teams. He has over 15 years of experience in large scale distributed system design. Previously, he was a Senior Technical Staff Member with IBM, and a respected author of many articles on the WebSphere Application Server as well as a book. When not wearing one of his MapR hats, Keys enjoys time with friends and family, and getting outside to play tennis and hike. He holds a Masters degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.