Yang Li

Apache Kylin from eBay: Extreme OLAP engine for Hadoop

Date: This event took place live on November 11, 2015

Presented by: Yang Li

Duration: Approximately 60 minutes.

Cost: Free

Questions? Please send email to

Description:

Apache Kylin is an open source distributed analytics engine contributed by eBay Inc. that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. Kylin's pre-built MOLAP cubes, distributed architecture, and high concurrency help users run multidimensional queries through Kylin's SQL interface as well as through BI tools such as Tableau and MicroStrategy. Kylin is deployed at eBay for a variety of production use cases, including web traffic analysis and geographical expansion analysis. It was open sourced on Oct 1, 2014 and has 320 stars and 125 forks. Kylin was accepted as an Apache Incubator Project on Nov 25, 2014.

Background

The challenge we face at eBay is that our data volume keeps growing while our user base becomes more diverse. Our users, for example in analytics and business units, consistently ask for minimal latency but want to keep using their favorite tools, such as Tableau and Excel. We worked closely with our internal analytics community and outlined the requirements for a successful product at eBay:

  • Sub-second query latency on billions of rows
  • ANSI SQL availability for those using SQL-compatible tools
  • Full OLAP capability to offer advanced functionality
  • Support for high cardinality and very large dimensions
  • High concurrency for thousands of users
  • Distributed and scale-out architecture for analysis in the TB to PB size range

We quickly realized that nothing available externally met our exact requirements, especially in the open-source Hadoop community. To meet our emergent business needs, we decided to build a platform from scratch. With an excellent team and several pilot customers, we have been able to bring the Kylin platform into production and to open-source it.

Feature highlights

Kylin is a platform offering the following features for big data analytics:

  • Extremely fast OLAP engine at scale: Kylin is designed to reduce query latency on Hadoop for 10+ billion rows of data.
  • ANSI SQL on Hadoop: Kylin supports most ANSI SQL query functions.
  • Interactive query capability: Users can interact with Hadoop data via Kylin at sub-second latency, faster than Hive queries on the same dataset.
  • MOLAP cube query serving on billions of rows: Users can define a data model and pre-build cubes in Kylin from more than 10 billion raw data records.
  • Seamless integration with BI tools: Kylin currently offers integration with business intelligence tools such as Tableau and with third-party applications.
  • Open-source ODBC driver: Kylin's ODBC driver is built from scratch and works well with Tableau. We have open-sourced the driver to the community as well.
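
To give a feel for why pre-built MOLAP cubes can make queries fast, here is a minimal sketch in plain Python. It is not Kylin code and does not reflect Kylin's storage or build process; the rows, the dimension names, and the `build_cube` and `query` helpers are all hypothetical, chosen only to illustrate the general idea of aggregating every combination of dimensions ahead of time so that a query becomes a lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (country, device, page_views). Illustrative data only.
ROWS = [
    ("US", "mobile", 10),
    ("US", "desktop", 5),
    ("DE", "mobile", 7),
    ("DE", "mobile", 3),
]
DIMENSIONS = ("country", "device")


def build_cube(rows, dimensions):
    """Pre-aggregate the measure for every subset of dimensions."""
    cube = defaultdict(int)
    indexes = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for subset in combinations(indexes, size):
            names = tuple(dimensions[i] for i in subset)
            for row in rows:
                values = tuple(row[i] for i in subset)
                # The last column of each row is the measure being summed.
                cube[(names, values)] += row[-1]
    return dict(cube)


def query(cube, **filters):
    """Answer SUM(page_views) under equality filters with one dictionary lookup."""
    names = tuple(d for d in DIMENSIONS if d in filters)
    values = tuple(filters[d] for d in names)
    return cube.get((names, values), 0)


CUBE = build_cube(ROWS, DIMENSIONS)
print(query(CUBE, country="DE"))
print(query(CUBE, country="US", device="mobile"))
print(query(CUBE))  # no filters: grand total
```

The trade-off in this sketch is that build time and storage grow with the number of dimension combinations, while query time no longer depends on the number of raw rows.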

In this webcast we will cover:

  • What's Kylin
  • Tech highlights
  • Performance
  • Open source
  • Q & A

About Yang Li

Yang Li is the tech lead for Apache Kylin. He joined eBay-Shanghai in January 2014 as a member of the technical staff, and has been a key developer and architect of the Kylin OLAP engine. Yang also leads the Kylin team of engineers in Shanghai, where they develop the Kylin product and deploy it for eBay as an analytics platform. Prior to eBay, Yang spent eight years with IBM and two years with Morgan Stanley. At IBM, Yang was focused on the core Java library (Apache Harmony), J2EE, and big data engineering development. He was the technical lead at IBM User Technologies and won the Outstanding Technical Achievement Award in 2008. During his time with Morgan Stanley, Yang was the vice president of the Asia Markets team responsible for global regulatory reporting architecture, engine development, and end-to-end production support infrastructure. Twitter: @ApacheKylin

