December 16, 2019

Beyond Hadoop at Yahoo: Interactive analytics with Druid

Himanshu Gupta explains why Yahoo has been increasingly investing in interactive analytics and how it leverages Druid to power a variety of internal- and external-facing data applications.


Talk Title	Beyond Hadoop at Yahoo: Interactive analytics with Druid
Speakers	Himanshu Gupta
Conference	Strata + Hadoop World
Conf Tag	Make Data Work
Location	New York, New York
Date	September 27-29, 2016

Yahoo initially built Hadoop as an answer to a very acute pain around efficiently storing and processing large volumes of data. Since Yahoo open sourced Hadoop, it has become widely adopted in the technology world. However, time has taught us that when a system becomes extremely popular for solving one class of problems, its limitations in solving other problems become more apparent. Himanshu Gupta explains why Yahoo has been increasingly investing in interactive analytics and how it leverages Druid to power a variety of internal- and external-facing data applications.

Millions of users around the globe interact with Yahoo through their web browsers and mobile devices, and these interactions generate billions of events every day. As Yahoo’s data volumes have grown, it’s faced increasing demand to make the data more accessible, both to internal users and to its customers. Not all of Yahoo’s end users are backend analysts, and many have no prior experience with traditional analytic tools, so Yahoo wanted to build simple, interactive data applications that anyone could use to derive insights from data. To support these use cases, Yahoo elected to invest in the Druid open source project.

Today, Yahoo has multiple Druid clusters to support analytics for a variety of use cases, such as application performance, user activity, ads metrics, and many more. Each demands that Yahoo’s data applications update in real time and handle interactive ad hoc querying at a very high scale. Himanshu explores Yahoo’s use cases with Druid, shares the lessons learned from scaling Druid deployment, monitoring clusters, and ingesting data, and offers strategies for accelerating queries by leveraging approximate sketch-based algorithms.

Stream analytics in the enterprise: A look at Intel's internal IoT implementation

December 10, 2019

Moty Fania shares Intel's IT experience implementing an on-premises IoT platform for internal use cases. The platform was designed as a multitenant platform with built-in analytical capabilities and based on open source big data technologies and containers. Moty highlights the lessons learned from this journey with a thorough review of the platform's architecture.

Stream analytics in the enterprise: A look at Intel's internal IoT implementation

November 17, 2019

Moty Fania shares Intel's IT experience implementing an on-premises IoT platform for internal use cases. The platform was based on open source big data technologies and containers and was designed as a multitenant platform with built-in analytical capabilities. Moty highlights the key lessons learned from this journey and offers a thorough review of the platform's architecture.

Architecting for change: LinkedIn's new data ecosystem

December 16, 2019

Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes, such as client-side activity tracking, a unified reporting platform, and data virtualization techniques to simplify migration, that enable LinkedIn to roll out future product innovations with minimal downstream impact.

How a Spark-based feature store can accelerate big data adoption in financial services

December 12, 2019

Kaushik Deka and Phil Jarymiszyn discuss the benefits of a Spark-based feature store, a library of reusable features that allows data scientists to solve business problems across the enterprise. Kaushik and Phil outline three challenges they faced (semantic data integration within a data lake, high-performance feature engineering, and metadata governance) and explain how they overcame them.

Apache Eagle: Secure Hadoop in real time

November 21, 2019

Apache Eagle is an open source monitoring solution to instantly identify access to sensitive data, recognize malicious activities, and take action. Arun Karthick Manoharan, Edward Zhang, and Chaitali Gupta explain how Eagle helps secure a Hadoop cluster using policy-based and machine-learning user-profile-based detection and alerting.

IoT in the enterprise: A look at Intel (IoT) Inside

October 23, 2019

Moty Fania shares Intel's IT experience implementing an on-premises big data IoT platform for internal use cases. This unique platform was built on top of several open source technologies and enables highly scalable stream analytics with a stack of algorithms such as multisensor change detection, anomaly detection, and more.