February 1, 2020

Accelerating I/O in Big Data A Data Driven Approach and Case Studies


Talk Title	Accelerating I/O in Big Data A Data Driven Approach and Case Studies
Speakers	Yingqi (Lucy) Lu (Software Development Engineer, Intel Corporation)
Conference	Open Source Summit North America
Location	Vancouver, BC, Canada
Date	Aug 27-31, 2018
URL	Talk Page
Slides	Talk Slides

The I/O infrastructure is key to the Big Data ecosystem. New networking and storage hardware technologies are continuously being developed, while the software I/O stack remains relatively slow. To ensure that applications can take full advantage of modern devices, a deep understanding of I/O subsystems and optimizations to Java libraries and Big Data frameworks are required. This presentation uses a data-driven approach to identify software I/O bottlenecks inside four Big Data frameworks: Apache Cassandra, HBase, Spark, and HDFS. To fix the bottlenecks, it introduces new Java library APIs that Intel contributes to OpenJDK. Corresponding software changes to the target Big Data frameworks are also discussed as examples of how to use the new Java APIs. Each case study closes with a performance analysis that demonstrates the throughput and latency improvements from the software optimizations.
