Data processing at the speed of 100 Gbps using Apache Crail
Modern networking and storage technologies like RDMA or NVMe are finding their way into the data center. Patrick Stuedi offers an overview of Apache Crail (incubating), a new project that facilitates running data processing workloads (ML, SQL, etc.) on such hardware. Patrick explains what Crail does and how it benefits workloads based on TensorFlow or Spark.
| Talk Title | Data processing at the speed of 100 Gbps using Apache Crail |
|------------|-------------------------------------------------------------|
| Speakers | Patrick Stuedi (IBM Research) |
| Conference | Strata Data Conference |
| Conf Tag | Big Data Expo |
| Location | San Francisco, California |
| Date | March 26-28, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Once a staple of HPC clusters, high-performance network and storage devices are now everywhere. For a fraction of the cost, you can rent 40/100 Gbps RDMA networks and high-end NVMe flash devices supporting millions of IOPS, tens of GB/s of bandwidth, and latencies below 100 microseconds. But how do you leverage the speed of such high-throughput, low-latency I/O hardware in distributed data processing systems like Spark, Flink, or TensorFlow? Patrick Stuedi offers an overview of Apache Crail (incubating), a fast, distributed data store designed specifically for high-performance network and storage devices. Crail focuses on ephemeral data, such as shuffle data or temporary datasets in complex job pipelines, with the goal of enabling data sharing at the speed of the hardware in an accessible way. From a user perspective, Crail offers a hierarchical storage namespace implemented over distributed or disaggregated DRAM and flash. At its core, Crail supports multiple storage backends (DRAM, NVMe flash, and 3D XPoint) and networking protocols (RDMA and TCP/sockets). Patrick explores Crail’s design, use cases, and performance results on a 100 Gbps cluster.
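To make the "hierarchical storage namespace" idea concrete, the sketch below creates a file in a Crail namespace and writes a buffer to it. It is a non-runnable illustration, not a verified program: it assumes a deployed Crail cluster (namenode plus storage tiers configured via `crail-site.conf`), and the class and method names (`CrailStore`, `CrailFile`, `getDirectOutputStream`, the `create(...)` parameter list) follow Crail's published client examples but may differ across incubating releases.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Future;

import org.apache.crail.CrailConfiguration;
import org.apache.crail.CrailFile;
import org.apache.crail.CrailLocationClass;
import org.apache.crail.CrailNodeType;
import org.apache.crail.CrailOutputStream;
import org.apache.crail.CrailResult;
import org.apache.crail.CrailStorageClass;
import org.apache.crail.CrailStore;

public class CrailWriteSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the Crail namenode described in crail-site.conf.
        CrailConfiguration conf = new CrailConfiguration();
        CrailStore store = CrailStore.newInstance(conf);

        // Create a file in the hierarchical namespace. The storage and
        // location classes are the knobs that steer data onto a backend
        // tier (e.g., DRAM vs. NVMe flash) and toward particular nodes.
        // The path "/shuffle/part-0" is a hypothetical example.
        CrailFile file = store.create("/shuffle/part-0",
                CrailNodeType.DATAFILE,
                CrailStorageClass.DEFAULT,
                CrailLocationClass.DEFAULT)
            .get().asFile();

        // Crail I/O is asynchronous and future-based; some releases take
        // a CrailBuffer here rather than a raw ByteBuffer.
        CrailOutputStream out = file.getDirectOutputStream(1024);
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        Future<CrailResult> op = out.write(buf);
        op.get(); // wait for the write to complete
        out.close();
        store.close();
    }
}
```

The future-based write path is what lets frameworks such as Spark overlap computation with I/O at RDMA speeds instead of blocking on each operation.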