November 24, 2019

BoFs: Data-Aware Scheduling in Kubernetes [I]


Talk Title	BoFs: Data-Aware Scheduling in Kubernetes [I]
Speakers	Felix Hupfeld (Founder, Quobyte), Johannes M. Scheuermann (Cloud Platform Engineer, inovex)
Conference	CloudNativeCon + KubeCon Europe
Location	Berlin Congress Center
Date	Mar 28-30, 2017
URL	Talk Page
Slides	Talk Slides

To deliver results promptly and handle data-intensive workloads efficiently, Big Data applications execute their jobs on compute slots across large clusters. For optimal performance, these applications should also run as close as possible to the data they use. Data-aware scheduling achieves that optimization and can conveniently be set up using Kubernetes. We’ll present two different use cases: First, we’ll show how Big Data applications like Hadoop and Spark can use their native HDFS protocol for data-aware scheduling. Second, we’ll demonstrate an efficient way to write a data-aware scheduler for Kubernetes that not only satisfies your application’s requirements but also keeps your admins happy. As a bonus, it also allows us to run data-aware scheduling on applications other than Big Data.

Big data for big data: Machine-learning models of Hadoop cluster behavior

November 9, 2019

Sean Suchter and Shekhar Gupta describe the use of very fine-grained performance data from many Hadoop clusters to build a model predicting excessive swapping events.

Compressed linear algebra in Apache SystemML

November 7, 2019

Many iterative machine-learning algorithms can only operate efficiently when a large matrix of training data fits in the main memory. Frederick Reiss and Arvind Surve offer an overview of compressed linear algebra, a technique for compressing training data and performing key operations in the compressed domain that lets you build models over big data with small machines.

Paint the landscape and secure your data center with Apache Spot

November 4, 2019

Cesar Berho and Alan Ross offer an overview of open source project Apache Spot (incubating), which delivers next-generation cybersecurity analytics architecture through unsupervised learning using machine-learning techniques at cloud scale for anomaly detection.

Real-time analytics using Kudu at petabyte scale

November 3, 2019

Sridhar Alla and Shekhar Agrawal explain how Comcast built the largest Kudu cluster in the world (scaling to PBs of storage) and explore the new kinds of analytics being performed there, including real-time processing of 1 trillion events and joining multiple reference datasets on demand.

Using R for scalable data analytics: From single machines to Hadoop Spark clusters

October 31, 2019

Join in to learn how to do scalable, end-to-end data science in R on single machines as well as on Spark clusters. You'll be assigned an individual Spark cluster with all contents preloaded and software installed and use it to gain experience building, operationalizing, and consuming machine-learning models using distributed functions in R.

Building Distributed TensorFlow Using Both GPU and CPU on Kubernetes [I]

November 24, 2019

Big Data and Machine Learning have become extremely hot topics in recent years. Google has announced its AI-centric strategy and released the deep learning toolkit TensorFlow. TensorFlow soon became t …