December 27, 2019


Nezha: A Kubernetes Native Big Data Accelerator For Machine Learning



Talk Title: Nezha: A Kubernetes Native Big Data Accelerator For Machine Learning
Speakers: Huamin Chen (Principal Software Engineer, Red Hat), Yuan Zhou (Senior Software Development Engineer, Intel)
Conference: KubeCon + CloudNativeCon North America
Location: Seattle, WA, USA
Date: Dec 9-14, 2018
URL: Talk Page
Slides: Talk Slides

Large training datasets used by machine learning frameworks, such as Kubeflow, are usually stored in low-cost, high-capacity object stores like S3 or Google Cloud Storage. However, S3's rate limiting and slow data downloading significantly hurt training performance and limit compute scalability. We introduce Nezha and explain how it improves Kubeflow's training. Nezha is an open source, community-driven, and highly collaborative project, contributed to by storage and big data engineers. Nezha is based on the Kubernetes Initializer mechanism: it rewrites the Pod spec, adds a sidecar S3 cache, and redirects the Pod to use the local cache for acceleration. Nezha is self-contained and easy to use: it requires no modification to existing applications and no user-visible Pod changes. Nezha improves big data application performance; our initial Kubeflow benchmark results using the MNIST dataset show Nezha achieving a ~2x speedup.
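To illustrate the kind of Pod spec rewrite described above, here is a minimal before/after sketch. The container names, image tags, port, and cache endpoint are illustrative assumptions for this example, not taken from the Nezha project itself.

```yaml
# Before: the training container reads directly from S3.
apiVersion: v1
kind: Pod
metadata:
  name: mnist-train            # hypothetical Pod name
spec:
  containers:
  - name: trainer
    image: example/tf-mnist:latest   # illustrative image
    env:
    - name: S3_ENDPOINT
      value: "https://s3.amazonaws.com"
---
# After the initializer rewrite (sketch): a sidecar S3 cache is injected
# into the Pod, and the trainer is redirected to the local cache endpoint.
apiVersion: v1
kind: Pod
metadata:
  name: mnist-train
spec:
  containers:
  - name: trainer
    image: example/tf-mnist:latest
    env:
    - name: S3_ENDPOINT
      value: "http://localhost:8080"   # now points at the sidecar cache
  - name: s3-cache                     # injected sidecar (hypothetical name)
    image: example/s3-cache:latest     # illustrative cache image
    ports:
    - containerPort: 8080
```

Because both containers share the Pod's network namespace, the trainer can reach the sidecar over localhost, which is how the redirection works without modifying the application itself.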
