December 4, 2019

Building containerized Spark on a solid foundation with Quobyte and Kubernetes

Provisioning distributed applications in a containerized environment raises multiple challenges. Daniel Bäurer and Sascha Askani share a solution for distributed storage in cloud-native environments using Spark on Kubernetes.

Talk Title: Building containerized Spark on a solid foundation with Quobyte and Kubernetes
Speakers: Daniel Bäurer (inovex GmbH), Sascha Askani (inovex GmbH)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: May 23-25, 2017
URL: Talk Page
Slides: Talk Slides

Deploying distributed applications on containers poses many challenges. One of the biggest is the lack of a stable and performant distributed filesystem. HDFS works very well with legacy Hadoop installations on commodity hardware in classic IT environments, since storing large amounts of data directly on the compute nodes is very cheap (data locality), but cloud-native environments do not let HDFS play to its strengths. Data locality on compute nodes, for example, runs contrary to the idea behind containers and cloud infrastructures. For this reason, many cloud-first implementations fall back on object stores like Amazon S3, Google Cloud Storage, or OpenStack Swift for persistence. Those solutions, however, lack many features of a real filesystem and suffer from low performance due to overhead. Daniel Bäurer and Sascha Askani share a solution that uses Spark on Kubernetes with Quobyte as an advanced, distributed, software-defined storage system to deliver elastic and stable Spark performance in a container environment.
