Building containerized Spark on a solid foundation with Quobyte and Kubernetes
Multiple challenges arise when distributed applications are provisioned in a containerized environment. Daniel Bäurer and Sascha Askani share a solution for distributed storage in cloud-native environments using Spark on Kubernetes.
Talk Title | Building containerized Spark on a solid foundation with Quobyte and Kubernetes |
Speakers | Daniel Bäurer (inovex GmbH), Sascha Askani (inovex GmbH) |
Conference | Strata Data Conference |
Conf Tag | Making Data Work |
Location | London, United Kingdom |
Date | May 23-25, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Deploying distributed applications in containers poses many challenges. One of the biggest is the lack of a stable, performant distributed filesystem. HDFS works very well in legacy Hadoop installations on commodity hardware in classic IT environments, where storing large amounts of data directly on the compute nodes is cheap and provides data locality. Cloud-native environments, however, do not let HDFS play to these strengths: data locality on compute nodes, for example, runs contrary to the idea behind containers and cloud infrastructure. For this reason, many cloud-first implementations fall back on object stores such as Amazon S3, Google Cloud Storage, or OpenStack Swift for persistence. These solutions, however, lack many features of a real filesystem and suffer from low performance due to overhead. Daniel Bäurer and Sascha Askani share a solution that runs Spark on Kubernetes with Quobyte, an advanced, distributed, software-defined storage system, to deliver elastic and stable Spark performance in a container environment.