Adapt to Unified and Pluggable Cluster Management Platform at LinkedIn
Talk Title | Adapt to Unified and Pluggable Cluster Management Platform at LinkedIn |
Speakers | Abin Shahab (Staff Software Engineer, LinkedIn), Tengfei Mu (Engineering Manager, LinkedIn) |
Conference | KubeCon + CloudNativeCon |
Conf Tag | |
Location | Shanghai, China |
Date | Jun 23-26, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
RAIN is a cluster resource management system developed at LinkedIn. It manages resources for tens of thousands of hosts per cluster across multiple datacenters, including Azure, and schedules both long-running and batch jobs. It is integrated with the existing LinkedIn cluster management ecosystem. The goal for our next-generation cluster management system is to support heterogeneous compute workloads quickly, improving developer productivity and server utilization. After evaluation, we decided to adopt the declarative API and extensible architecture of Kubernetes (K8s). Adoption poses quite a few challenges in integrating with the existing ecosystem at LinkedIn scale. We first give an overview of the LinkedIn cluster management ecosystem. We then describe our evaluation process and the adoption challenges, and share the lessons we learned during the production and integration process.