February 4, 2020


"CASE": A Mesos Scheduler for Distributed Machine Learning

"CASE": A Mesos Scheduler for Distributed Machine Learning

Many machine learning frameworks support distributed training of models. Distributed training is becoming increasingly important as organizations train on larger datasets with increased importance on …

Talk Title: "CASE": A Mesos Scheduler for Distributed Machine Learning
Speakers: Steven Bairos-Novak (Software Engineer, Pinterest), Karthik Anantha Padmanabhan (Software Engineer)
Conference: Open Source Summit North America
Location: Vancouver, BC, Canada
Date: Aug 27-31, 2018
URL: Talk Page
Slides: Talk Slides

Many machine learning frameworks support distributed training of models. Distributed training is becoming increasingly important as organizations train on larger datasets and place greater emphasis on reducing overall training time. Every ML framework comes with its own specification for distributed training; typically, each has its own notion of workers and a mechanism by which those workers communicate to update and share their learned parameters. The lifecycle of these workers needs to be managed differently for each ML framework and typically requires an external cluster manager to schedule workers onto machines and manage their lifecycle. In this talk, Karthik will present "CASE", a Mesos batch scheduler that supports launching and managing the lifecycle of workers across multiple ML frameworks (TensorFlow, LightGBM, XGBoost, etc.).
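To make the scheduling problem concrete, here is a minimal sketch (not taken from the talk) of the kind of per-framework worker specification a scheduler in this role has to materialize when launching tasks. TensorFlow, for example, expects each worker process to receive a TF_CONFIG environment variable describing the full cluster and the worker's own role in it; the host addresses below are hypothetical.

```python
import json
import os

# Hypothetical cluster layout: two workers plus a parameter server that
# holds the shared, learned parameters. A scheduler like CASE would fill
# these addresses in from the Mesos offers it accepts.
tf_config = {
    "cluster": {
        "worker": ["10.0.0.1:2222", "10.0.0.2:2222"],
        "ps": ["10.0.0.3:2222"],
    },
    # Identity of this particular process within the cluster.
    "task": {"type": "worker", "index": 0},
}

# TensorFlow's distributed runtime reads this environment variable at startup.
os.environ["TF_CONFIG"] = json.dumps(tf_config)
```

Other frameworks (LightGBM, XGBoost) use different mechanisms, such as machine lists passed at startup, which is why the worker lifecycle has to be managed per framework.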
