October 31, 2019

Uber's data science workbench

Talk Title: Uber's data science workbench
Speakers: Peng Du (Uber Inc.), Randy Wei (Uber Inc.)
Conference: Strata + Hadoop World
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 14-16, 2017

Peng Du and Randy Wei offer an overview of Uber's data science workbench. It is a central platform where data scientists perform interactive data analysis in notebooks such as Jupyter and RStudio, share and collaborate on scripts, and publish results to dashboards. The workbench is integrated with other Uber services and provides features such as task scheduling, model publishing, and job monitoring.

The workbench provides a scalable compute environment in two parts. Each request for a notebook instance spawns a dedicated Docker container. Compute engines such as Spark, Hive, and Presto run on a YARN/Mesos-managed cluster. Users can share, comment on, and collaborate on notebook scripts, subject to access control. All files, including scripts and results, are kept under version control so that users can track progress and compare results.

To improve data scientists' productivity, the workbench also integrates with several other Uber services. A mature script can be scheduled as a periodic task in Uber's job scheduling service. Results can be published through dashboard services such as Shiny, and models through Uber's machine learning platform. Finally, some complex tasks involve long-running Spark, Hive, or Presto jobs. For these, the workbench registers the jobs with Uber's monitoring service so that users can check their progress and debugging information.
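The request flow described above can be pictured with a small in-memory model. This is a hypothetical sketch, not Uber's implementation or API. Every class, method, and identifier format in it (`Workbench`, `open_notebook`, `submit_job`, `container-N`, `job-N`) is an illustrative assumption. It models only two ideas from the abstract: each notebook request gets its own dedicated container, and Spark, Hive, or Presto jobs are registered with a monitoring service so their progress can be checked. It makes no real Docker, YARN, or Mesos calls.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class NotebookSession:
    user: str
    kernel: str        # e.g. "jupyter" or "rstudio"
    container_id: str  # hypothetical ID of the dedicated container for this session


@dataclass
class JobRecord:
    job_id: str
    engine: str                               # "spark", "hive", or "presto"
    status: str = "RUNNING"
    log: list = field(default_factory=list)   # stand-in for debugging information


class Workbench:
    """Hypothetical in-memory model of the workbench request flow (not a real API)."""

    ENGINES = {"spark", "hive", "presto"}

    def __init__(self):
        self._ids = count(1)  # one shared counter for container and job IDs
        self.sessions = {}    # container_id -> NotebookSession
        self.monitor = {}     # job_id -> JobRecord; stands in for the monitoring service

    def open_notebook(self, user, kernel):
        # Each notebook request gets its own container instead of sharing one.
        session = NotebookSession(user, kernel, f"container-{next(self._ids)}")
        self.sessions[session.container_id] = session
        return session

    def submit_job(self, session, engine, query):
        if engine not in self.ENGINES:
            raise ValueError(f"unsupported engine: {engine}")
        job = JobRecord(f"job-{next(self._ids)}", engine)
        job.log.append(f"{session.user} submitted: {query}")
        # Register the job so its progress and logs can be looked up later.
        self.monitor[job.job_id] = job
        return job

    def job_status(self, job_id):
        return self.monitor[job_id].status


if __name__ == "__main__":
    wb = Workbench()
    s = wb.open_notebook("alice", "jupyter")
    j = wb.submit_job(s, "spark", "SELECT 1")
    print(s.container_id, j.job_id, wb.job_status(j.job_id))
```

Keeping the job registry separate from the notebook sessions mirrors the abstract's point about monitoring. A job's status stays queryable by its ID from the registry alone, without going through the notebook session that launched it.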

comments powered by Disqus