February 7, 2020

487 words 3 mins read

Your easy move to serverless computing and radically simplified data processing


Most analytic flows can benefit from serverless, starting with simple cases and moving to complex data preparation for AI frameworks like TensorFlow. To address the challenge of integrating serverless without major disruptions to your system, Gil Vernik explores the push-to-the-cloud experience, which dramatically simplifies serverless for big data processing frameworks.

Talk Title Your easy move to serverless computing and radically simplified data processing
Speakers Gil Vernik (IBM)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 24-26, 2019
URL Talk Page
Slides Talk Slides

Suppose you wrote Python code for Monte Carlo simulations to analyze financial data. The general process involves writing the code and running a simulation over a small set of data to test it. Assuming this all goes smoothly, you now must run the same code at massive scale, with parallelism, on terabytes of data, doing millions of Monte Carlo simulations. Clearly you'd prefer not to have to learn the intricacies of setting up virtual machines, suffer their long setup times, or become an expert in scaling up Python code. This is exactly where serverless computing comes to the rescue.

With serverless computing, you don't need to set up the computing environment, and you pay only for the resources your application actually consumes rather than prepurchased units of capacity. Here you'll learn how to easily gain these benefits.

Gil Vernik takes a deep dive into how serverless computing can be used for a broad range of scenarios, like high-performance computing (HPC), Monte Carlo simulations, and data preprocessing for AI. You'll focus on how to connect existing code and frameworks to serverless without the painful process of starting from scratch or learning new skills. The approach builds on the open source PyWren framework, which brings in serverless computing with minimal effort; its integration with serverless platforms adds automated scalability and lets you keep using existing frameworks. You simply write a Python function and provide an input pointing to the dataset in a storage bucket. PyWren then does the magic, automatically scaling and executing the user function as serverless actions at massive scale. Gil demonstrates how this capability allowed IBM to run a broad range of scenarios over serverless, including Monte Carlo simulations to predict future stock prices and hyperparameter optimization for ML models.
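The "write a plain function, let the framework fan it out" pattern above can be sketched as follows. This is a minimal illustration, not the talk's actual code: the geometric Brownian motion parameters are made up, and the local process pool stands in for the serverless executor (with PyWren, the same function would instead be handed to the executor's map over batches of inputs).

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor

def simulate_batch(args):
    """One Monte Carlo batch: simulate a stock price with geometric
    Brownian motion and return the mean terminal price over the batch."""
    seed, n_paths = args
    rng = random.Random(seed)
    s0, mu, sigma, steps = 100.0, 0.05, 0.2, 252  # illustrative parameters
    dt = 1.0 / steps
    total = 0.0
    for _ in range(n_paths):
        s = s0
        for _ in range(steps):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        total += s
    return total / n_paths

if __name__ == "__main__":
    batches = [(seed, 1_000) for seed in range(8)]
    # Locally we fan out with a process pool; with PyWren the same function
    # would be passed to the executor and run as parallel serverless actions.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate_batch, batches))
    print(sum(results) / len(results))  # estimate of the expected price in 1 year
```

The key point is that `simulate_batch` knows nothing about the execution environment, which is what lets the same code move from a laptop to a serverless backend unchanged.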
IBM managed to complete the entire Monte Carlo simulation for stock price prediction in about 90 seconds with 1,000 concurrent invocations, compared to 247 minutes at almost 100% CPU utilization running the same flow on a laptop with 4 CPU cores. He'll also show you how to combine TensorFlow and serverless for the data-preparation phases. Existing TensorFlow code can be easily adapted to benefit from serverless with only minimal code modifications and without users having to learn serverless architectures and deployments.
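As a back-of-envelope check, the quoted numbers imply an end-to-end speedup of roughly 165x:

```python
# Numbers quoted above: 247 minutes on a 4-core laptop vs. 90 seconds
# with 1,000 concurrent serverless invocations.
laptop_seconds = 247 * 60
serverless_seconds = 90
speedup = laptop_seconds / serverless_seconds
print(round(speedup))  # -> 165
```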
