How Spark can fail or be confusing and what you can do about it
Just like any six-year-old, Apache Spark does not always do its job and can be hard to understand. Yin Huai looks at the top causes of job failures customers encountered in production and examines ways to mitigate such problems by modifying Spark. He also shares a methodology for improving resilience: a combination of monitoring and debugging techniques for users.
Talk Title | How Spark can fail or be confusing and what you can do about it |
Speakers | Yin Huai (Databricks) |
Conference | Strata + Hadoop World |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 14-16, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Apache Spark has become one of the most popular open source projects in big data. But like any six-year-old, Spark does not always do its job correctly and can be hard to understand. Yin Huai looks at the top causes of job failures customers encountered in production, which include resource exhaustion and hitting internal limits within Spark. Yin shares examples of common failures to highlight recent improvements and possible future work. He also shares a methodology for improving resilience: a combination of monitoring and debugging techniques for users.
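Resource exhaustion, one of the failure causes the talk highlights, is often addressed through Spark's memory-related configuration. As a minimal, hedged sketch (the specific values below are illustrative assumptions, not recommendations from the talk), a `spark-defaults.conf` fragment tuning these limits might look like:

```
# Illustrative spark-defaults.conf fragment -- values are hypothetical examples
spark.driver.memory          4g    # heap available to the driver JVM
spark.executor.memory        8g    # heap available to each executor JVM
spark.memory.fraction        0.6   # share of heap reserved for execution and storage
spark.driver.maxResultSize   2g    # caps results returned to the driver (e.g., via collect())
```

Settings like `spark.driver.maxResultSize` exist precisely because collecting an oversized result to the driver is a common way jobs hit Spark's internal limits and fail.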