Why is my Hadoop job slow?
Hadoop is used to run large-scale jobs over hundreds of machines. Given that complexity, it's no wonder that slower-than-expected jobs remain a perennial source of grief for developers. Bikas Saha draws on his experience debugging and analyzing Hadoop jobs to describe approaches and tools that can help solve this difficult problem.
| Talk Title | Why is my Hadoop job slow? |
| --- | --- |
| Speakers | Bikas Saha (Hortonworks Inc) |
| Conference | Strata + Hadoop World |
| Conf Tag | Making Data Work |
| Location | London, United Kingdom |
| Date | June 1-3, 2016 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Hadoop is used to run large-scale jobs that are subdivided into many tasks executed across multiple machines. There are complex dependencies between these tasks, and at scale there can be thousands of tasks running over thousands of machines, which makes it difficult to reason about their performance. Add pipelines that chain jobs into a logical business workflow as another layer of complexity, and it's no wonder that slower-than-expected jobs remain a perennial source of grief for developers. Bikas Saha draws on his experience debugging and analyzing Hadoop jobs to describe some methodical approaches and present new tracing and tooling ideas that can help semi-automate parts of this difficult problem.
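One concrete way to start making sense of per-task performance, in the spirit the abstract describes, is to pull task-level timings from the MapReduce JobHistory Server REST API and look for stragglers. Below is a minimal sketch (not from the talk); the history-server address and job ID are hypothetical placeholders to adjust for your cluster.

```python
#!/usr/bin/env python3
"""Sketch: rank the slowest tasks of a finished MapReduce job using the
JobHistory Server REST API (GET /ws/v1/history/mapreduce/jobs/{jobid}/tasks)."""
import json
import urllib.request

HISTORY_SERVER = "http://historyserver.example.com:19888"  # assumed address; 19888 is the default port
JOB_ID = "job_1464777600000_0042"                          # hypothetical job ID

url = f"{HISTORY_SERVER}/ws/v1/history/mapreduce/jobs/{JOB_ID}/tasks"
with urllib.request.urlopen(url) as resp:
    tasks = json.load(resp)["tasks"]["task"]

# Sort by elapsed time (milliseconds) so stragglers and data skew surface first.
tasks.sort(key=lambda t: t["elapsedTime"], reverse=True)
for t in tasks[:10]:
    print(f'{t["id"]}  type={t["type"]}  '
          f'elapsed={t["elapsedTime"] / 1000:.1f}s  state={t["state"]}')
```

If the slowest tasks take far longer than the median, the problem is usually skewed input splits or a hot node rather than the job as a whole, which narrows the search considerably.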