November 15, 2019

Why is my Hadoop job slow?

Hadoop runs large-scale jobs across hundreds of machines. Given the complexity of these jobs, it's no wonder that slower-than-expected jobs remain a perennial source of grief for developers. Bikas Saha draws on his experience debugging and analyzing Hadoop jobs to describe approaches and tools that can help with this difficult problem.

Talk Title: Why is my Hadoop job slow?
Speakers: Bikas Saha (Hortonworks Inc)
Conference: Strata + Hadoop World
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: June 1-3, 2016

Hadoop runs large-scale jobs that are subdivided into many tasks executed across multiple machines. These tasks have complex dependencies, and at scale there can be thousands of tasks running on thousands of machines, which makes their performance difficult to make sense of. Pipelines that logically run a business workflow add another level of complexity, so it’s no wonder that slower-than-expected Hadoop jobs remain a perennial source of grief for developers. Bikas Saha draws on his experience debugging and analyzing Hadoop jobs to describe some methodical approaches and present new tracing and tooling ideas that can help semi-automate parts of this difficult problem.
