High-performance enterprise data processing with Spark
Vickye Jain and Raghav Sharma explain how they built a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance.
Talk Title | High-performance enterprise data processing with Spark |
Speakers | Vickye Jain (ZS Associates), Raghav Sharma (ZS Associates) |
Conference | Strata + Hadoop World |
Conf Tag | Make Data Work |
Location | Singapore |
Date | December 6-8, 2016 |
URL | Talk Page |
Slides | Talk Slides |
Enterprises are increasingly comfortable moving traditional workloads to Spark. Despite its popularity, however, Spark remains an esoteric technology within enterprises, and many organizations for which technology is not a core competence are wary of building internally managed applications on Spark, in part because of the lack of a steady talent pool and a fear of budget overruns. As a result, enterprises still struggle to balance support for advanced technology platforms against matrix organizations, complex funding channels, and business demands.

Vickye Jain and Raghav Sharma explain how they built a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance, and they discuss the conflicting objectives they had to negotiate along the way. Vickye and Raghav also offer an overview of the architecture itself, which consists of several elastic clusters, external orchestrators providing full visibility into jobs, a combination of job servers and traditional Spark applications, and deep integration of technical experts with domain experts for rapid development.