December 15, 2019

Breaking Spark: The top five mistakes to avoid when using Apache Spark in production

Drawing on his experiences across 150+ production deployments, Neelesh Srinivas Salian focuses on five common issues observed in a cluster environment setup with Apache Spark (Core, Streaming, and SQL) to help you improve the usability and supportability of Apache Spark and avoid such issues in future deployments.


Talk Title	Breaking Spark: The top five mistakes to avoid when using Apache Spark in production
Speakers	Neelesh Srinivas Salian
Conference	Strata + Hadoop World
Conf Tag	Make Data Work
Location	New York, New York
Date	September 27-29, 2016
URL	Talk Page
Slides	Talk Slides
Video

Apache Spark has been growing in deployments for the past two years. The increasing amount of data being analyzed and processed through the framework is massive, and it continues to push the boundaries of the engine. Drawing on his experiences across 150+ production deployments, Neelesh Srinivas Salian focuses on five common issues observed in a cluster environment setup with Apache Spark (Core, Streaming, and SQL) to help you improve the usability and supportability of Apache Spark and avoid such issues in future deployments.

Choice Hotels's journey to better understand its customers through self-service analytics

December 14, 2019

Narasimhan Sampath and Avinash Ramineni share how Choice Hotels International used Spark Streaming, Kafka, Spark, and Spark SQL to create an advanced analytics platform that enables business users to be self-reliant by accessing the data they need from a variety of sources to generate customer insights and property dashboards and enable data-driven decisions with minimal IT engagement.

Hadoop application architectures: Architecting a next-generation data platform for real-time ETL, data analytics, and data warehousing

December 12, 2019

Jonathan Seidman, Gwen Shapira, Mark Grover, and Ted Malaska demonstrate how to architect a modern, real-time big data platform and explain how to leverage components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics such as real-time ETL, change data capture, and machine learning.

Stream analytics in the enterprise: A look at Intel's internal IoT implementation

December 10, 2019

Moty Fania shares Intel's IT experience implementing an on-premises IoT platform for internal use cases. The platform was designed as a multitenant platform with built-in analytical capabilities and based on open source big data technologies and containers. Moty highlights the lessons learned from this journey with a thorough review of the platform's architecture.

Breaking Spark: Top five mistakes to avoid when using Apache Spark in production

November 20, 2019

Spark has been growing in deployments for the past year. The increasing amount of data being analyzed and processed through the framework is massive and continues to push the boundaries of the engine. Drawing on his experiences across 150+ production deployments, Neelesh Srinivas Salian explores common issues observed in a cluster environment setup with Apache Spark.

Stream analytics in the enterprise: A look at Intel's internal IoT implementation

November 17, 2019

Moty Fania shares Intel's IT experience implementing an on-premises IoT platform for internal use cases. The platform was based on open source big data technologies and containers and was designed as a multitenant platform with built-in analytical capabilities. Moty highlights the key lessons learned from this journey and offers a thorough review of the platform's architecture.

Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production

October 27, 2019

Spark has been growing in deployments for the past year. Neelesh Srinivas Salian explores common issues observed in a cluster environment setup with Apache Spark and offers guidelines to help set up a real-world environment when planning an Apache Spark deployment in a cluster. Attendees can use these observations to improve the usability and supportability of Apache Spark in their projects.