January 2, 2020

244 words 2 mins read

Apache Spark ML and MLlib tuning and optimization: A case study on boosting the performance of ALS by 60x


Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in JD.com's production environment.

Talk Title: Apache Spark ML and MLlib tuning and optimization: A case study on boosting the performance of ALS by 60x
Speakers: Peng Meng (Intel)
Conference: Strata + Hadoop World
Conf Tag: Make Data Work
Location: Singapore
Date: December 6-8, 2016
URL: Talk Page
Slides: Talk Slides
Video:

Apache Spark ML and MLlib are hugely popular in the big data ecosystem and have evolved from standard ML libraries into powerful components that support complex workflows and production requirements. Intel has been deeply involved in Spark from a very early stage, working with the community on feature development, bug fixing, and performance optimization. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib alternating least squares (ALS) by 60x in JD.com's production environment. The methods include rewriting recommendForAll, optimizing CartesianRDD computation, choosing between f2jBLAS and native BLAS, and tuning cluster settings and ALS parameters. This solution not only greatly reduced computation time in JD.com's and VipShop's production environments but was also merged into Apache Spark.
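As a rough illustration of where these knobs sit, the sketch below trains MLlib's RDD-based ALS and calls recommendProductsForUsers, which goes through the recommendForAll path the talk describes. This is a minimal sketch, not the optimized code from the talk: the input path, object name, and parameter values are placeholders, and whether f2jBLAS or a native BLAS (e.g., OpenBLAS or MKL via netlib-java) is used depends on what is installed on the cluster.

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.{SparkConf, SparkContext}

object AlsRecommendForAllSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ALSRecommendForAll")
    val sc = new SparkContext(conf)

    // Placeholder input path; each line is "userId,productId,rating".
    val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
      val Array(user, product, rating) = line.split(',')
      Rating(user.toInt, product.toInt, rating.toDouble)
    }

    // ALS parameters (rank, iterations, lambda, blocks) are part of the tuning
    // space the talk discusses; the values here are placeholders.
    val rank = 10
    val numIterations = 10
    val lambda = 0.01
    val numBlocks = -1 // -1 lets MLlib pick the number of blocks automatically
    val model = ALS.train(ratings, rank, numIterations, lambda, numBlocks)

    // recommendProductsForUsers uses the recommendForAll code path whose
    // rewrite the talk covers; it returns the top-N products for every user.
    val topN = model.recommendProductsForUsers(10)
    topN.take(5).foreach { case (user, recs) =>
      println(s"$user -> ${recs.mkString(", ")}")
    }

    sc.stop()
  }
}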
