Deep learning on Apache Spark at CERN's Large Hadron Collider with Analytics Zoo
Sajan Govindan outlines CERN's research on deep learning for high energy physics experiments as an alternative to customized rule-based methods, with an example of topology classification to improve real-time event selection at the Large Hadron Collider. CERN runs deep learning pipelines on Apache Spark using the BigDL and Analytics Zoo open source software on Intel Xeon-based clusters.
Talk Title | Deep learning on Apache Spark at CERN's Large Hadron Collider with Analytics Zoo |
Speakers | Sajan Govindan (Intel) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 24-26, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Sajan Govindan dives into how CERN applied end-to-end deep learning and analytics pipelines at scale on Apache Spark for high energy physics, using the BigDL and Analytics Zoo open source software running on Intel Xeon-based distributed clusters. Sajan outlines technical details and development insights with an example of topology classification to improve real-time event selection at the Large Hadron Collider (LHC).

The classifier achieved high selection efficiency while also reducing the false-positive rate compared to existing methods. It could serve as a filter in the online event selection infrastructure of the LHC experiments, enabling a more flexible and inclusive selection strategy while reducing the downstream resources wasted on processing false positives. This work is part of CERN's research into applying deep learning and analytics with open source, industry-standard technologies as an alternative to existing customized rule-based methods.

Sajan explores how CERN could quickly build and implement distributed deep learning solutions and data pipelines at scale on Apache Spark using Analytics Zoo and BigDL, open source frameworks that unify analytics and AI on Spark with easy-to-use APIs and development interfaces seamlessly integrated with big data platforms.
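To make the filtering idea concrete, the toy sketch below shows how a classifier score can act as an event filter, and how selection efficiency (true-positive rate) and false-positive rate are measured against a score threshold. This is purely illustrative: the events, scores, labels, and threshold are invented, and CERN's actual topology classifier is a deep neural network trained at scale on Spark, not this toy function.

```python
# Toy illustration of classifier-based online event selection.
# All data below is invented for illustration; it is not CERN's model or data.

def filter_events(events, score_fn, threshold):
    """Keep only events whose classifier score meets the threshold."""
    return [e for e in events if score_fn(e) >= threshold]

def rates(events, labels, score_fn, threshold):
    """Return (efficiency, false-positive rate) at a given threshold.

    Efficiency = fraction of signal events kept (true-positive rate).
    FPR        = fraction of background events wrongly kept.
    """
    tp = fp = pos = neg = 0
    for event, label in zip(events, labels):
        selected = score_fn(event) >= threshold
        if label == 1:          # signal topology
            pos += 1
            tp += selected
        else:                   # background
            neg += 1
            fp += selected
    return tp / pos, fp / neg

# Hypothetical events carrying a precomputed classifier score.
score = lambda e: e["score"]
events = [{"score": 0.9}, {"score": 0.2}, {"score": 0.7}, {"score": 0.4}]
labels = [1, 0, 1, 0]  # 1 = signal, 0 = background

efficiency, fpr = rates(events, labels, score, threshold=0.5)
print(efficiency, fpr)  # -> 1.0 0.0
```

Sweeping the threshold trades efficiency against false-positive rate; the talk's point is that a learned classifier can sit at a better operating point than rule-based cuts, so fewer false positives reach downstream processing.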