December 5, 2019


The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system's overall results.

Talk Title: The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense
Speakers: Lee Blum (Verint Systems)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: May 22-24, 2018

Modern large-scale cyber-defense systems are essentially based on data science and big data. However, addressing every aspect of data scientists’ versatile needs is not a trivial task. Cyber evidence and network forensics quickly scale to multipetabyte repositories made up of trillions of tiny shreds of information. Moreover, in perhaps the most salient example of imbalanced data, malicious evidence accounts for less than one case in a million.

Despite these barriers, the analytics infrastructure must deliver interactive response times for user queries along with efficient batch operations. All of this must be achieved with an extremely low footprint, suitable for an on-premises solution.

Lee Blum offers an overview of Verint’s large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company’s extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results.

The system’s big data pipeline is based on Apache Spark and the Hadoop ecosystem. An important goal when creating the cyber-defense system was to enable Verint’s data scientists to feel at home when developing algorithms, which the company achieved by incorporating a wide range of use cases and implementing methods familiar to data scientists.
