Paint the landscape and secure your data center with Apache Spot
Cesar Berho and Alan Ross offer an overview of the open source project Apache Spot (incubating), which delivers a next-generation cybersecurity analytics architecture that uses unsupervised machine learning at cloud scale for anomaly detection.
Talk Title | Paint the landscape and secure your data center with Apache Spot |
Speakers | Cesar Berho Gallardo (Intel), Alan Ross (Intel) |
Conference | Strata + Hadoop World |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 14-16, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Over the last few years, the traditional data center has been in a state of constant evolution. The arrival of cloud services and XaaS has introduced a new computing paradigm and changed what visibility and control look like in this space, as the cloud becomes an extension of the business network. In this new world, security is of the utmost importance. Existing threat tools can help, but analyzing data at such a large scale to get actionable insights is very expensive. Cybersecurity demands scale, and big data analytics and machine learning are currently the top choices for achieving it. A community-based approach to information security is needed.

Cesar Berho and Alan Ross offer an overview of the open source project Apache Spot (incubating), which delivers a next-generation cybersecurity analytics architecture that uses unsupervised machine learning at cloud scale for anomaly detection. Apache Spot is a good place for interested individuals to contribute to and help define an open data model: a standard format for enriched event data that makes it easier to integrate cross-application data, gain complete enterprise visibility, and develop net new analytic functionality. Open data models help organizations quickly share new analytics with one another as new threats are discovered. With Hadoop, organizations can run these analytics against comprehensive historic datasets to identify past threats that slipped through the cracks, which gives security professionals the ability to collaborate as cybercriminals do.

Apache Spot’s approach involves several key processes that facilitate the collection, storage, processing, and presentation of telemetry sources. Current contributions are oriented to network use cases such as network flows (nfcapd), DNS (PCAP), and proxies. Apache Spot’s solutions are founded on a parallel ingest framework using Kafka; open source decoders that load data into Hadoop with Spark Streaming; machine learning that uses unsupervised learning to filter billions of events down to a few thousand, finding the outliers that may be the needle in the haystack; and operational analytics.

Community contribution is open and has huge potential for creating enhanced and additional algorithms that can pick up broader event data types, on the endpoint or based on identity; enhance correlation for incident response; enable predictive research that observes potential near-term threats at large scale; support root cause analysis, which is especially useful in forensics and threat remediation; and widen the scope of analysis beyond the traditional network architecture to cover SDN, security controllers, and microservices, shedding light on what is a black box today.
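To make the idea of filtering many events down to a few outliers without labels more concrete, here is a minimal Python sketch. It is not Apache Spot's algorithm: it swaps in a simple frequency-based rarity score (the negative log of how often a destination and port pair appears) as a stand-in for the project's unsupervised models. The function names, record fields, and addresses are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def rarity_scores(events, key):
    """Pair each event with -log(empirical probability of its key).

    Rare keys get high scores. This is an illustrative stand-in,
    not the scoring used by Apache Spot.
    """
    counts = Counter(key(e) for e in events)
    total = len(events)
    return [(-log(counts[key(e)] / total), e) for e in events]


def top_outliers(events, key, n):
    """Return the n events with the highest rarity score."""
    scored = rarity_scores(events, key)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [event for _, event in scored[:n]]


# Hypothetical flow records: two common patterns and one rare one.
flows = (
    [{"src": "10.0.0.5", "dst": "10.0.0.9", "dport": 443}] * 50
    + [{"src": "10.0.0.6", "dst": "10.0.0.9", "dport": 53}] * 30
    + [{"src": "10.0.0.7", "dst": "203.0.113.4", "dport": 6667}]
)


def flow_key(event):
    """Group flows by destination address and destination port."""
    return (event["dst"], event["dport"])


# Keep only the single most unusual flow out of 81.
suspects = top_outliers(flows, flow_key, 1)
print(suspects)
```

With no labeled attacks, the sketch still surfaces the one flow whose destination and port are unlike the rest. A production system would score far richer features across billions of events, but the shape is the same: score everything, then keep only the most unusual tail for analysts to review.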