A novel solution for a data augmentation and bias problem in NLP using TensorFlow
February 24, 2020
Join KC Tung to discover a way to use TensorFlow to solve a natural language processing (NLP) model bias problem with data augmentation for an enterprise customer (one of the largest airlines in the world). KC used lesser-known features of tf.data and the new API to find a novel use for text generation, and found that it improved his NLP model more than he expected.
Anomaly detection using deep learning to measure the quality of large datasets
February 22, 2020
Any business, big or small, depends on analytics, whether the goal is revenue generation, churn reduction, or sales and marketing. No matter the algorithm and techniques used, the result depends on the accuracy and consistency of the data being processed. Sridhar Alla examines some techniques for evaluating the quality of data and for detecting anomalies in it.
Scalable anomaly detection with Spark and SOS
February 10, 2020
Jeroen Janssens dives into stochastic outlier selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and, most recently, Spark. He illustrates the idea and intuition behind SOS, demonstrates the implementation of SOS on top of Spark, and applies SOS to a real-world use case.
Data science + design thinking: A perfect blend to achieve the best user experience
February 6, 2020
Design thinking is a methodology for creative problem-solving developed at the Stanford d.school. The methodology is used by world-class design firms like IDEO and many of the world's leading brands like Apple, Google, Samsung, and GE. Michael Radwin prepares a recipe for how to apply design thinking to the development of AI/ML products.
Analytics Zoo: Distributed TensorFlow and Keras on Apache Spark
December 27, 2019
Jason Dai, Yuhao Yang, Jennie Wang, and Guoqiong Song explain how to build and productionize deep learning applications for big data with Analytics Zoo (a unified analytics and AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline), using real-world use cases from JD.com, MLSListings, the World Bank, Baosight, and Midea/KUKA.
Masquerading malicious DNS traffic
December 22, 2019
Malicious DNS traffic patterns are inconsistent and typically thwart anomaly detection. David Rodriguez explains how Cisco uses Apache Spark and Stripe's Bayesian inference software, Rainier, to fit the underlying time series distribution for millions of domains and outlines techniques to identify artificial traffic volumes related to spam, malvertising, and botnets (masquerading traffic).