November 25, 2019

Distinguish pop music from heavy metal using Apache Spark MLlib

Taras Matyashovsky explains how to use Apache Spark MLlib to build a supervised learning NLP pipeline to distinguish pop music from heavy metal, and have fun in the process.


Talk Title	Distinguish pop music from heavy metal using Apache Spark MLlib
Speakers	Taras Matyashovsky (Lohika)
Conference	O’Reilly Open Source Convention
Conf Tag	Making Open Work
Location	Austin, Texas
Date	May 8-11, 2017

Machine learning may be overhyped nowadays, but there is still a strong belief that this area is exclusively for data scientists with a deep mathematical background who leverage the Python (scikit-learn, Theano, TensorFlow, etc.) or R ecosystems and use specific tools like RStudio, MATLAB, or Octave. There is some truth to this statement, but Java engineers can also take the best of the machine-learning world from an applied perspective by using our native language and familiar frameworks like Apache Spark. Taras Matyashovsky explains how to use Apache Spark MLlib to build a supervised learning NLP pipeline to distinguish pop music from heavy metal, and have fun in the process. Along the way, Taras offers an overview of the simplest machine-learning tasks and algorithms, like regression, classification, and clustering.

Paint the landscape and secure your data center with Apache Spot

November 4, 2019

Cesar Berho and Alan Ross offer an overview of the open source project Apache Spot (incubating), which delivers a next-generation cybersecurity analytics architecture that uses unsupervised machine-learning techniques at cloud scale for anomaly detection.

The future of column-oriented data processing with Arrow and Parquet

November 1, 2019

In pursuit of speed, big data is evolving toward columnar execution. The solid foundation laid by Arrow and Parquet for a shared columnar representation across the ecosystem promises a great future. Julien Le Dem and Jacques Nadeau discuss the future of columnar and the hardware trends it takes advantage of, such as RDMA, SSDs, and nonvolatile memory.

Machines and the magic of fast learning (sponsored by MemSQL)

November 4, 2019

Eric Frenkiel explains how to use real-time data as a vehicle for operationalizing machine-learning models by leveraging MemSQL, exploring advanced tools, including TensorFlow, Apache Spark, and Apache Kafka, and compelling use cases demonstrating the power of machine learning to effect positive change.

Building Distributed TensorFlow Using Both GPU and CPU on Kubernetes [I]

November 24, 2019

Big Data and Machine Learning have become extremely hot topics in recent years. Google has announced its AI-centric strategy and released the deep learning toolkit TensorFlow. TensorFlow soon became t …

Global empire: Building for fun and profit

November 24, 2019

To establish a global user base, a product needs to support a variety of locales. The challenge with supporting multiple locales is the maintenance and generation of localized strings. Michelle Casbon explains how open source tools like Scala, Apache Spark, Apache Kafka, and Apache PredictionIO (incubating) provide structure for a scalable localization platform with machine learning at its core.

Open source AI at AWS and Apache MXNet

November 22, 2019

A wide variety of open source frameworks and tools support artificial intelligence and deep learning. Adrian Cockcroft explains how AWS has packaged a number of them, including deep learning frameworks such as Caffe, CNTK, Keras, MXNet, TensorFlow, Theano, and Torch and supporting tools like Jupyter and Anaconda, into an Amazon Machine Image with optimized GPU support.