November 18, 2019


Introduction to generalized low-rank models and missing values

The generalized low-rank model is a new machine-learning approach for reconstructing missing values and identifying important features in heterogeneous data. Through a series of examples, Jo-fai Chow demonstrates how to fit low-rank models in a parallelized framework and how to use these models to make better predictions.


Talk Title	Introduction to generalized low-rank models and missing values
Speakers	Jo-fai Chow (H2O.ai)
Conference	Strata + Hadoop World
Conf Tag	Making Data Work
Location	London, United Kingdom
Date	June 1-3, 2016

Across business and research, analysts seek to understand large collections of data with numeric, Boolean, and categorical values. Many entries in the table may be noisy or even missing altogether. Low-rank models facilitate understanding of tabular data by producing a condensed vector representation for every row and column in the dataset. These representations can then be compared, clustered, plotted, and used in subsequent analysis. Jo-fai Chow offers an overview of low-rank models and demonstrates how to build them in H2O, an open source distributed machine-learning platform. Through examples, Jo-fai explains how to fit low-rank models to numeric and categorical datasets with missing values and how to use these models to identify important features and make better predictions.

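To make the idea of a condensed vector representation for every row and column concrete, here is a minimal pure-Python sketch. It is not H2O's implementation and does not use the H2O API: it handles numeric cells only, uses plain squared loss, and fits by stochastic gradient descent. The function names (fit_low_rank, reconstruct), the toy table, and all hyperparameters are assumptions made for illustration. A generalized low-rank model extends the same idea by choosing a different loss per column type (for example for Boolean or categorical columns).

```python
import random

def fit_low_rank(table, rank=1, epochs=3000, lr=0.01, reg=0.01, seed=0):
    """Fit table[i][j] ~ dot(X[i], Y[j]) using observed cells only.

    None marks a missing cell. X holds one vector per row and Y one
    vector per column (hypothetical helper, squared loss + L2 penalty).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    X = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Y = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols)
                if table[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = sum(X[i][k] * Y[j][k] for k in range(rank)) - table[i][j]
            for k in range(rank):
                xi, yj = X[i][k], Y[j][k]
                # Gradient step on the squared error of this one cell.
                X[i][k] -= lr * (err * yj + reg * xi)
                Y[j][k] -= lr * (err * xi + reg * yj)
    return X, Y

def reconstruct(X, Y, i, j):
    """Predicted value of cell (i, j) from the row and column vectors."""
    return sum(a * b for a, b in zip(X[i], Y[j]))

# Toy rank-1 table: cell (i, j) is (i + 1) * (j + 1); two cells are hidden.
table = [[(i + 1) * (j + 1) for j in range(3)] for i in range(4)]
table[0][0] = None   # true value 1
table[3][2] = None   # true value 12

X, Y = fit_low_rank(table)
print("imputed (0, 0):", round(reconstruct(X, Y, 0, 0), 2))
print("imputed (3, 2):", round(reconstruct(X, Y, 3, 2), 2))
```

The missing cells never enter the loss, so they do not have to be filled in before fitting. Once the row and column vectors are learned, any cell, observed or not, can be estimated from their dot product.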


HopsWorks: Multitenant Hadoop as a service

November 18, 2019

Currently, multitenancy in Hadoop is limited to organizations running separate Hadoop clusters, and the secure sharing of resources is achieved using virtualization or containers. Jim Dowling describes how HopsWorks enables organizations to securely share a single Hadoop cluster using projects and a new metadata layer that enables protection domains while still allowing data sharing.

IoT in the enterprise: A look at Intel (IoT) Inside

October 23, 2019

Moty Fania shares Intel's IT experience implementing an on-premises big data IoT platform for internal use cases. This unique platform was built on top of several open source technologies and enables highly scalable stream analytics with a stack of algorithms such as multisensor change detection, anomaly detection, and more.

TensorFlow: Large-scale analytics and distributed machine learning with TensorFlow, BigQuery, and Dataflow (Apache Beam)

October 20, 2019

Kazunori Sato and Amy Unruh explore how you can use TensorFlow to drive large-scale distributed machine learning against your analytic data sitting in Google BigQuery, with data preprocessing driven by Dataflow (now Apache Beam). Kazunori and Amy dive into practical examples of how these technologies can work together to enable a powerful workflow for distributed machine learning.

Visualizing millions of datapoints with GPUs in the client and server

October 13, 2019

The ability to visualize millions of data points opens up a world of applications. But it's hard to quickly render that much data, let alone compute anything with it. Thibaud Hottelier shares the GPU technology (a hybrid client/server GPU engine using Node-OpenCL and a new library, CL.js) that Graphistry used to break the million datapoint barrier for fast, interactive visualizations.

Stream analytics in the enterprise: A look at Intel's internal IoT implementation

November 17, 2019

Moty Fania shares Intel's IT experience implementing an on-premises IoT platform for internal use cases. The platform was based on open source big data technologies and containers and was designed as a multitenant platform with built-in analytical capabilities. Moty highlights the key lessons learned from this journey and offers a thorough review of the platform's architecture.

TensorFlow: Machine learning for everyone

November 17, 2019

TensorFlow is an open source software library for numerical computation with a focus on machine learning. Its flexible architecture makes it great for research and production deployment. Sherry Moore offers a high-level introduction to TensorFlow and explains how to use it to train machine-learning models to make your next application smarter.