December 27, 2019

Applications of mixed effects random forests

Clustered data is all around us. The best way to attack it? Mixed effects models. Sourav Dey explains how mixed effects random forests (MERF), both a model and an open source Python package, marry the world of classical mixed effects modeling with modern machine learning algorithms, and shows how the approach can be extended to other advanced modeling techniques such as gradient boosting machines and deep learning.

Talk Title: Applications of mixed effects random forests
Speakers: Sourav Dey (Manifold)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Francisco, California
Date: March 26-28, 2019

Clustered data is all around us. The most common example is longitudinal clustering, where each individual instance of the phenomenon you wish to model has multiple associated measurements (e.g., modeling math test scores as a function of sleep factors when you have multiple measurements per student). Another common example is clustering due to a categorical variable (e.g., clusters representing the specific math teacher of a group of students). Clustering can also be hierarchical (e.g., a student cluster contained within a teacher cluster, which is itself contained within a school cluster). When modeling clustered data, you must account for cluster-specific idiosyncrasies and any nonnegligible random effects by cluster.

The best way to attack this kind of data? Mixed effects models. Inspired by the models we have been building for clients, Manifold has developed MERF, an open source Python implementation of mixed effects random forests. Sourav Dey explains how the MERF model marries the world of classical mixed effects modeling with modern machine learning algorithms and shows how it can be extended to other advanced modeling techniques such as gradient boosting machines and deep learning. He also walks you through example use cases and demonstrates MERF's performance on synthetic and real data.
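
To make the idea concrete, here is a minimal sketch, written with NumPy, of the expectation-maximization (EM) loop commonly described for MERF-style models. It assumes the mixed effects form y_i = f(X_i) + Z_i b_i + e_i for cluster i, where f is a random forest capturing the fixed effects, b_i ~ N(0, D) are the cluster's random effects, and e_i ~ N(0, sigma^2 I) is noise. The loop alternates between fitting the forest on targets with the current random effects subtracted and updating b_i, D, and sigma^2 in closed form. The names fit_merf, predict_merf, and default_forest are hypothetical illustrations, not the API of Manifold's MERF package, and the forest settings are arbitrary assumptions.

```python
import numpy as np


def default_forest():
    # Hypothetical default learner for f(X). scikit-learn is imported lazily so
    # the EM loop itself depends only on NumPy.
    from sklearn.ensemble import RandomForestRegressor
    return RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)


def fit_merf(X, Z, clusters, y, make_model=default_forest, n_iter=20):
    """Sketch of a MERF-style EM loop for y_i = f(X_i) + Z_i b_i + e_i.

    X: (n, p) fixed-effect features; Z: (n, q) random-effect design
    (a column of ones gives each cluster a random intercept);
    clusters: (n,) cluster labels; y: (n,) targets.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    X, Z, y = np.asarray(X, float), np.asarray(Z, float), np.asarray(y, float)
    clusters = np.asarray(clusters)
    ids = np.unique(clusters)
    idx = {c: np.flatnonzero(clusters == c) for c in ids}
    q = Z.shape[1]
    b = {c: np.zeros(q) for c in ids}  # random effects b_i, start at zero
    D = np.eye(q)                      # covariance of b_i
    sigma2 = 1.0                       # residual variance

    for _ in range(n_iter):
        # Subtract the current random effects and refit the fixed-effects learner.
        y_star = y.copy()
        for c in ids:
            y_star[idx[c]] -= Z[idx[c]] @ b[c]
        model = make_model()
        model.fit(X, y_star)
        f_hat = model.predict(X)  # in-sample predictions (a simplification)

        # Closed-form updates: new b_i, then sigma^2 and D from the old values.
        sigma2_sum, D_sum = 0.0, np.zeros((q, q))
        for c in ids:
            i = idx[c]
            Z_i, n_i = Z[i], len(i)
            V_inv = np.linalg.inv(Z_i @ D @ Z_i.T + sigma2 * np.eye(n_i))
            b[c] = D @ Z_i.T @ V_inv @ (y[i] - f_hat[i])
            eps = y[i] - f_hat[i] - Z_i @ b[c]
            sigma2_sum += eps @ eps + sigma2 * (n_i - sigma2 * np.trace(V_inv))
            D_sum += np.outer(b[c], b[c]) + D - D @ Z_i.T @ V_inv @ Z_i @ D
        sigma2 = sigma2_sum / len(y)
        D = D_sum / len(ids)

    return model, b, D, sigma2


def predict_merf(model, b, X, Z, clusters):
    """Known clusters get f(X) + Z b_i; unseen clusters fall back to f(X)."""
    Z = np.asarray(Z, float)
    pred = np.array(model.predict(np.asarray(X, float)), dtype=float)
    for k, c in enumerate(np.asarray(clusters)):
        if c in b:
            pred[k] += Z[k] @ b[c]
    return pred
```

Passing a column of ones as Z gives each cluster (for example, each student) its own random intercept. Because make_model is just a factory returning any object with fit and predict methods, swapping the forest for a gradient boosting regressor is a one-line change in this sketch, which echoes the extension to other learners described in the talk. Clusters never seen during training fall back to the fixed-effects prediction f(X).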

comments powered by Disqus