February 8, 2020

Working with time series: Denoising and imputation frameworks to improve data density

The application of smoothing and imputation strategies is common practice in predictive modeling and time series analysis. With a technique-agnostic approach, Anjali Samani provides qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density.

Talk Title: Working with time series: Denoising and imputation frameworks to improve data density
Speakers: Anjali Samani (CircleUp)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 24-26, 2019
URL: Talk Page
Slides: Talk Slides

Increasingly, organizations are looking beyond the conventional data provided by data aggregators and vendors in their industry. But alternative data, because of the way it’s generated and collected, is typically noisy and often ephemeral. A model’s ability to learn and correctly predict future outcomes is greatly influenced by the underlying data. Clean, complete data can make the difference between deriving correct and incorrect conclusions, while incomplete data can restrict you to only a small set of techniques. And for alternative data sources, data that was missed at collection time is almost impossible to recover.

Anjali Samani explains two simple frameworks: one for evaluating a dataset’s candidacy for smoothing, and one for quantitatively determining the optimal imputation strategy and the number of consecutive missing values that can be imputed without material degradation in signal quality.

To extract meaningful signals from alternative data, it’s necessary to apply denoising and imputation to generate clean and complete time series. There are numerous ways to smooth a noisy data series and impute missing values, each with relative strengths and weaknesses. Smoothing removes noise from the data and allows patterns and trends to be identified more easily. It can, however, make a series appear less volatile than it is and may mask the very patterns you’re seeking to identify. So you have to know when you should and shouldn’t smooth a series and, if you do smooth it, what type of smoothing to apply.

Similarly, missing observations in a time series can be imputed in many ways, which are covered in detail in both academic and practitioner literature. What caused the missing values in the first place, and how the data is going to be used in downstream applications, can often inform the most appropriate imputation strategy. However, when there are multiple options to choose from, you have to choose between strategies objectively and identify how many consecutive missing values can be safely imputed.
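
One way to make that choice quantitative is to hide runs of known values in a complete stretch of the series, impute them with each candidate strategy, and see how the error grows with the length of the gap. The Python sketch below illustrates that general idea and is not the framework presented in the talk. The synthetic series, the two strategies (forward fill and linear interpolation), the mean absolute error metric, the `tolerance` threshold, and all function names are assumptions made for the example.

```python
import math
import random


def forward_fill(values):
    """Replace each None with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_interpolate(values):
    """Fill interior runs of None on a straight line between their neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def gap_error(series, impute, gap_len, n_trials=200, seed=0):
    """Mean absolute error on artificially removed runs of length gap_len."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_trials):
        # Keep the first and last points observed so every gap is interior.
        start = rng.randrange(1, len(series) - gap_len)
        masked = list(series)
        for i in range(start, start + gap_len):
            masked[i] = None
        filled = impute(masked)
        for i in range(start, start + gap_len):
            total += abs(filled[i] - series[i])
            count += 1
    return total / count


def max_safe_gap(errors, tolerance):
    """Largest gap length such that it and every shorter gap stay within tolerance."""
    safe = 0
    for k in sorted(errors):
        if errors[k] > tolerance:
            break
        safe = k
    return safe


# Hypothetical complete series: trend + 30-step cycle + Gaussian noise.
noise = random.Random(42)
series = [
    10 + 0.05 * t + 3 * math.sin(2 * math.pi * t / 30) + noise.gauss(0, 0.3)
    for t in range(365)
]

strategies = {"forward_fill": forward_fill, "linear": linear_interpolate}
tolerance = 0.5  # assumed threshold for "material degradation", in series units

for name, impute in strategies.items():
    errors = {k: gap_error(series, impute, k) for k in range(1, 11)}
    print(name, "max safe gap:", max_safe_gap(errors, tolerance))
    print("  ", {k: round(e, 3) for k, e in errors.items()})
```

Because the same seed drives the gap positions for every strategy, each strategy is scored on identical gaps, so the error curves are directly comparable. In practice the tolerance would come from how sensitive the downstream model is to imputation error, not from a fixed constant.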
