December 14, 2019

268 words 2 mins read

Creating and evaluating a distance measure

Creating and evaluating a distance measure

Whether we're talking about spam emails, merging records, or investigating clusters, there are many times when having a measure of how alike things are makes them easier to work with (e.g., with unstructured data that isn't incorporated into your data models). Melissa Santos offers a practical approach to creating a distance metric and validating with business owners that it provides value.

Talk Title Creating and evaluating a distance measure
Speakers
Conference Strata + Hadoop World
Conf Tag Make Data Work
Location New York, New York
Date September 27-29, 2016
URL Talk Page
Slides Talk Slides
Video

Whether we’re talking about spam emails, merging records, or investigating clusters, there are many times when having a measure of how alike things are makes them easier to work with. You may have unstructured or vague data that isn’t incorporated into your data models (e.g., information from subject-matter experts who have a sense of whether something is good or bad, similar or different). Melissa Santos offers a practical approach to creating a distance metric and validating with business owners that it provides value—providing you with the tools to turn that expert information into numbers you can compare and use to quickly see structures in the data. Melissa walks you through setting expectations for a distance, creating distance metrics, iterating with experts to check expectations, validating the distance on a large chunk of the dataset, and then circling back to add more complexity and shares some real-world examples, such as distance from usual emails from a domain, quality scores for geographic data, and merging person records if they are sufficiently similar. Topics include:

comments powered by Disqus