February 21, 2020

Audience projection of target consumers over multiple domains: A NER and Bayesian approach

AI-powered market research relies on indirect approaches based on sparse and implicit consumer feedback (e.g., social network interactions, web browsing, or online purchases). These approaches are more scalable, more authentic, and better suited to real-time consumer insights. Gianmario Spacagna proposes a novel audience projection algorithm that provides consumer insights across multiple domains.

Talk Title: Audience projection of target consumers over multiple domains: A NER and Bayesian approach
Speakers: Gianmario Spacagna (Helixa)
Conference: O’Reilly Artificial Intelligence Conference
Conf Tag: Put AI to Work
Location: London, United Kingdom
Date: October 15-17, 2019

Traditional market research generally relies on questionnaires or other forms of explicit feedback, put directly to an ad hoc panel that, in aggregate, is representative of a larger group of people. The goal is to generalize the panel’s habits, perceptions, and opinions on a given subject in order to understand the needs and interests of the wider consumer population. Unfortunately, these traditional approaches are often invasive, nonscalable, and biased, so they must be viewed as incomplete and only narrowly representative.
Indirect approaches based on sparse and implicit consumer feedback (e.g., social network interactions, web browsing, or online purchases) are more scalable, more authentic, and better suited to real-time consumer insights. Growing data availability, together with advances in algorithms and AI capabilities, will drive the next generation of market research methodologies.
Although these sources of implicit feedback provide relevant and detailed pictures of the population, each one covers only a limited set of observable behaviors. Unlike custom surveys, implicit observations are incomplete and provide little evidence of negative signals, that is, of what consumers are not interested in. A segment of the population with a high volume of interactions with a given brand may have a high affinity for it, but nothing can be said about its unobserved interactions with other entities. Techniques based on user-generated content (e.g., reviews or customer care complaints) can provide negative feedback, but they are strongly influenced by the author’s immediate emotional state and are often too personal to generalize. Each implicit feedback domain therefore provides a detailed but very narrow view that may lead to incomplete and nonactionable insights.
Gianmario Spacagna proposes a novel approach to audience projection that uses named entity recognition (NER) techniques to match related brands and Bayesian inference to transfer knowledge from the source domain. The challenge for the entity recognition algorithm, and for natural language processing techniques in particular, is to measure the similarity of two brands based only on extracted entities. Entity-based similarity, as opposed to text-based similarity, captures more realistic patterns and behaviors of the population. It can be adapted to map the set of source brands to all destination brands, and it significantly improves on the accuracy of the baseline method.
The classifier’s probability functions are derived from a binomial distribution, under the assumption that a target always shows a consistent distribution of market penetrations across the entities that the domains have in common. That is, the percentage of consumers interested in a particular entity who are reached by the target is preserved in both the source and destination domains. The probability that a user belongs to the target can therefore be estimated by using the source distribution of market penetrations as model evidence and the source target size as the prior probability.
One of the greatest challenges in market research is merging different sources of consumers’ interests into an augmented view that connects all the dots across multiple domains. Audience projection consists of defining a target audience as a subset of the population in a source domain and projecting that target onto a set of users in a destination dataset. The problem is modeled as a binary classification: for each user in the destination dataset, predict the probability that the user belongs to the projected target. Merging multiple data sources is generally done by “fusing” users on unique keys, such as personal identifiers.
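The Bayesian step described above can be pictured with a small sketch. It is not the speaker's actual model: it swaps in a plain naive Bayes classifier with Bernoulli likelihoods (a binomial with a single trial per entity) and Laplace smoothing, and it scores only the entities a destination user was observed interacting with, since implicit feedback carries no negative signals. The function names, the `alpha` smoothing parameter, and the toy data are all hypothetical.

```python
import math


def fit_source(interactions, target_users, entities, alpha=1.0):
    """Estimate the prior and per-entity likelihoods from the source domain.

    interactions: dict mapping user id -> set of entities the user engaged with.
    target_users: ids of the users that form the target audience.
    entities: entities shared by the source and destination domains.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    users = set(interactions)
    target = users & set(target_users)
    rest = users - target
    if not target or not rest:
        raise ValueError("target must be a non-empty, proper subset of the users")

    # Prior probability: the relative size of the target in the source domain.
    prior = len(target) / len(users)

    likelihood = {}
    for entity in entities:
        in_target = sum(1 for u in target if entity in interactions[u])
        in_rest = sum(1 for u in rest if entity in interactions[u])
        # Smoothed share of each group that engaged with the entity.
        p_target = (in_target + alpha) / (len(target) + 2 * alpha)
        p_rest = (in_rest + alpha) / (len(rest) + 2 * alpha)
        likelihood[entity] = (p_target, p_rest)
    return prior, likelihood


def membership_probability(user_entities, prior, likelihood):
    """Posterior probability that a destination user belongs to the target."""
    log_target = math.log(prior)
    log_rest = math.log(1.0 - prior)
    for entity in user_entities:
        if entity in likelihood:  # ignore entities unseen in the source domain
            p_target, p_rest = likelihood[entity]
            log_target += math.log(p_target)
            log_rest += math.log(p_rest)
    # Normalise in log space for numerical stability.
    top = max(log_target, log_rest)
    num = math.exp(log_target - top)
    return num / (num + math.exp(log_rest - top))


# Toy source domain: users u1 and u2 form the target audience.
source = {
    "u1": {"nike", "adidas"},
    "u2": {"nike"},
    "u3": {"ikea"},
    "u4": {"ikea", "adidas"},
}
prior, likelihood = fit_source(source, {"u1", "u2"}, {"nike", "ikea", "adidas"})

# A destination user who was only observed interacting with "nike".
print(round(membership_probability({"nike"}, prior, likelihood), 2))  # 0.75
```

In this toy setup, a destination user with no observed entities simply falls back to the prior (0.5), which mirrors the idea that the source target size acts as the prior probability.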
When datasets are anonymized and personal identifiers are unavailable, the usual workaround is fuzzy look-alike record linkage: each user of a central dataset is linked to the most similar user in every other dataset. Although many algorithms can find the best matches between two or more sets of users, these data-centric architectures have many limitations when the datasets are heterogeneous, differing strongly in size and density, and when the number of sources to merge grows. Fusion at the item level is therefore often preferred to user-level linkage: even if two datasets represent completely different types of observations, it is easier to identify matches between common entities (e.g., interacting with a brand’s social media page could be associated with purchasing the brand’s products). Within this item-level setting, however, cross-domain adaptation algorithms that derive item similarities from textual descriptions are not suitable for representing real consumer patterns. In content-based similarity, two competitors producing similar products are, by definition, very similar, but this does not necessarily mean they share the same consumer base.
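The item-level matching discussed above can be illustrated with a minimal sketch. The talk does not specify the similarity measure, so this example assumes Jaccard overlap between sets of extracted entities; the NER step itself is not shown, and the entity sets, brand names, and the `map_brands` helper are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of entities."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_brands(source, destination, threshold=0.0):
    """Map each source brand to its most similar destination brand.

    source, destination: dicts mapping brand name -> set of extracted entities.
    threshold: minimum similarity a match must exceed (an assumption of this sketch).
    Returns a dict: source brand -> (destination brand, similarity score).
    """
    mapping = {}
    for s_name, s_entities in source.items():
        best_name, best_score = None, threshold
        for d_name, d_entities in destination.items():
            score = jaccard(s_entities, d_entities)
            if score > best_score:
                best_name, best_score = d_name, score
        if best_name is not None:
            mapping[s_name] = (best_name, best_score)
    return mapping


# Hand-written entity sets standing in for the output of an NER step.
social_pages = {"Acme Running (social page)": {"Acme", "running", "marathon"}}
retail_brands = {
    "Acme Shoes (retail)": {"Acme", "running", "shoes"},
    "Bolt Energy": {"Bolt", "energy drink"},
}

print(map_brands(social_pages, retail_brands))
# {'Acme Running (social page)': ('Acme Shoes (retail)', 0.5)}
```

Source brands with no destination brand above the threshold are left unmapped rather than forced onto a poor match, which keeps spurious links out of any downstream projection.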
