We enhance privilege with supervised machine learning
Machines are not objective, and big data is not fair. Michael Williams uses sentiment analysis to show that supervised machine learning has the potential to amplify the voices of the most privileged people in society, violate the spirit and letter of civil rights law, and make your product suck.
| Talk Title | We enhance privilege with supervised machine learning |
|------------|--------------------------------------------------------|
| Speakers | Mike Lee Williams (Cloudera Fast Forward Labs) |
| Conference | Strata + Hadoop World |
| Conf Tag | Big Data Expo |
| Location | San Jose, California |
| Date | March 29-31, 2016 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Michael Williams uses sentiment analysis to show that supervised machine learning has the potential to amplify the voices of the most privileged people in society. A sentiment analysis algorithm is considered table stakes for any serious text-analytics platform in social media, finance, or security. Using it as a worked example of supervised machine learning, Michael demonstrates how these systems are trained and shows that they have the unavoidable property of being better at spotting unsubtle expressions of extreme emotion. Such crude expressions are disproportionately the work of a particularly privileged group of authors: men. As a result, brands that depend on sentiment analysis to “learn what people think” inevitably pay more attention to men.

But the problem doesn’t stop with sentiment analysis: Michael explains how, at every step of any model-building process, we make choices that can introduce bias, enhance privilege, break the law, or simply make your product suck. He reviews these pitfalls, discusses how to recognize them in your own work, and touches on new academic work that aims to measure and mitigate these harms.
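To make the training step concrete, here is a minimal, hypothetical sketch of how a supervised sentiment classifier is typically built (the scikit-learn pipeline and the toy training examples below are illustrative assumptions, not material from the talk). It shows how a model trained mostly on blunt, emphatic labeled text ends up most confident on exactly that style of writing:

```python
# Minimal sketch: training a sentiment classifier with supervised learning.
# The labeled examples are invented for illustration; a real system would be
# trained on a large annotated corpus (e.g., labeled reviews or tweets).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "This is the best thing ever!!!",        # blunt, extreme positive
    "Absolutely terrible. Total garbage.",   # blunt, extreme negative
    "It was fine, I suppose.",               # subtle, mildly positive
    "Not quite what I hoped for.",           # subtle, mildly negative
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# A bag-of-words model learns only from what the labeled data contains.
# If crude, emphatic phrasing dominates the training set, the classifier
# is most confident on exactly that register of writing.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# The model tends to be far more certain about unsubtle expressions of
# emotion than about understated ones -- the asymmetry the talk describes.
for text in ["Worst product ever!!!", "Hmm, not really for me."]:
    print(text, model.predict_proba([text]))
```

In this sketch the design choices (which texts get labeled, how they are vectorized, which model is fit) are exactly the places where, as the abstract notes, bias can enter the pipeline.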