Open source decentralized data markets for training AI in areas of large shared risk

Paco Nathan examines decentralized data markets. With components based on blockchain technologies (smart contracts, token-curated registries, DApps, voting mechanisms, etc.), decentralized data markets allow multiple parties to curate ML training datasets in ways that are transparent, auditable, and secure, and that allow equitable payouts that take social values into account.


Talk Title	Open source decentralized data markets for training AI in areas of large shared risk
Speakers	Paco Nathan (derwen.ai)
Conference	Artificial Intelligence Conference
Conf Tag	Put AI to Work
Location	San Francisco, California
Date	September 5-7, 2018

As the risk and reward trade-offs grow for products based on AI, along with the pressures of compliance and accountability, at what point is it no longer acceptable for any one commercial entity to hold responsibility for so much shared risk? Can we incentivize corporations, government agencies, independent watchdog groups, and other relevant parties to combine their data in cases where there are large shared risks?

ML models have become ubiquitous, embedded in products and services used throughout our daily lives. Generally, those models get deployed by large commercial interests, which train them on proprietary datasets. However, matters of ethics, privacy, safety, bias, and other concerns can have a terrible impact on individuals.

For example, Google develops large sets of training data from crucial sensors in self-driving cars. In an almost adversarial way, regulators on multiple continents focus on the impact of failure cases related to those sensors and the associated ML models. Edge cases in test datasets prove to be disproportionately valuable, and are potentially the basis for economic incentives. Instead of entrusting each manufacturer to build "near perfect" training datasets while bearing large risks, we should incentivize manufacturers to combine their data. Rewards for contributing parties could then derive from a combination of training data and testing edge cases, as identified by regulators and other watchdog parties.

Paco Nathan explains how decentralized data markets provide a means to resolve difficult problems when training machine learning models, especially for use cases with large shared risks. With components based on blockchain technologies (smart contracts, token-curated registries, DApps, voting mechanisms, etc.), decentralized data markets allow multiple parties to curate ML training datasets in ways that are transparent, auditable, and secure, and that allow equitable payouts that take social values into account.

Paco explores open source libraries from Computable.io based on Ethereum, which are being used to develop data markets. These libraries enable users to adjust trade-offs between decentralized and centralized characteristics as needed for specific business use cases and as indicated by ethical concerns. The approach also addresses other areas of machine learning risk, such as genomics, medical research, and financial credit scores, where proprietary interests and social needs often come into conflict.
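
The abstract names token-curated registries, voting mechanisms, and payouts that reward edge cases, but gives no mechanics. The following is a minimal in-memory sketch of how those pieces could fit together, not the Computable.io API or anything shown in the talk. The class names, the strict-majority rule, and the edge-case weight of 5 are all assumptions made for illustration; a real market would implement this logic in smart contracts.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A dataset contribution proposed for listing (hypothetical structure)."""
    owner: str
    description: str
    is_edge_case: bool = False
    votes_for: int = 0       # total tokens staked in favor
    votes_against: int = 0   # total tokens staked against


@dataclass
class Registry:
    """Toy token-curated registry; all rules here are illustrative assumptions."""
    edge_case_weight: int = 5  # assumed: an edge case counts 5x an ordinary listing
    listed: list = field(default_factory=list)

    def vote(self, candidate, stake, approve):
        """Record a stake-weighted vote for or against a candidate."""
        if stake <= 0:
            raise ValueError("stake must be positive")
        if approve:
            candidate.votes_for += stake
        else:
            candidate.votes_against += stake

    def resolve(self, candidate):
        """List the candidate if staked approval strictly exceeds opposition."""
        accepted = candidate.votes_for > candidate.votes_against
        if accepted:
            self.listed.append(candidate)
        return accepted

    def payouts(self, pool):
        """Split a reward pool among owners in proportion to weighted listings."""
        weights = {}
        for c in self.listed:
            w = self.edge_case_weight if c.is_edge_case else 1
            weights[c.owner] = weights.get(c.owner, 0) + w
        total = sum(weights.values())
        if total == 0:
            return {}
        return {owner: pool * w / total for owner, w in weights.items()}


# Usage: one ordinary contribution and one edge case, both voted in.
registry = Registry()
routine = Candidate("manufacturer_a", "highway sensor logs")
edge = Candidate("watchdog_b", "sensor failure in fog", is_edge_case=True)
for c in (routine, edge):
    registry.vote(c, stake=10, approve=True)
    registry.vote(c, stake=3, approve=False)
    registry.resolve(c)
print(registry.payouts(pool=600))
# {'manufacturer_a': 100.0, 'watchdog_b': 500.0}
```

Weighting edge cases more heavily is one way to express the idea that they are disproportionately valuable. The actual weighting, voting rules, and token economics would be design decisions for each market.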

Using ML to improve UX and literacy for young poets

Practical Zero-Knowledge Proof Concepts on Hyperledger Fabric

Machine learning at scale with Kubernetes

Tutorial: Kubeflow End-to-End: GitHub Issue Summarization

Machine Learning as Code: and Kubernetes with Kubeflow

Distributed TensorFlow on Hops