Federated learning
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. Training a model under these constraints is known as federated learning. Mike Lee Williams discusses the algorithmic solutions and the product opportunities.
Talk Title | Federated learning |
Speakers | Mike Lee Williams (Cloudera Fast Forward Labs) |
Conference | Strata Data Conference |
Conf Tag | Big Data Expo |
Location | San Francisco, California |
Date | March 26-28, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Federated learning is distributed machine learning across edge devices, with a number of twists that make it both challenging and broadly applicable. Training happens on the same devices that generate the data. The users of those devices are often concerned about privacy and are thus unwilling to share their training data. Even when they are willing to share it, communication is unreliable and slow, so sharing may not be practical.

Examples include predictive text on cell phones, a person’s engagement with their own photos, machine learning in the browser applied to corporate text archives such as a team Slack or Google Drive, and ML on low-powered field devices in energy, agriculture, and logistics. The principles of data minimization established by the GDPR and the prevalence of smart sensors make these use cases more common and the advantages of federated learning more compelling.

Mike Lee Williams discusses the algorithmic and production techniques of federated learning and the privacy-preserving, fault-tolerant product opportunities they offer. He then leads a demo of a working prototype of federated learning applied to a predictive maintenance problem, in which customers aren’t willing to share the details of how their components failed with the manufacturer but want the manufacturer to provide them with a strategy to maintain the part. The solution satisfies the customers’ privacy concerns while providing them with a model that leads to fewer costly failures and less maintenance downtime.
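The abstract does not name a specific algorithm, so the following is only a minimal sketch of one common approach, federated averaging, and not necessarily the method used in the talk or its prototype. It assumes a toy one-dimensional linear model and synthetic per-device data; the names `local_update`, `federated_average`, `make_client_data`, and `train` are hypothetical and chosen for illustration. Each simulated device trains on its own data, and only the model weights (never the raw data) are sent back to be averaged.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent passes on one device's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y  # prediction error for the linear model
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client_data(rng, n, x_low, x_high):
    """Hypothetical device data: y = 3x + 1 plus noise, over a device-specific x range."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]


def train(rounds=200, seed=0):
    rng = random.Random(seed)
    # Three simulated devices, each seeing a different slice of the input range.
    clients = [
        make_client_data(rng, 20, 0.0, 1.0),
        make_client_data(rng, 50, 0.5, 1.5),
        make_client_data(rng, 30, -1.0, 0.0),
    ]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each device starts from the current global model and trains locally.
        updates = [local_update(global_weights, data) for data in clients]
        # Only the updated weights are shared with the server.
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


if __name__ == "__main__":
    w, b = train()
    print(f"learned w={w:.2f}, b={b:.2f}")
```

Weighting the average by sample count keeps a device with little data from pulling the shared model as hard as one with a lot. A real deployment would also have to handle devices that drop out mid-round and the slow, unreliable links the abstract mentions, which this sketch ignores.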