November 26, 2019

Distributed clinical models: Inference without sharing patient data

Clinical collaboration benefits from pooling data to train models from large datasets, but it's hampered by concerns about sharing data. Balasubramanian Narasimhan, John-Mark Agosta, and Philip Lavori outline a privacy-preserving alternative that creates statistical models equivalent to one from the entire dataset.

Talk Title Distributed clinical models: Inference without sharing patient data
Speakers Balasubramanian Narasimhan (Stanford University), John-Mark Agosta (Microsoft), Philip Lavori (Stanford University)
Conference Strata Data Conference
Conf Tag Big Data Expo
Location San Jose, California
Date March 6-8, 2018

Previously, medical researchers who wanted to run a large, multi-institution study needed to create a central registry of subjects’ personal data, collected from different institutions. Although cloud offerings such as Azure are strictly HIPAA compliant, such aggregated datasets are few and far between because of institutional barriers to sharing sensitive personal data. But statistical learning models need not have all their data exposed in one place. Equivalent models can be learned by distributed iterative algorithms that pass messages containing only aggregate values.

Balasubramanian Narasimhan, John-Mark Agosta, and Philip Lavori outline a privacy-preserving alternative that creates statistical models equivalent to one from the entire dataset, implemented with a set of remote cloud applications that communicate with a master application to build a common model. The remote applications form a star-shaped network, exchanging partial results asynchronously with the master until the model converges. The distributed cloud application provides rapid assembly of collaborative computational projects that wrap flexible and extensible R statistical software. It works across a heterogeneous collection of database environments, where the data can be stored either in local instances of the cloud or left on-premises.

The implementation in Azure is fully transparent, so local officials concerned with privacy protections can validate the safety of the method. Security between remote and master sites builds on OAuth-style distributed authentication so that each site runs under local control, as a separate tenant. With Azure as the development framework, a single installer can spin up the set of cloud resources for the application instance and handle the security and network configuration details as well.
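To make the idea of exchanging only aggregate values concrete, here is a minimal sketch in Python. It is not the speakers' implementation, which wraps R software and runs asynchronously across Azure tenants. It assumes a two-parameter logistic regression fitted by Newton's method, simulated site data, and a simple synchronous loop. All function names (`make_site`, `site_aggregates`, `newton_step`, `fit`) are hypothetical. Each site returns five summed numbers per round, and the master never sees a patient row.

```python
import math
import random


def make_site(n, seed):
    """Simulate one site's private rows (x, y); purely illustrative data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(0.5 + 1.5 * x)))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def site_aggregates(rows, beta):
    """Runs at a site: return only summed gradient and Hessian terms."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        r = y - p            # residual contributes to the gradient
        w = p * (1.0 - p)    # weight contributes to the Hessian
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1, h00, h01, h11)


def newton_step(beta, agg):
    """Runs at the master: solve the 2x2 Newton system in closed form."""
    g0, g1, h00, h01, h11 = agg
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return (beta[0] + d0, beta[1] + d1)


def fit(sites, tol=1e-10, max_iter=50):
    """Master loop: broadcast beta, sum the sites' aggregates, update."""
    beta = (0.0, 0.0)
    for _ in range(max_iter):
        parts = [site_aggregates(rows, beta) for rows in sites]
        total = tuple(sum(vals) for vals in zip(*parts))
        new_beta = newton_step(beta, total)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Three simulated sites; the master only ever receives aggregates.
sites = [make_site(200, seed) for seed in (1, 2, 3)]
distributed = fit(sites)

# Reference fit on the pooled rows, which the real setting would not allow.
pooled = fit([[row for rows in sites for row in rows]])
```

Because the gradient and Hessian of the log-likelihood are sums over rows, adding the per-site sums gives the same Newton update as the pooled data, so `distributed` and `pooled` agree up to floating-point rounding. Whether aggregates alone are safe to release for a given study is a separate question that this sketch does not address.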
