Lessons learned building serverless distributed systems
Episource just finished building a scalable, resilient serverless distributed data pipeline for coding medical charts using NLP, which scales seamlessly with the amount of data it takes in as input. Raj Rohit explores the system and the tools used to build it, such as Ansible, Lambda, and Terraform, and shares the pitfalls, failures, successes, and lessons learned along the way.
Talk Title | Lessons learned building serverless distributed systems |
Speakers | Raj Rohit (Episource) |
Conference | O’Reilly Velocity Conference |
Conf Tag | Build Resilient Distributed Systems |
Location | London, United Kingdom |
Date | October 18-20, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
The system uses a queue and AWS CloudWatch alarms for the serverless batch processing pipeline and Ansible for the master-workers architecture. Because AWS Lambda has a time limit of five minutes and Episource’s entire NLP pipeline flow takes 30 minutes to complete, the system creates a master server via Lambda and runs Ansible in nohup mode. Ansible can terminate the workers once the tasks are completed, but Episource also wanted to delete the master, so the Ansible playbook was built to kill the master once the workers are terminated. Raj’s team also built a serverless API for querying the results of the data pipeline, using AWS Lambda and API Gateway. All of this was built to comply with HIPAA regulations, which means that the data needs to be encrypted both at rest and in motion.
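To make the Lambda-to-master handoff concrete, here is a minimal sketch in Python with boto3. It is not Episource's implementation. It assumes the master is an EC2 instance, that Ansible runs on that instance from a boot script, and that the playbook's last step could remove the master by shutting it down. The AMI ID, instance type, playbook path, and log path are all hypothetical placeholders.

```python
"""Sketch: a Lambda handler that launches a short-lived master server.

Every constant below is a hypothetical placeholder, not a value from the talk.
"""

MASTER_AMI_ID = "ami-00000000"            # hypothetical AMI with Ansible installed
MASTER_INSTANCE_TYPE = "t2.medium"        # hypothetical instance size
PLAYBOOK_PATH = "/opt/pipeline/site.yml"  # hypothetical playbook location
LOG_PATH = "/var/log/pipeline-ansible.log"


def build_user_data(playbook_path=PLAYBOOK_PATH, log_path=LOG_PATH):
    """Return a boot script that starts Ansible detached from the boot shell."""
    return "\n".join([
        "#!/bin/bash",
        # nohup plus '&' keeps the playbook running after this script exits
        f"nohup ansible-playbook {playbook_path} > {log_path} 2>&1 &",
    ]) + "\n"


def build_launch_params(user_data):
    """Assemble keyword arguments for the EC2 RunInstances call."""
    return {
        "ImageId": MASTER_AMI_ID,
        "InstanceType": MASTER_INSTANCE_TYPE,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,  # boto3 base64-encodes this string itself
        # With 'terminate', an OS-level shutdown deletes the instance, so a
        # final playbook task could remove the master (an assumption here).
        "InstanceInitiatedShutdownBehavior": "terminate",
    }


def handler(event, context):
    """Lambda entry point: start the master and return well within the limit."""
    import boto3  # imported lazily so the helpers above can be exercised offline

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**build_launch_params(build_user_data()))
    return {"master_instance_id": response["Instances"][0]["InstanceId"]}
```

The point of the design is that the Lambda function only starts the long-running work and returns. The 30-minute pipeline then runs on the master, outside the Lambda time limit.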