January 18, 2020


Lessons learned building serverless distributed systems


Episource just finished building a scalable, resilient serverless distributed data pipeline for coding medical charts using NLP, which scales seamlessly with the amount of data it takes in as input. Raj Rohit explores the system and the tools used to build it, such as Ansible, Lambda, and Terraform, and shares the pitfalls, failures, successes, and lessons learned along the way.

Talk Title Lessons learned building serverless distributed systems
Speakers Raj Rohit (Episource)
Conference O’Reilly Velocity Conference
Conf Tag Build Resilient Distributed Systems
Location London, United Kingdom
Date October 18-20, 2017
URL Talk Page
Slides Talk Slides

Episource’s pipeline uses a queue and AWS CloudWatch alarms for serverless batch processing, and Ansible for the master-workers architecture. AWS Lambda had a five-minute time limit at the time, while Episource’s entire NLP pipeline flow takes 30 minutes to complete, so a Lambda function only creates a master server, which then runs Ansible in nohup mode so that the long-running playbook is not bound by the Lambda limit. Ansible can terminate the workers once their tasks are completed, but Episource also wanted to delete the master, so the Ansible playbook was built to kill the master once the workers are terminated. Raj’s team also built a serverless API for querying the results of the data pipeline, using AWS Lambda and API Gateway. All of this was built to comply with HIPAA regulations, which means that the data needs to be encrypted both at rest and in transit.
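The hand-off from Lambda to a longer-lived master could look roughly like the sketch below. It is an illustration under stated assumptions, not Episource’s actual code: the playbook name, log path, instance type, and the `image_id` event field are all hypothetical, and using the instance’s shutdown behavior to remove the master is one possible approach rather than the one described in the talk. The only AWS call used is boto3’s `run_instances`.

```python
# Hypothetical names; none of these appear in the talk.
PLAYBOOK = "pipeline.yml"
LOG_FILE = "/var/log/pipeline.log"


def build_user_data(playbook: str = PLAYBOOK, log_file: str = LOG_FILE) -> str:
    """Return a boot script that starts Ansible detached from any shell."""
    return "\n".join([
        "#!/bin/bash",
        # nohup plus a trailing & keeps the long playbook run alive
        # independently of whatever process started it.
        f"nohup ansible-playbook {playbook} > {log_file} 2>&1 &",
    ]) + "\n"


def build_launch_request(image_id: str, instance_type: str = "t3.small") -> dict:
    """Build keyword arguments for boto3's ec2.run_instances."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": build_user_data(),
        # Assumption: with "terminate", a shutdown issued by the playbook's
        # final task would also delete the master instance.
        "InstanceInitiatedShutdownBehavior": "terminate",
    }


def handler(event, context):
    """Lambda entry point: start the master and return right away."""
    import boto3  # provided by the AWS Lambda Python runtime

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**build_launch_request(event["image_id"]))
    return {"instance_id": response["Instances"][0]["InstanceId"]}
```

Because the function returns as soon as the instance is requested, it finishes well inside the Lambda time limit, while the 30-minute pipeline runs on the master.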

