Lessons learned building serverless distributed systems


Talk Title	Lessons learned building serverless distributed systems
Speakers	Raj Rohit (Episource)
Conference	O’Reilly Velocity Conference
Conf Tag	Build Resilient Distributed Systems
Location	London, United Kingdom
Date	October 18-20, 2017

Episource just finished building a scalable, resilient, serverless distributed data pipeline for coding medical charts using NLP; the pipeline scales with the amount of data it takes in as input. Raj Rohit explores the system and the tools used to build it, such as Ansible, Lambda, and Terraform, and shares the pitfalls, failures, successes, and lessons learned along the way.

The system uses a queue and AWS CloudWatch alarms for the serverless batch-processing pipeline and Ansible for the master-workers architecture. Because AWS Lambda has a time limit of five minutes and Episource’s entire NLP pipeline takes 30 minutes to complete, a Lambda function creates a master server, which then runs Ansible in nohup mode. Ansible can terminate the workers once the tasks are completed, but Episource also wanted to delete the master, so the Ansible playbook was built to kill the master once the workers are terminated.

Raj’s team also built a serverless API for querying the results of the data pipeline, using AWS Lambda and API Gateway. All of this was built to comply with HIPAA regulations, which means that the data needs to be encrypted both at rest and in motion.
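The hand-off from a short-lived Lambda function to a long-running master server can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Episource's actual code: the function names, the event keys (`ami_id`, `instance_type`, `playbook_path`), the default instance type, and the file paths are all hypothetical. The only AWS call used is `run_instances` on the boto3 EC2 client, and the sketch assumes the master's machine image already has Ansible installed. Terminating the workers and the master is left to the playbook itself, as the abstract describes.

```python
"""Sketch: a Lambda handler that hands a long job to a master server."""

LAMBDA_LIMIT_SECONDS = 5 * 60   # Lambda time limit cited in the abstract
PIPELINE_SECONDS = 30 * 60      # NLP pipeline duration cited in the abstract


def build_user_data(playbook_path, log_path="/var/log/pipeline.log"):
    """Return a boot script that starts Ansible detached from the shell.

    nohup plus the trailing '&' lets the playbook keep running after the
    boot script exits, so nothing has to stay connected for 30 minutes.
    Both paths are hypothetical placeholders.
    """
    lines = [
        "#!/bin/bash",
        f"nohup ansible-playbook {playbook_path} > {log_path} 2>&1 &",
    ]
    return "\n".join(lines) + "\n"


def build_launch_request(ami_id, instance_type, playbook_path):
    """Return keyword arguments for the EC2 run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,  # exactly one master
        "MaxCount": 1,
        "UserData": build_user_data(playbook_path),
    }


def handler(event, context, ec2_client=None):
    """Lambda entry point: launch the master, then return immediately."""
    if ec2_client is None:
        import boto3  # imported lazily so the sketch can be tested without AWS
        ec2_client = boto3.client("ec2")
    request = build_launch_request(
        ami_id=event["ami_id"],
        instance_type=event.get("instance_type", "t2.medium"),
        playbook_path=event.get("playbook_path", "/opt/pipeline/site.yml"),
    )
    response = ec2_client.run_instances(**request)
    return {"master_instance_id": response["Instances"][0]["InstanceId"]}
```

The handler only issues the launch request, so it finishes well inside the five-minute limit while the 30-minute pipeline runs on the master. Passing `ec2_client` as a parameter is a design choice for testability: a stub client can stand in for AWS when exercising the handler locally.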

Real-world serverless architecture and engineering with AWS

Serverless - Is It For Your Organization?

Practical, team-focused operability techniques for distributed systems

Scale CI from 20K to 140K builds per day

Serverless in production: An experience report

How Shutterfly migrated 10+ billion photos to the cloud