February 11, 2020

Orchestrating data workflows using a fully serverless architecture

Data workflows are a fundamental part of any data engineering team's work. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. Tomer Levi walks you through how the data engineering team at Fundbox uses AWS serverless technologies to address this problem and how this approach enables data scientists, BI developers, and engineers to move faster.


Talk Title	Orchestrating data workflows using a fully serverless architecture
Speakers	Tomer Levi (Fundbox)
Conference	Strata Data Conference
Conf Tag	Make Data Work
Location	New York, New York
Date	September 24-26, 2019
URL	Talk Page
Slides	Talk Slides
Video

Fundbox is a growing fintech company that provides an automatic underwriting platform based on data and AI. Scheduling a limited number of data workflows is generally manageable, but scaling to hundreds of workflows with dependencies and diverse job types requires substantial custom engineering, adds complexity, and consumes expensive resources. Serverless architectures offer an alternative to traditional resource management. Tomer Levi explains how the data engineering team at Fundbox uses AWS Step Functions, Docker containers, and Spark to build a live, serverless data orchestration platform, focusing on the company's decision to build a user-friendly yet powerful and scalable solution. Tomer details AWS Step Functions state machines, their limitations, and how the team overcame them by building custom job-scheduling and dependency features. He also shows how Docker containers and AWS Fargate were used to overcome resource bottlenecks. Fundbox's architecture is scalable and already serves dozens of engineers, BI developers, and data scientists in the company.
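The abstract does not describe how Fundbox implemented its custom dependency feature, so the following is only a minimal Python sketch under stated assumptions: each workflow declares the set of workflows it waits on, and the scheduler groups workflows into "waves" (a layered topological sort, here Kahn's algorithm) so that workflows in the same wave have no dependencies on one another. The function name `execution_waves` and the workflow names are hypothetical. One possible design choice would be to map each wave onto a Step Functions Parallel state, with waves chained in sequence.

```python
def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group workflows into waves (layered topological sort, Kahn's algorithm).

    Hypothetical helper: `deps` maps each workflow name to the set of workflow
    names it depends on. Every workflow in a wave depends only on workflows in
    earlier waves. Raises ValueError if the dependencies contain a cycle.
    """
    # Include workflows that appear only as dependencies, never as keys.
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    # Copy the unmet dependencies so the caller's mapping is not mutated.
    remaining = {n: set(deps.get(n, ())) for n in nodes}

    # Reverse edges: which workflows are waiting on each workflow.
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for node, ds in remaining.items():
        for d in ds:
            dependents[d].append(node)

    waves: list[list[str]] = []
    ready = sorted(n for n, ds in remaining.items() if not ds)
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for finished in ready:
            for child in dependents[finished]:
                remaining[child].discard(finished)
                # A child becomes ready once its last dependency has finished.
                if not remaining[child]:
                    next_ready.append(child)
        ready = sorted(next_ready)  # sorted for deterministic output

    # Any workflow never scheduled is part of (or blocked by) a cycle.
    if scheduled != len(nodes):
        raise ValueError("cyclic dependency between workflows")
    return waves


if __name__ == "__main__":
    # Hypothetical workflows: a BI report and a model both build on cleaned data.
    deps = {
        "load_raw": set(),
        "clean": {"load_raw"},
        "features": {"clean"},
        "bi_report": {"clean"},
        "train_model": {"features"},
    }
    for i, wave in enumerate(execution_waves(deps)):
        print(i, wave)
```

With the example mapping, `bi_report` and `features` land in the same wave because both depend only on `clean`, so they could run concurrently, for example as separate Fargate tasks.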

container management serverless spark data engineering bi complexity aws docker fintech scalable orchestration
