Lessons Learned from the Migration to Apache Airflow
Talk Title | Lessons Learned from the Migration to Apache Airflow |
Speakers | Radek Maciaszek (Chief Architect, Skimlinks) |
Conference | Open Source Summit + ELC North America |
Conf Tag | |
Location | San Diego, CA, USA |
Date | Aug 19-23, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Apache Airflow is an open-source tool for orchestrating complex workflows and data processing pipelines. In this talk, Radek Maciaszek will present lessons learned from migrating machine learning and big data processing pipelines to Apache Airflow. Radek will discuss examples of how they use Airflow to power their company's big data infrastructure, where they analyze hundreds of terabytes of data. The examples will cover building the ETL pipeline and using Airflow to manage the workflow of the machine learning pipeline built on Spark.

This talk will cover the basic Airflow concepts and show real-life examples of how to define your own workflows in Python code. The talk will finish with more advanced Airflow topics, such as adding custom task operators, sensors and plugins, as well as best practices and the pros and cons of the tool.
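A basic Airflow concept is that a workflow is a DAG (directed acyclic graph) of tasks: each task declares which tasks must run before it, and by default a task runs only after its upstream tasks have succeeded. The following is a plain-Python illustration of that idea, not Airflow code. `SketchTask` and `execution_order` are hypothetical names, the `>>` overload only imitates Airflow's dependency syntax, and the pipeline's task names are invented. Airflow's scheduler also handles schedule intervals, retries and parallel execution, none of which is modeled here.

```python
from collections import deque


class SketchTask:
    """Hypothetical stand-in for an Airflow task (NOT Airflow's real class)."""

    def __init__(self, task_id, registry):
        self.task_id = task_id
        self.downstream = set()
        registry.append(self)  # register the task with its hypothetical "DAG" list

    def __rshift__(self, other):
        # Imitate Airflow's `a >> b` syntax: `other` (one task or a list) runs after `self`.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.downstream.add(target)
        return other  # returning `other` allows chaining: a >> b >> c


def execution_order(tasks):
    """Return task_ids in an order that respects every dependency (Kahn's algorithm)."""
    indegree = {task: 0 for task in tasks}
    for task in tasks:
        for child in task.downstream:
            indegree[child] += 1
    # Start with tasks that have no upstream dependencies.
    ready = deque(task for task in tasks if indegree[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task.task_id)
        # Sort children by id so the resulting order is deterministic.
        for child in sorted(task.downstream, key=lambda t: t.task_id):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("cycle detected: a DAG must be acyclic")
    return order


# Hypothetical ETL + ML pipeline, loosely following the talk's description.
dag = []
extract = SketchTask("extract", dag)
transform = SketchTask("transform", dag)
train_model = SketchTask("train_spark_model", dag)
load = SketchTask("load", dag)

extract >> transform >> [train_model, load]

print(execution_order(dag))
```

Here `extract` must finish before `transform`, and `transform` fans out to two independent downstream tasks. In a real Airflow DAG file, the same `>>` dependency line would be written between operator instances created inside a `DAG` object.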