March 3, 2020


Evolution of a modern cloud-based data lake


Building a data lake is hard. You have to centralize all of the company's data in one place, make it easily accessible, and get governance right. Last but not least, the cost has to stay reasonable. Taken together, these requirements make for quite a challenge. But never fear: Viacheslav Inozemtsev outlines Zalando's experience of building its data lake.

Talk Title: Evolution of a modern cloud-based data lake
Speakers: Viacheslav Inozemtsev (Zalando)
Conference: O’Reilly Software Architecture Conference
Conf Tag: Engineering the Future of Software
Location: Berlin, Germany
Date: November 5-7, 2019
URL: Talk Page
Slides: Talk Slides

Viacheslav Inozemtsev outlines the experience of building and evolving the cloud-based data lake of a company as large as Zalando. In particular, he addresses three main areas: ingestion of data from all the various sources in the company, easy and convenient access to that data, and security and governance at the scale of more than 100 teams. He also examines cost across all three areas.

The first challenge, data ingestion, is a broad topic in its own right. Viacheslav traces the evolution of Zalando’s ingestion pipelines for different company-wide data sources, such as the messaging bus, the data warehouse, and the Google Analytics platform, as well as custom datasets on demand.

For the second challenge, access to data, you’ll learn how the tools and principles Zalando developed have evolved to give the rest of the company convenient ways to consume data and extract information from it.

The largest challenge is security and governance, even though it doesn’t bring any value directly. Viacheslav explores how the company initially addressed security and access management, and how it evolved its approach later as better frameworks and services appeared on the market.
