December 1, 2019

Organizing the data lake

Building a data lake involves more than installing and using Hadoop. The goal in most organizations is to build multiuse data infrastructure that is not subject to past constraints. Mark Madsen discusses hidden design assumptions, reviews design principles to apply when building multiuse data infrastructure, and provides a reference architecture.

Talk Title: Organizing the data lake
Speakers: Mark Madsen (Teradata)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: May 23-25, 2017

Building a data lake involves more than installing and using Hadoop. The focus in the market has been on all the different technology components, ignoring the more important part: the data architecture that the code implements, which lies at the core of the system. Just like a data warehouse, a data lake has a data architecture. If you expect any longevity from the platform, the architecture should be designed rather than accidental. But what are the design principles that lead to good functional design and a workable data architecture? What are the assumptions that limit old approaches? How can one integrate with or migrate from the older environments? How does this affect an organization’s data management? Answering these questions is key to building long-term infrastructure. The goal in most organizations is to build multiuse data infrastructure that is not subject to past constraints. Mark Madsen discusses hidden design assumptions, reviews design principles to apply when building multiuse data infrastructure, and provides a reference architecture. This reference architecture has been used across many organizations to work toward a unified analytic infrastructure.
