October 25, 2019

235 words 2 mins read

Filling the data lake

A major challenge in today's world of big data is getting data into the data lake in a simple, automated way. Coding scripts for disparate sources is time-consuming and difficult to manage. Developers need a process that supports disparate sources by detecting and passing metadata automatically. Chuck Yarbrough and Mark Burnette explain how to simplify and automate your data ingestion process.

Talk Title: Filling the data lake
Speakers: Chuck Yarbrough (Pentaho), Mark Burnette (Pentaho, a Hitachi Group Company)
Conference: Strata + Hadoop World
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 29-31, 2016
URL: Talk Page
Slides: Talk Slides
Video:

A major challenge in today’s world of big data is getting data into the data lake in a simple, automated way. Many organizations use Python or another language to code their way through these processes. The problem is that with disparate sources of data numbering in the thousands, coding scripts for each source is time-consuming and extremely difficult to manage and maintain. Developers need the ability to create one process that can support many disparate data sources by detecting and passing metadata through what Pentaho calls “metadata injection.” With this capability, developers can parameterize ingestion processes and automate every step of the data pipeline. Chuck Yarbrough and Mark Burnette outline model-driven ingestion and explain how to simplify and automate your data ingestion processes. This session is sponsored by Pentaho.
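To make the pattern concrete, here is a minimal Python sketch of metadata-driven ingestion: one generic routine is parameterized by per-source metadata instead of a separate hand-coded script per source. This is only an illustration of the general idea, not Pentaho's metadata injection feature itself, and the source names, file paths, and target directories are hypothetical.

```python
import csv
import json
from pathlib import Path

# Hypothetical metadata records: each entry describes one source so that a
# single ingestion process can handle all of them.
SOURCE_METADATA = [
    {"name": "orders",    "path": "incoming/orders.csv",    "delimiter": ",",  "target": "lake/orders"},
    {"name": "inventory", "path": "incoming/inventory.tsv", "delimiter": "\t", "target": "lake/inventory"},
]

def ingest(meta: dict) -> int:
    """Read one delimited source described by `meta` and land it in the lake as JSON lines."""
    target_dir = Path(meta["target"])
    target_dir.mkdir(parents=True, exist_ok=True)
    out_path = target_dir / f"{meta['name']}.jsonl"
    rows = 0
    with open(meta["path"], newline="") as src, open(out_path, "w") as dst:
        # Column names are read from the header row rather than hard-coded,
        # so the same code works for any source the metadata describes.
        reader = csv.DictReader(src, delimiter=meta["delimiter"])
        for record in reader:
            dst.write(json.dumps(record) + "\n")
            rows += 1
    return rows

if __name__ == "__main__":
    for meta in SOURCE_METADATA:
        print(f"ingested {ingest(meta)} rows from {meta['name']}")
```

Adding a new source then means adding one metadata record rather than writing and maintaining another script, which is the point the speakers make about scaling ingestion to thousands of disparate sources.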
