Filling the data lake
A major challenge in today's world of big data is getting data into the data lake in a simple, automated way. Coding scripts for each disparate source is time-consuming and difficult to manage. Developers need a process that supports disparate sources by detecting and passing metadata automatically. Chuck Yarbrough and Mark Burnette explain how to simplify and automate your data ingestion process.
| Talk Title | Filling the data lake |
| Speakers | Chuck Yarbrough (Pentaho), Mark Burnette (Pentaho, a Hitachi Group Company) |
| Conference | Strata + Hadoop World |
| Conf Tag | Big Data Expo |
| Location | San Jose, California |
| Date | March 29-31, 2016 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
A major challenge in today’s world of big data is getting data into the data lake in a simple, automated way. Many organizations use Python or another language to code their way through these processes. The problem is that with disparate sources of data numbering in the thousands, coding a script for each source is time-consuming and extremely difficult to manage and maintain. Developers need the ability to create one process that supports many disparate data sources by detecting and passing metadata through what Pentaho calls “metadata injection.” With this capability, developers can parameterize ingestion processes and automate every step of the data pipeline. Chuck Yarbrough and Mark Burnette outline model-driven ingestion and explain how to simplify and automate your data ingestion processes. This session is sponsored by Pentaho.
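To make the idea concrete, here is a minimal Python sketch of metadata-driven ingestion. It is not Pentaho's metadata injection API; the `SOURCE_METADATA` registry, the `ingest` function, and the SQLite staging store are illustrative assumptions showing how a single parameterized process can load many differently shaped sources, so that onboarding a new source means adding a metadata record rather than writing another script.

```python
# Hypothetical sketch of metadata-driven ingestion (not Pentaho's implementation).
# One generic routine is parameterized by per-source metadata instead of a
# hand-coded script per source.
import csv
import sqlite3

# Hypothetical metadata registry: each record describes one source.
SOURCE_METADATA = [
    {"name": "orders",    "path": "orders.csv",    "delimiter": ",", "target_table": "raw_orders"},
    {"name": "customers", "path": "customers.psv", "delimiter": "|", "target_table": "raw_customers"},
]

def ingest(meta, conn):
    """Load one source into the staging store using only its metadata."""
    with open(meta["path"], newline="") as f:
        reader = csv.reader(f, delimiter=meta["delimiter"])
        header = next(reader)  # detect column names from the source itself
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS {meta["target_table"]} ({cols})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            f'INSERT INTO {meta["target_table"]} VALUES ({placeholders})', reader
        )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("data_lake_staging.db")  # stand-in for the real lake
    for meta in SOURCE_METADATA:  # one loop covers every registered source
        ingest(meta, conn)
```

The key design point the talk abstract describes is visible even in this toy version: the ingestion logic never mentions a specific file or schema, so every step of the pipeline can be driven and automated from the metadata alone.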