Making architecture choices for small and big data problems
Not all data science problems are big data problems. Many small and medium-sized product companies want to start their journey toward becoming data driven. Nischal HP and Raghotham Sripadraj share their experience building data science platforms for various enterprises, with an emphasis on making the right architecture choices and using distributed and fault-tolerant tools.
Talk Title | Making architecture choices for small and big data problems |
Speakers | Nischal HP (omni:us), Raghotham Sripadraj (Ericsson) |
Conference | Strata + Hadoop World |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 14-16, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Enterprises want to be data driven from the very beginning or want to join the race for data supremacy. Being data driven requires the system to store and process every single transaction and interaction the customer makes with the product, thus enabling the business to make better decisions. But storing, processing, and analyzing data comes at a cost. This cost is spread across the choice of technology, infrastructure, and go-to-market strategy.
Nischal HP and Raghotham Sripadraj share their experience building data science platforms for various enterprises, with an emphasis on making the right architecture choices for things such as databases, queues, caching mechanisms, distribution of the workload, underlying technology for machine learning and predictive models, visualization, and prototyping. Nischal and Raghotham stress the importance of using distributed and fault-tolerant tools, which themselves come with the cost of managing the infrastructure (including, by implication, a dedicated team to monitor it). However, with small data, simple tools take you a long way.
Many things can go unnoticed when building an end-to-end data science system, such as the importance of logging, building a data pipeline that sends notifications to the required medium of communication, exposing data science as a service via APIs, or A/B testing for data science-backed feature releases when required. Only when the data science solution is in production does it power the organization the right way. When building data science products, you should live by the motto “fail fast.” Nischal and Raghotham themselves have failed fast when making these choices, but in time they came to understand that adopting the latest and coolest technology on the planet just for the sake of it is not the right thing to do.