Cloud architectures for data science
Data is available from an incredible number of sources in an endless variety of formats. Data science is about extracting valuable insights from this jumble and presenting them as clear, attractive visualizations. Walking you through several examples using practical tools and tricks, Margriet Groenendijk presents a typical workflow that offers a basic introduction to data science.
Talk Title | Cloud architectures for data science |
Speakers | Margriet Groenendijk |
Conference | O’Reilly Software Architecture Conference |
Conf Tag | Engineering the Future of Software |
Location | San Francisco, California |
Date | November 14-16, 2016 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Data science is currently a hot topic, but what is it? There are several definitions and opinions. Data science covers the complete workflow, from defining a question and finding the most suitable data source to identifying the right tools and presenting the best possible answer in a clear, engaging manner.

Using weather data, geographical data, and UN country statistics (all open datasets that are publicly available for download), Margriet Groenendijk walks you through an example of a typical workflow: defining the question, finding the data, exploring the data and choosing the best tools for the analysis, cleaning and storing the data, and visualizing and summarizing the cleaned data. This work is quite often done iteratively, with each pass informed by a growing understanding of the data gained through munging and crunching.

Margriet concludes by highlighting some of the latest tools and tricks available to data scientists. More data is now easily accessible through REST APIs, and it is simpler than ever to store and analyze (big) data in the cloud using tools such as Spark and Python or Scala notebooks. These developments also make collaboration easier, because data scientists can readily share both their data and their analyses.
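To make the steps above concrete, here is a minimal sketch of such a workflow in Python. The REST endpoint, response shape, and column names are placeholders assumed for illustration; they do not come from the talk itself.

```python
import requests
import pandas as pd
import matplotlib.pyplot as plt

# 1. Find the data: fetch an open dataset through a (hypothetical) REST API.
response = requests.get(
    "https://example.org/api/v1/country-stats",  # placeholder URL
    params={"format": "json"},
    timeout=30,
)
response.raise_for_status()
records = response.json()["records"]  # assumed response shape

# 2. Explore and clean: load into a DataFrame, drop incomplete rows,
#    and coerce numeric columns to the right types.
df = pd.DataFrame(records)
df = df.dropna(subset=["country", "avg_temperature", "population"])
df["avg_temperature"] = pd.to_numeric(df["avg_temperature"], errors="coerce")
df["population"] = pd.to_numeric(df["population"], errors="coerce")
df = df.dropna()

# 3. Store the cleaned data so the analysis can be rerun or shared.
df.to_csv("cleaned_country_stats.csv", index=False)

# 4. Visualize and summarize the cleaned data.
df.plot.scatter(x="population", y="avg_temperature")
plt.title("Average temperature vs. population")
plt.savefig("temperature_vs_population.png")
```

And a brief sketch of how a similar summary might be run with Spark from a Python notebook in the cloud; the input file and column names are again placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("weather-summary").getOrCreate()

# Read a (placeholder) larger weather dataset from local or cloud storage.
weather = spark.read.csv("weather_observations.csv", header=True, inferSchema=True)

# Summarize: average observed temperature per country, computed in parallel by Spark.
summary = (
    weather.groupBy("country")
    .agg(F.avg("temperature").alias("mean_temperature"))
    .orderBy(F.desc("mean_temperature"))
)
summary.show(10)
```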