R you ready for the cloud? Using R for operationalizing an enterprise-grade data science solution on Azure
R has long been criticized for its limited support for scalable data analytics. What is needed is an R-centric paradigm that lets data scientists elastically harness cloud resources with diverse computing capabilities for large-scale data analytics. Le Zhang and Graham Williams demonstrate how to operationalize an end-to-end (E2E), enterprise-grade pipeline for big data analytics, all within R.
Le Zhang (Microsoft), Graham Williams (Microsoft)
Strata + Hadoop World
Make Data Work
December 6-8, 2016
R leads the list of the most popular data science languages, with a 49% share of the overall vote according to a recent survey. However, the language is inherently limited in scalability and parallelism, which constrains its wide deployment in enterprise-grade applications. Contemporary big data solutions are migrating from on-premises infrastructure to the cloud, owing to the clear benefits of flexibly scaling resources up and out, computational efficiency, and cost effectiveness. To better leverage cloud computing and ease the transition to it, the community needs R packages, and associated paradigms, that allow data scientists and data engineers who work in R to operationalize enterprise-grade pipelines for developing analytical solutions.

Le Zhang and Graham Williams demonstrate how to use R to architect enterprise-grade data analytics solutions and develop artificial intelligence applications on the Azure cloud. Using a real-world flight delay prediction scenario, Le and Graham illustrate how R can elastically deploy, manage, and deallocate a heterogeneous set of cloud resources, such as virtual machines, Spark clusters, and storage accounts, and how it can distribute on-demand, parallel, and scalable data analytics using cutting-edge machine learning technologies in the cloud. The R packages introduced greatly simplify the management and use of cloud resources for a range of big data tasks, accelerating the pace of prototyping, experimenting with, and productizing data-driven solutions for enterprise use.