Cloud architectures for data science
December 27, 2019
Data arrives from an incredible number of sources in an endless variety of formats. Data science is about extracting valuable insights from this jumble and presenting them as clear, attractive visualizations. Walking you through several examples using practical tools and tricks, Margriet Groenendijk presents a typical workflow that offers a basic introduction to data science.
Chainer: A flexible and intuitive framework for complex neural networks
December 6, 2019
Open source software frameworks are the key to applying deep learning technologies. Orion Wolfe and Shohei Hido introduce Chainer, a Python-based standalone framework that lets users intuitively implement many kinds of models, including recurrent neural networks, with great flexibility and high performance on GPUs.
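Chainer's hallmark is its "define-by-run" approach: the computational graph is recorded as the forward pass executes, so ordinary Python control flow can shape the network dynamically, which is what makes models like RNNs easy to express. A toy sketch of that idea in plain Python (this is not Chainer's actual API):

```python
# Toy "define-by-run" autograd sketch (not Chainer's API): the graph is
# recorded as operations execute, so Python loops and conditionals can
# shape the network on the fly.

class Var:
    def __init__(self, value, parents=(), grad_fn=None):
        self.value = value        # scalar value
        self.parents = parents    # upstream Vars (the recorded graph)
        self.grad_fn = grad_fn    # maps upstream gradient to parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, (self, other),
                   lambda g: (g, g))

    def __mul__(self, other):
        return Var(self.value * other.value, (self, other),
                   lambda g: (g * other.value, g * self.value))

    def backward(self, g=1.0):
        self.grad += g
        if self.grad_fn:
            for parent, pg in zip(self.parents, self.grad_fn(g)):
                parent.backward(pg)

x = Var(3.0)
w = Var(2.0)
# The graph for y = w*x + x only comes into existence as this line runs:
y = w * x + x
y.backward()
print(x.grad)  # dy/dx = w + 1 = 3.0
print(w.grad)  # dy/dw = x = 3.0
```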
Sell cron, buy Airflow: Modern data pipelines in finance
November 29, 2019
Quantopian integrates financial data from vendors around the globe. As the scope of its operations outgrew cron, the company turned to Apache Airflow, a distributed scheduler and task executor. James Meickle explains how in less than six months, Quantopian was able to rearchitect brittle crontabs into resilient, recoverable pipelines defined in code to which anyone could contribute.
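What "pipelines defined in code" buys over crontabs is that dependencies and retries become explicit and recoverable. A minimal plain-Python sketch of that idea (hypothetical helper names, not the Airflow API):

```python
# Minimal sketch of what a scheduler like Airflow provides over cron:
# tasks declared as a dependency graph (a DAG), run in topological
# order, retried on failure. Hypothetical names, not Airflow's API.

from graphlib import TopologicalSorter

def run_pipeline(tasks, deps, max_retries=2):
    """tasks: name -> callable; deps: name -> set of upstream names."""
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # downstream tasks never run on failure

log = []
tasks = {
    "extract":   lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load":      lambda: log.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
run_pipeline(tasks, deps)
print(log)  # ['extract', 'transform', 'load']
```

Unlike a crontab entry, this declaration can be versioned, reviewed, and contributed to like any other code.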
Robust anomaly detection for real user monitoring data
November 23, 2019
For the past year, LinkedIn has been running and iteratively improving Luminol, its anomaly detection system for real user monitoring data. Ritesh Maheshwari and Yang Yang offer an overview of Luminol, focusing on how to build a low-cost end-to-end system that can leverage any algorithm, and explain lessons learned and best practices that will be useful to any engineering or operations team.
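The "leverage any algorithm" design means the system hosts pluggable detectors. As a toy illustration of the kind of detector such a system can plug in (a rolling z-score check, not Luminol's actual algorithm):

```python
# Toy pluggable anomaly detector: flag points that deviate from the
# trailing-window mean by more than `threshold` standard deviations.
# Illustrative only -- not Luminol's implementation.

from statistics import mean, stdev

def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    anomalies = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A page-load-time series with one latency spike:
ts = [10, 11, 10, 12, 11, 10, 11, 95, 10, 11]
print(rolling_zscore_anomalies(ts))  # [7] -- the spike at index 7
```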
Fluent Python: Implementing intuitive and productive APIs
November 12, 2019
Python is so consistent that we can often infer the behavior of new objects by assuming they work like the built-ins. The Python data model is the foundation of this consistent behavior. Luciano Ramalho explores the construction of Pythonic objects: classes that feel "natural" to a Python programmer and leverage some of the best language features by implementing key protocols of the data model.
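In the spirit of the talk, a class becomes "natural" to use by implementing data-model protocols rather than ad-hoc methods, so built-ins like `abs()` and operators like `+` and `==` just work. A small example:

```python
# A class that feels native because it implements data-model protocols
# (__repr__, __abs__, __add__, __eq__, __bool__) instead of ad-hoc methods.

import math

class Vector2d:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):            # readable in the REPL and in logs
        return f"Vector2d({self.x!r}, {self.y!r})"

    def __abs__(self):             # works with the built-in abs()
        return math.hypot(self.x, self.y)

    def __add__(self, other):      # works with the + operator
        return Vector2d(self.x + other.x, self.y + other.y)

    def __eq__(self, other):       # works with ==
        return (self.x, self.y) == (other.x, other.y)

    def __bool__(self):            # truthy unless zero-length
        return bool(abs(self))

v = Vector2d(3, 4) + Vector2d(0, 0)
print(repr(v), abs(v))  # Vector2d(3, 4) 5.0
```

Because the behavior comes from protocols, a Python programmer can predict how `Vector2d` behaves without reading its documentation.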
Navigating the data science Python ecosystem
November 7, 2019
Python's popularity for data science use cases has skyrocketed in recent years due to its ease of use, great developer and user community, and solid core of scientific libraries. Christine Doig explores data science and the state of the Python ecosystem and helps navigate the large amount of open source libraries available for data science in Python, providing a map to guide you on the journey.
Docker for data scientists
October 25, 2019
Data scientists inhabit such an ever-changing landscape of languages, packages, and frameworks that it can be easy to succumb to tool fatigue. If this sounds familiar, you may have missed the increasing popularity of Linux containers in the DevOps world, in particular Docker. Michelangelo D'Agostino demonstrates why Docker deserves a place in every data scientist's toolkit.
Filling the data lake
October 25, 2019
A major challenge in today's world of big data is getting data into the data lake in a simple, automated way. Coding scripts for disparate sources is time-consuming and difficult to manage. Developers need a process that supports disparate sources by detecting and passing metadata automatically. Chuck Yarbrough and Mark Burnette explain how to simplify and automate your data ingestion process.
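The "detecting and passing metadata automatically" idea is that new sources should not require new code: the pipeline inspects each file and derives its schema. A hypothetical sketch of such detection for CSV sources (illustrative helper, not the speakers' tooling):

```python
# Sketch of metadata-driven ingestion: rather than hand-coding a script
# per source, detect each file's schema (column names and rough types)
# and pass that metadata downstream. Hypothetical helper, for illustration.

import csv
import io

def detect_schema(csv_text, sample_rows=100):
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    types = [int] * len(header)          # start with the narrowest type
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for col, value in enumerate(row):
            # Widen the column type only as far as the data requires.
            for candidate in (types[col], float, str):
                try:
                    candidate(value)
                    types[col] = candidate
                    break
                except ValueError:
                    continue
    return {name: t.__name__ for name, t in zip(header, types)}

sample = "id,price,city\n1,9.99,Austin\n2,12.50,Boston\n"
print(detect_schema(sample))
# {'id': 'int', 'price': 'float', 'city': 'str'}
```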
Python scalability: A convenient truth
October 21, 2019
Despite Python's popularity throughout the data engineering and data science workflow, the principles behind its performance and scaling behavior are less well understood. Travis Oliphant explains best practices and modern tools to scale Python to larger-than-memory and distributed workloads without sacrificing its ease of use or being forced to adopt heavyweight frameworks.
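The core idea behind larger-than-memory tools in this space (Dask, for example) is chunking: split the data, compute per-chunk partial results, and combine them, so only one chunk is ever resident in memory. A plain-Python sketch of that pattern (not any particular library's API):

```python
# Larger-than-memory pattern in miniature: stream the data in chunks,
# keep only per-chunk partial aggregates, combine at the end.
# Plain-Python sketch -- not Dask or any specific framework.

def chunked(iterable, size):
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def chunked_mean(stream, chunk_size=1_000):
    total, count = 0.0, 0
    for chunk in chunked(stream, chunk_size):  # one chunk in memory
        total += sum(chunk)                    # partial aggregate
        count += len(chunk)
    return total / count

# A generator stands in for a dataset too large to materialize at once:
print(chunked_mean(float(i) for i in range(1_000_000)))  # 499999.5
```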