Your easy move to serverless computing and radically simplified data processing
February 7, 2020
Most analytic flows can benefit from serverless, starting with simple cases and moving to complex data preparation for AI frameworks like TensorFlow. To address the challenge of how to easily integrate serverless without major disruptions to your system, Gil Vernik explores the "push to the cloud" experience, which dramatically simplifies serverless for big data processing frameworks.
Data science + design thinking: A perfect blend to achieve the best user experience
February 6, 2020
Design thinking is a methodology for creative problem-solving developed at the Stanford d.school. The methodology is used by world-class design firms like IDEO and many of the world's leading brands like Apple, Google, Samsung, and GE. Michael Radwin prepares a recipe for how to apply design thinking to the development of AI/ML products.
Executive Briefing: Unpacking AutoML
February 5, 2020
Paco Nathan outlines the history and landscape for vendors, open source projects, and research efforts related to AutoML. Starting from the perspective of an AI expert practitioner who speaks business fluently, Paco unpacks the ground truth of AutoML, translating from the hype into business concerns and practices in a vendor-neutral way.
Introducing Kubeflow (with special guests TensorFlow and Apache Spark)
February 4, 2020
Modeling is easy; productizing models, less so. Distributed training? Forget about it. Say hello to Kubeflow with Holden Karau, a system that makes it easy for data scientists to containerize their models to train and serve on Kubernetes.
TFX: Production ML pipelines with TensorFlow
February 2, 2020
Putting together an ML production pipeline for training, deploying, and maintaining ML and deep learning applications involves much more than just training a model. Robert Crowe explores TensorFlow Extended (TFX), Google's open source version of the tools and libraries it uses internally, built on its years of experience developing production ML pipelines.
The moral responsibility of AI builders (sponsored by Dataiku)
February 2, 2020
With the adoption of AI in the enterprise accelerating, its impacts, both positive and negative, are rapidly increasing. Triveni Gandhi explores why the builders of these new AI capabilities all bear some moral responsibility for ensuring that their products create maximum benefit and minimal harm.
Building machine learning inference pipelines at scale
January 31, 2020
Real-life ML workloads require more than training and predicting: data often needs to be preprocessed and postprocessed. Developers and data scientists have to train and deploy a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to build machine learning inference pipelines using open source libraries and how to scale them on AWS.
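The sequence of collaborating algorithms the abstract describes can be sketched in plain Python. This is a minimal, hypothetical illustration of chaining preprocessing, prediction, and postprocessing stages; the talk itself covers open source libraries and scaling on AWS:

```python
# Minimal sketch of an inference pipeline: each stage transforms the
# output of the previous one, so preprocessing, prediction, and
# postprocessing stay decoupled and individually testable.

def preprocess(raw):
    # Normalize raw feature values into [0, 1].
    lo, hi = min(raw), max(raw)
    return [(x - lo) / (hi - lo) for x in raw] if hi > lo else [0.0] * len(raw)

def predict(features):
    # Stand-in model: a fixed linear score over the features.
    weights = [0.2, 0.5, 0.3]
    return sum(w * f for w, f in zip(weights, features))

def postprocess(score):
    # Turn the raw score into a business-facing label.
    return "high" if score > 0.5 else "low"

def run_pipeline(raw, stages=(preprocess, predict, postprocess)):
    result = raw
    for stage in stages:
        result = stage(result)
    return result

print(run_pipeline([10, 40, 30]))  # raw data in, label out
```

Because each stage has the same call shape, stages can be swapped or extended (for example, adding a feature-encoding step) without touching the rest of the pipeline.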
End-to-end ML streaming with Kubeflow, Kafka, and Redis at scale
January 30, 2020
With ML models now ubiquitous, model serving and pipelining are more important than ever. Comcast runs hundreds of models at scale with Kubernetes and Kubeflow. Together with popular open source streaming platforms such as Apache Kafka and Redis, Comcast invokes models billions of times per day while maintaining high availability guarantees and quick deployments. Join Nick Pinckernell to learn how.
Machine learning vital signs: Metrics and monitoring of AI in production
January 27, 2020
Production artificial intelligence systems are interacting with the real world, and it's terrifying that oftentimes nobody has any idea how they're performing on live data. Donald Miner details why you should track your models in production over time, explains how you can implement proper logging and metrics for models, and details metrics you should probably be capturing.
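One simple "vital sign" of the kind the talk argues for can be sketched as follows. The drift rule, window size, and thresholds here are hypothetical assumptions, not the speaker's method: keep a rolling window of live predictions and flag drift when the window mean moves too far from the mean observed during training.

```python
from collections import deque

class VitalSignMonitor:
    """Track a rolling window of live predictions and flag drift
    when their mean departs from the training-time baseline."""

    def __init__(self, training_mean, window=100, tolerance=0.15):
        self.training_mean = training_mean
        self.tolerance = tolerance
        self.window = deque(maxlen=window)  # oldest entries drop off automatically

    def record(self, prediction):
        self.window.append(prediction)

    def drifted(self):
        if not self.window:
            return False
        live_mean = sum(self.window) / len(self.window)
        return abs(live_mean - self.training_mean) > self.tolerance

monitor = VitalSignMonitor(training_mean=0.4)
for p in [0.35, 0.42, 0.38]:   # live traffic resembles training data
    monitor.record(p)
print(monitor.drifted())

for p in [0.9] * 50:           # live traffic shifts sharply
    monitor.record(p)
print(monitor.drifted())
```

In practice this check would feed a metrics system rather than a print statement, and the monitored signal could be any model output statistic, not just the mean.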
Model as a service for real-time decisioning
January 27, 2020
Hosting models and productionizing them is a pain point. ML models used for real-time processing require data scientists to have a defined workflow that gives them the agility to make seamless, self-service deployments to production. Niraj Tank and Sumit Daryani detail open source technologies for building a generic service-based approach to serving ML decisioning and achieving operational excellence.
Optimizing analytical queries on Cassandra by 100x
January 26, 2020
Cassandra is one of the most popular datastores in big data and ML applications. Data analysis at scale with fast query response is critical for business needs, and while Cassandra with Spark integration allows running an analytical workload, it can be slow. Shradha Ambekar dives into the challenges faced at Intuit and the solutions her team implemented to improve performance by 100x.
Overview of Data Governance
January 26, 2020
Paco Nathan offers an overview of data governance's history, themes, tools, processes, standards, and more, based in part on interviews with experts in the field about issues and best practices. Join in to learn what impact machine learning has on data governance and vice versa, along with an overview of open source projects and open standards in this space.
Unlocking your serverless functions with OpenFaaS for AI chatbot projects
January 23, 2020
Sergio Mendez examines critical challenges in implementing AI chatbots and explains how Movistar designed an open source serverless architecture using OpenFaaS on top of Kubernetes, along with complementary technologies such as NoSQL databases and message brokers, to deploy Telegram AI chatbots. Sergio then compares these technologies to "vendor lock-in" services offered by major cloud providers.
What's your machine learning score?
January 23, 2020
ML in production is different from ML in an R&D environment. Tania Allard dives into a number of techniques for testing your ML systems for quality and decay in both R&D and production environments. You'll see examples of issues commonly encountered in ML and learn how to test and monitor your data, model development, and infrastructure.
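The kinds of checks the abstract mentions, covering data, model quality, and decay, can be sketched as plain assertions that run in CI for both R&D and production environments. All names, fields, and thresholds below are illustrative assumptions, not the speaker's test suite:

```python
def check_schema(rows, required):
    """Data test: every record carries the fields the model expects."""
    return all(required <= set(row) for row in rows)

def check_ranges(rows, field, lo, hi):
    """Data test: values stay inside the range seen during training."""
    return all(lo <= row[field] <= hi for row in rows)

def check_quality(y_true, y_pred, threshold=0.8):
    """Model test: holdout accuracy must not decay below a floor."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) >= threshold

rows = [{"age": 34, "income": 52_000}, {"age": 29, "income": 48_000}]
print(check_schema(rows, {"age", "income"}))
print(check_ranges(rows, "age", 18, 90))
print(check_quality([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]))  # 4/5 accuracy
```

Running the same checks against live production data (rather than a static holdout) is what turns these from one-off unit tests into decay monitoring.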
Data-driven digital transformation and jobs: The new software hierarchy and ML
January 13, 2020
Robert Cohen discusses the skills that employers are seeking from employees in digital jobs, linked to the new software hierarchy driving digital transformation. Robert describes this software hierarchy as one that ranges from DevOps, CI/CD, and microservices to Kubernetes and Istio. This hierarchy is used to define the jobs that are central to data-driven digital transformation.
Combining WrapFS and eBPF to Provide a Lightweight File System Sandboxing Framework
January 12, 2020
Filesystem (FS) sandboxing is a useful technique to protect sensitive data from untrusted binaries. However, existing approaches do not allow fine-grained control over policy enforcement (e.g., seccom …
Evaluating cybersecurity defenses with a data science approach
January 12, 2020
Cybersecurity analysts are under siege to keep pace with the ever-changing threat landscape. Analysts are overworked, bombarded with and burned out by the sheer number of alerts they must carefully investigate. Brennan Lodge and Jay Kesavan explain how to use a data science model for alert evaluations to empower your cybersecurity analysts.
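As a hypothetical illustration of the idea (not the speakers' actual model), an alert-evaluation model can be as simple as a scoring function that ranks incoming alerts so analysts triage the highest-risk ones first instead of working the queue in arrival order. The fields and weights below are invented for the sketch:

```python
SEVERITY = {"low": 1, "medium": 2, "high": 3}

def score_alert(alert):
    # Weight severity, asset criticality, and whether the source
    # already appears on a threat intelligence list.
    return (SEVERITY[alert["severity"]] * 2
            + alert["asset_criticality"]
            + (3 if alert["on_threat_list"] else 0))

alerts = [
    {"id": 1, "severity": "low", "asset_criticality": 1, "on_threat_list": False},
    {"id": 2, "severity": "high", "asset_criticality": 3, "on_threat_list": True},
    {"id": 3, "severity": "medium", "asset_criticality": 2, "on_threat_list": False},
]
triaged = sorted(alerts, key=score_alert, reverse=True)
print([a["id"] for a in triaged])  # highest-risk alerts first
```

A real deployment would replace the hand-tuned weights with a model trained on historical analyst dispositions, but the triage loop around it looks the same.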
Executive Briefing: Overview of data governance
January 11, 2020
Effective data governance is foundational for AI adoption in enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa.
The Lyft data platform: Now and in the future
January 5, 2020
Lyft's data platform is at the heart of the company's business. Decisions from pricing to ETAs to business operations rely on Lyft's data platform. Moreover, it powers the enormous scale and speed at which Lyft operates. Mark Grover and Deepak Tiwari walk you through the choices Lyft made in the development and sustenance of the data platform, along with what lies ahead.