large-scale

7 years of domain-driven design: Tackling complexity in large-scale marketing systems

March 5, 2020

Vladik Khononov explains how he and his team embraced domain-driven design (DDD) at Plexop, a large-scale marketing system that spans over a dozen different business domains. Join in to learn how DDD allowed the team to manage business complexities, see what worked (and what didn't), and discover where they had to adapt the DDD methodology to fit the company's needs.

Adopting domain-driven design at scale: Near enemies and how to defeat them

March 4, 2020

Everyone doing large-scale software delivery is using domain-driven design (DDD) these days, because it holds the key to delivering maintainable, evolvable solutions with independent teams. But it can go wrong, and then DDD is blamed. Andrew Harmel-Law and Gayathri Thiyagarajan detail a real project they saw fail. You'll learn the many problems they spotted and how they fixed them.

Cyclic Tests Unleashed: Large-Scale RT Analysis with Jitterdebugger

March 3, 2020

Jitterdebugger is a new tool for testing the preempt_rt real-time extensions for the Linux kernel. While the basic principles for this endeavour (run a cyclic task on one or more CPUs, and store the m …

A GDPR retrospective: Implementation by a large-scale data organization in reality

February 29, 2020

GDPR was likely one of the biggest challenges in data management that occurred in 2018. Yulia Trakhtenberg dives into a one-year retrospective about how it was executed in reality at a large-scale data organization.

Internet Traffic 2009-2019

February 29, 2020

In 2009, we presented results from a large-scale, multi-year study of global Internet traffic across 110 providers and 200 Exabytes of commercial traffic. Our orig …

Kubernetes the very hard way

February 27, 2020

Laurent Bernaille examines the lessons he learned operating large Kubernetes clusters.

Control BGP state explosion in Scale-out peering

February 25, 2020

Many large-scale service provider networks use some form of scale-out architecture at peering sites. In such an architecture, each participating Autonomous System …

Machine Learning Made Easy on Kubernetes. DevOps for Data Scientists

February 25, 2020

Though machine learning and AI are immensely powerful, these solutions are by no means easy. In many cases, there are many diverse components that are not designed to work together. Additionally, thes …

The power of good abstractions in systems design

February 25, 2020

This talk shows how good abstractions make it possible to identify and apply solutions to seemingly unrelated problems from different disciplines to build better systems with less effort.

Kubernetes Housekeeping

February 20, 2020

One of the big challenges of running large-scale distributed systems like Kubernetes is managing resources. The efficiency and long-term operational readiness of such systems depend on how well the r …

How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE)

February 19, 2020

Join Thomas Phelan to learn whether the combination of containers with large-scale distributed data analytics and machine learning applications is like combining oil and water or like peanut butter and chocolate.

Large-scale machine learning at Facebook: Implications of platform design on developer productivity

February 19, 2020

AI plays a key role in achieving Facebook's mission of connecting people and building communities. Nearly every visible product is powered by machine learning algorithms at its core, from delivering relevant content to making the platform safe. Kim Hazelwood and Mohamed Fawzy explain how applied ML has continued to change the landscape of the platforms and infrastructure at Facebook.

Architecting a data analytics service both in the public cloud and in the on-premise private cloud: ETL, BI, and machine learning (sponsored by SK Holdings)

February 16, 2020

Jungwook Seo walks you through AccuInsight+, a cloud data analytics platform that SK Holdings announced in January 2019. It offers eight data analytics services on CloudZ, one of the biggest cloud service providers in Korea.

Deep learning from scratch

February 15, 2020

You'll go hands-on to learn the theoretical foundations and principal ideas underlying deep learning and neural networks. Bruno Gonçalves provides implementations whose code structure closely resembles the way Keras is structured, so that by the end of the course, you'll be prepared to dive deeper into the deep learning applications of your choice.

Tested for Business: An Open and Transparent Quality Kit

February 15, 2020

With the proliferation of OpenJDK binaries for a business to choose from, one factor in determining the selection is quality. How do you know your choice is up to snuff? The AdoptOpenJDK Quality Kit …

How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE (BlueData))

February 12, 2020

Anant Chintamaneni and Matt Maccaux explore whether the combination of containers with large-scale distributed data analytics and machine learning applications is like combining oil and water or like peanut butter and chocolate.

Scalable anomaly detection with Spark and SOS

February 10, 2020

Jeroen Janssens dives into stochastic outlier selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and, most recently, Spark. He illustrates the idea and intuition behind SOS, demonstrates the implementation of SOS on top of Spark, and applies SOS to a real-world use case.

Using Spark for crunching astronomical data on the LSST scale

February 8, 2020

The Large Synoptic Survey Telescope (LSST) is one of the most important future surveys. Its unique design allows it to cover large regions of the sky and obtain images of the faintest objects. After 10 years of operation, it will produce about 80 PB of data in images and catalog data. Petar Zecevic explains AXS, a system built for fast processing and cross-matching of survey catalog data.

What does the public say? A computational analysis of regulatory comments

February 8, 2020

While regulations affect your life every day, and millions of public comments are submitted to regulatory agencies in response to their proposals, analyzing the comments has traditionally been reserved for legal experts. Vlad Eidelman outlines how natural language processing (NLP) and machine learning can be used to automate the process by analyzing over 10 million publicly released comments.

Container orchestrator to DL workload, Bing's approach: FrameworkLauncher

February 6, 2020

Bing at Microsoft runs large, complex workflows and services, but there were no existing solutions that met its needs. So it created and open-sourced FrameworkLauncher. Kai Liu, Yuqi Wang, and Bin Wang explore the solution, built to orchestrate workloads on YARN through the same interface without changes to the workloads, including large-scale long-running services, batch jobs, and streaming jobs.

Running large-scale machine learning experiments in the cloud

February 3, 2020

Machine learning involves a lot of experimentation. Data scientists spend days, weeks, or months performing algorithm searches, model architecture searches, hyperparameter searches, etc. Shashank Prasanna breaks down how you can easily run large-scale machine learning experiments using containers, Kubernetes, Amazon ECS, and SageMaker.

Large-scale automated storage on Kubernetes

January 28, 2020

Managing large stateful applications is tough. Matt Schallert outlines the challenges of automating stateful systems at scale and details how embracing a declarative approach can ease operation and automation of these systems on orchestrators such as Kubernetes. He then demonstrates how to apply this methodology to different types of stateful workloads.

Microservices at the Edge - Best Practices

January 27, 2020

IoT Edge processing as an architectural component emerged to meet the demands of reducing network bandwidth requirements and lowering response latency. Microservices as a software paradigm evolved to …

Self-driving technology and the future autonomous depot-to-depot transport

January 22, 2020

PlusAI is developing a full-stack self-driving technology to enable large-scale autonomous commercial fleets. Hao Zheng examines some of the unique challenges, across different layers of the technology stack, of building an autonomous truck that's both safe and efficient and dives into how PlusAI is addressing them.

Scaling teams with technology (or is it the other way around?)

January 19, 2020

Microservices and cloud native technologies are the path for building large-scale, distributed systems. Can they do the same for teams? Chen Goldberg leads the Google engineering team building Kubernetes, Istio, GKE, and Anthos and explains how the same tech can help build happy teams.

Security precognition: A look at chaos engineering in security incident response

January 19, 2020

Chaos engineering allows security incident response teams to proactively experiment on recurring incident patterns to derive new information about underlying factors that were previously unknown. Join Aaron Rinehart to explore the hidden costs of security incidents, learn a new technique for uncovering system weaknesses in systems security, and more.

Building resilient serverless systems

January 17, 2020

John Chapin explains how, in this brave new world of managed services and platforms, you can use serverless technologies and an infrastructure-as-code mind-set to architect, build, and operate resilient systems that survive even massive vendor outages.

Keynote: Code and Clothes: How Open Data is Helping to Fix Fashion

January 15, 2020

The global fashion industry faces large-scale environmental and social impact challenges. Groups across the industry view increased supply chain transparency and the sharing of open data as a vital co …