[TALK]@Telematika

cluster

2000 Nodes and Beyond: How We Scaled Kubernetes to 60,000-Container Clusters and Where We're Going Next

January 2, 2020

Kubernetes supports 2000-Node clusters - that statement was part of the Kubernetes 1.3 release announcement. That's great, but what exactly does it mean? During this talk I will explain what work …

Analytics at ING: Technology solutions to create a real-time, data-driven bank

January 2, 2020

Bas Geerdink explains why and how ING is becoming more and more data-driven, sharing use cases, architecture, and technology choices along the way.

Apache Spark ML and MLlib tuning and optimization: A case study on boosting the performance of ALS by 60x

January 2, 2020

Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in JD.com's production environment.

Best practices with Kudu: An end-to-end user case from the automobile industry

January 2, 2020

Kudu is designed to fill the gap between HDFS and HBase. However, designing a Kudu-based cluster presents a number of challenges. Wei Chen and Zhaojuan Bian share a real-world use case from the automobile industry to explain how to design a Kudu-based E2E system. They also discuss key indicators to tune Kudu and OS parameters and how to select the best hardware components for different scenarios.

Lightning Talk - Kubernetes and Ceph Integration: From Deployment to Production

January 2, 2020

This talk presents the recent status of Ceph and Kubernetes integration. Deploying a Ceph cluster on Kubernetes using a DaemonSet significantly reduces the administrative overhead to get Ceph Cluster ready fo …

Scaling Microservices Beyond a Single Cluster with Kubernetes

January 2, 2020

All things fail, including clustered technologies that are designed for failure. Learn how Concur uses external load balancing and the existing k8s tools (pre-ubernetes) to provide cluster failure tol …

Automating Infrastructure Deployment for Kubernetes

January 1, 2020

Many organizations run Kubernetes clusters in a single public cloud like GCE or AWS, and as a result have reasonably homogeneous infrastructure needs. In these situations deploying Kubernetes clusters …

Deploying a scalable JupyterHub environment for running Jupyter notebooks

January 1, 2020

Jupyter notebooks provide a rich interactive environment for working with data. Running a single notebook is easy, but what if you need to provide a platform for many users at the same time? Graham Dumpleton demonstrates how to use JupyterHub to run a highly scalable environment for hosting Jupyter notebooks in education and business.

Taking the Helm: Delivering Kubernetes-Native Applications

January 1, 2020

The typical workflow for delivering an application on top of Kubernetes involves managing a bunch of manifest files in your Git repositories, and writing new manifests usually means copying lots of bo …

High-performance enterprise data processing with Spark

December 31, 2019

Vickye Jain and Raghav Sharma explain how they built a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance.

Running Multiple Schedulers in Kubernetes

December 31, 2019

In this session we will talk about the mechanism for supporting multiple schedulers in a Kubernetes cluster. First we will give an overview of multi-scheduler frameworks in various cluster management syst …

Cluster Federation in Kubernetes: Past, Present and the Future

December 30, 2019

In this session, I want to briefly present the current state of cluster federation in Kubernetes mainly focusing on what we aimed to accomplish, where we are today and where we want to go. After that …

R you ready for the cloud? Using R for operationalizing an enterprise-grade data science solution on Azure

December 30, 2019

R has long been criticized for its limitations on scalable data analytics. What's needed is an R-centric paradigm that enables data scientists to elastically harness cloud resources of manifold computing capability for large-scale data analytics. Le Zhang and Graham Williams demonstrate how to operationalize an E2E enterprise-grade pipeline for big data analytics, all within R.

Technical View: Comparison of Container Orchestration and Management Systems

December 30, 2019

Large-scale cluster orchestration and management has evolved into a new age, represented by open source projects like Kubernetes working with containers like Docker. This presentation will …

18 Months Before the Mast

December 29, 2019

We first launched Kubernetes in production in June 2015. By KubeCon of that year, we had the largest production cluster of any company in attendance. We'll share the development and operational lesso …

Managing a Multi-Tenanted Kubernetes Cluster in Production

December 29, 2019

Kubernetes clusters dedicated to a single organization are becoming common, either run by the organizations that use them or hosted by others. Less common is a multi-tenant use of a single cluster. T …

Migrating Configuration to Kubernetes with Container-Transform

December 29, 2019

Kubernetes has accelerated application development time for many organizations, but one of the most tedious aspects of moving from application prototypes to running pods on Kubernetes is the repetitive …

An architecture for merging fast data and enterprise applications: The SMACK stack

December 28, 2019

Big data architectures and enterprise/microservice architectures are slowly converging. Big data is transitioning to "fast data," emphasizing streaming over batch processing, while data processing is growing ubiquitous. Dean Wampler explores the SMACK stack (Spark, Mesos, Akka, Cassandra, and Kafka) and explains how it addresses the needs of both fast data and the enterprise.

Enter the Matrix, Exploring Your Kubernetes Cluster in Virtual Reality

December 28, 2019

This is a combination of a fun hack and a potential real-world use case (sometime in the future). The idea is to use WebVR and a Kubernetes API client to render a Kubernetes cluster in a Virtual Reality e …

Learning How to Pronounce Kubernetes to Production in 3 Months!

December 28, 2019

Outline:
- Show how easy it was to go from not knowing what a container is to production with Kubernetes
- Show some of the interesting ways we are autoscaling our microservices based on load
- Descr …

Lightning Talk - Say what? You're Running the Storage Platform IN Kubernetes?

December 28, 2019

GlusterFS is an open source, scale out, distributed filesystem that is becoming popular as a shared storage solution for containers. This talk is about how the GlusterFS community containerized Gluste …

Torus: Focusing Storage for Kubernetes

December 28, 2019

If Kubernetes can orchestrate computation across any cluster, on any cloud, how can we do the same for orchestrating storage? Further, can storage for Kubernetes be easily managed by Kubernetes? CoreO …

POSIX for the data center

December 26, 2019

The container orchestration wars are upon us. A dozen container orchestrators vie to be the kernel of the modern data center. But can the warring parties come together on a standard interface for modern cluster operations? Karl Isenberg explores what these parties have in common and outlines what a common interface might look like for operating these distributed operating systems.

Building a powerful data tier from open source datastores

December 19, 2019

In the past few years, there has been a proliferation of production-ready open source databases, giving developers and operators more choices than ever. Joseph Lynch explores how Yelp has combined complementary data stores to provide a powerful data tier for its developers. Along the way, Joseph shares lessons learned about deployment, configuration, and monitoring from a production environment.

A practitioner's guide to securing your Hadoop cluster

December 16, 2019

Many Hadoop clusters lack even basic security controls. Michael Yoder, Ben Spivey, Mark Donsky, and Mubashir Kazia walk you through securing a Hadoop cluster. You'll start with a cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.

Authorization in the cloud: Enforcing access control across compute engines

December 16, 2019

Li Li and Hao Hao elaborate on the architecture of Apache Sentry + RecordService for Hadoop in the cloud, which provides unified, fine-grained authorization via role- and attribute-based access control. They aim to encourage attendees to adopt Apache Sentry and RecordService to protect sensitive data on the multitenant cloud across the Hadoop ecosystem.

Beyond Hadoop at Yahoo: Interactive analytics with Druid

December 16, 2019

Himanshu Gupta explains why Yahoo has been increasingly investing in interactive analytics and how it leverages Druid to power a variety of internal- and external-facing data applications.

Breaking Spark: The top five mistakes to avoid when using Apache Spark in production

December 15, 2019

Drawing on his experiences across 150+ production deployments, Neelesh Srinivas Salian focuses on five common issues observed in a cluster environment setup with Apache Spark (Core, Streaming, and SQL) to help you improve the usability and supportability of Apache Spark and avoid such issues in future deployments.