December 7, 2019

When one data center is not enough: Building large-scale stream infrastructures across multiple data centers with Apache Kafka

You may have successfully made the transition from single machines and one-off solutions to large, distributed stream infrastructures in your data center. But what if one data center is not enough? Ewen Cheslack-Postava explores resilient multi-data-center architecture with Apache Kafka, sharing best practices for data replication and mirroring as well as disaster scenarios and failure handling.


Talk Title	When one data center is not enough: Building large-scale stream infrastructures across multiple data centers with Apache Kafka
Speakers	Ewen Cheslack-Postava
Conference	Strata + Hadoop World
Conf Tag	Make Data Work
Location	New York, New York
Date	September 27-29, 2016
URL	Talk Page
Slides	Talk Slides
Video

To manage the ever-increasing volume and velocity of data within your company, you may have successfully made the transition from single machines and one-off solutions to large, distributed stream infrastructures in your data center powered by Apache Kafka. But what can you do if one data center is not enough? Ewen Cheslack-Postava explores resilient multi-data-center architecture with Apache Kafka, sharing best practices for data replication and mirroring as well as disaster scenarios and failure handling. Ewen covers four scenarios (replication and failover for disaster recovery, data produced in one location but consumed in another, an aggregate cluster for data analysis, and bidirectional replication), discussing the requirements for each, providing a proven architecture, and explaining the benefits and limitations of the solution.

kafka apache data-center large-scale infrastructure recovery data center slack cluster

Apache Eagle: Secure Hadoop in real time

November 21, 2019

Apache Eagle is an open source monitoring solution to instantly identify access to sensitive data, recognize malicious activities, and take action. Arun Karthick Manoharan, Edward Zhang, and Chaitali Gupta explain how Eagle helps secure a Hadoop cluster using policy-based and machine-learning user-profile-based detection and alerting.

Jump-starting back-office connections

November 26, 2019

Large-scale cloud networks are constantly driven by the need for improved performance in communication between data centers. Such back-office communication makes up a large fraction of traffic in many cloud environments. Harkeerat Bedi offers an overview of a tool that improves the efficiency of data-center-to-data-center communication by learning the congestion level of links in between.

HopsWorks: Multitenant Hadoop as a service

November 18, 2019

Currently, multitenancy in Hadoop is limited to organizations running separate Hadoop clusters, and the secure sharing of resources is achieved using virtualization or containers. Jim Dowling describes how HopsWorks enables organizations to securely share a single Hadoop cluster using projects and a new metadata layer that enables protection domains while still allowing data sharing.

Inside Cigna's big data journey

October 24, 2019

How do you implement Apache Hadoop in a large healthcare company with a mature data-analysis infrastructure? Jeffrey Shmain and Mohammad Quraishi describe Cigna's journey toward big data and Hadoop, including an overview of new Hadoop capabilities like heterogeneous data integration and large-scale machine learning.

IoT in the enterprise: A look at Intel (IoT) Inside

October 23, 2019

Moty Fania shares Intel's IT experience implementing an on-premises big data IoT platform for internal use cases. This platform was built on top of several open source technologies and enables highly scalable stream analytics with a stack of algorithms such as multisensor change detection, anomaly detection, and more.

Lessons learned building a scalable self-serve, real-time, multitenant monitoring service at Yahoo

October 23, 2019

Building a real-time monitoring service that handles millions of custom events per second while satisfying complex rules, varied throughput requirements, and numerous dimensions simultaneously is a complex endeavor. Sumeet Singh and Mridul Jain explain how Yahoo approached these challenges with Apache Storm Trident, Kafka, HBase, and OpenTSDB and discuss the lessons learned along the way.