December 9, 2019

Fast analytics on fast data: Kudu as a storage layer for banking applications

Olaf Hein explains how a large German bank relies on a Kudu-based data platform to speed up business processes. Olaf highlights key data access patterns and the system architecture and shares best practices and lessons learned using Kudu in development and operations.


Talk Title	Fast analytics on fast data: Kudu as a storage layer for banking applications
Speakers	Olaf Hein (ORDIX AG)
Conference	Strata Data Conference
Conf Tag	Making Data Work
Location	London, United Kingdom
Date	May 22-24, 2018

The Hadoop ecosystem offers two different storage options, HDFS and HBase. Each has its strengths and weaknesses, but neither can be used universally for all kinds of workloads, which usually leads to complex hybrid architectures. Kudu fills this gap and simplifies the architecture of big data systems. A large German bank is using a new data platform based on Kudu and Cloudera’s Enterprise Hadoop Distribution to speed up its credit processes. Within this system, Spark jobs analyze the financial transactions of millions of customers. In addition to this analytical workload, several frontend applications use the Kudu Java API to perform random reads and writes in real time. Topics include:

How to protect big data in a containerized environment

December 9, 2019

Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE), but TDE can be difficult to configure and manage, issues that are only compounded when running on Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them.

Making Big Data Processing Portable. The Story of Apache Beam and gRPC

December 7, 2019

Big data applications have been an almost exclusive domain of Java and Scala developers. This not only frustrates engineers who prefer other languages and their ecosystems, but also impedes companies …

The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

December 5, 2019

Lee Blum offers an overview of Verint's large-scale cyber-defense system, built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records. He covers the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system's overall results.

How to protect big data in a containerized environment

November 25, 2019

Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE). However, TDE can be difficult to configure and manage, issues that are only compounded when running on Docker containers. Thomas Phelan explores these challenges and how to overcome them.

Playing well together: Big data beyond the JVM with Spark and friends

November 22, 2019

Holden Karau and Rachel Warren explore the state of the current big data ecosystem and explain how to best work with it in non-JVM languages. While much of the focus will be on Python + Spark, the talk will also include interesting anecdotes about how these lessons apply to other systems (including Kafka).

Smart agriculture: Blending IoT sensor data with visual analytics

November 21, 2019

Mike Prorock offers an overview of mesur.io, a game-changing climate awareness solution that combines smart sensor technology, data transmission, and state-of-the-art visual analytics to transform the agricultural and turf management market. Mesur.io enables growers to monitor areas of concern, providing immediate benefits to crop yield, supply costs, farm labor overhead, and water consumption.