How to protect big data in a containerized environment
Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE). But TDE is difficult to configure and manage, particularly when run in Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them.
| Talk Title | How to protect big data in a containerized environment |
|---|---|
| Speakers | Thomas Phelan (HPE BlueData) |
| Conference | Strata Data Conference |
| Conf Tag | Big Data Expo |
| Location | San Francisco, California |
| Date | March 26-28, 2019 |
Every enterprise spends significant resources to protect its data. This is especially true for big data, since some of it may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls, as well as encryption of data at rest and in flight.
The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. With TDE, data is encrypted transparently, on the client side, as a big data application writes it, and it isn’t decrypted until an authorized application reads it. The data therefore stays encrypted throughout its lifespan, both in transit and at rest, except while a processing application is actively accessing it. TDE is an excellent approach for protecting data stored in data lakes built on recent versions of HDFS.
However, TDE comes with its own challenges and limitations. Systems that use TDE must be tightly integrated with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS), and this integration isn’t easy to set up or maintain. These issues become even more challenging in a virtualized or containerized environment, where one Kerberos realm may secure the big data compute cluster while a different realm secures the HDFS file system that the cluster accesses.
BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. Thomas Phelan offers a detailed description of how Transparent Data Encryption works with HDFS, with a particular focus on containerized environments. You’ll learn how HDFS TDE is configured and maintained in an environment where many big data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers).
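To make the TDE flow more concrete, here is a toy, standard-library-only Python sketch of the envelope-encryption pattern HDFS TDE is built on. Each file gets its own data encryption key (DEK); only a wrapped copy of it (the EDEK, encrypted under the encryption zone key held by the KMS) is stored with the file's metadata, and a reader must ask the KMS to unwrap it, which is where access control is enforced. All class and method names here (`ToyKMS`, `ToyCluster`, `generate_edek`, `decrypt_edek`) are hypothetical and are not the Hadoop API. The keystream is an HMAC-SHA-256 counter construction used as a stand-in for AES-CTR, which is the cipher HDFS actually uses; it is for illustration only and must not protect real data.

```python
import hashlib
import hmac
import os


def toy_ctr_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA-256 counter-mode keystream.

    Toy stand-in for AES-CTR (the cipher HDFS TDE actually uses); NOT secure.
    Because it is a plain XOR, calling it twice with the same key and IV
    restores the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hmac.new(key, iv + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


class ToyKMS:
    """Hypothetical key server: holds zone keys and decides who may unwrap EDEKs."""

    def __init__(self):
        self.zone_keys = {}  # key name -> secret zone key (never leaves the KMS)
        self.acl = {}        # key name -> set of users allowed to decrypt EDEKs

    def create_key(self, name, allowed_users):
        self.zone_keys[name] = os.urandom(32)
        self.acl[name] = set(allowed_users)

    def generate_edek(self, name):
        """Create a fresh per-file DEK and return it only in wrapped form."""
        dek = os.urandom(32)
        iv = os.urandom(16)
        return (name, iv, toy_ctr_xor(self.zone_keys[name], iv, dek))

    def decrypt_edek(self, edek, user):
        """Unwrap an EDEK, enforcing the key ACL first."""
        name, iv, wrapped = edek
        if user not in self.acl[name]:
            raise PermissionError(f"{user} may not use key {name}")
        return toy_ctr_xor(self.zone_keys[name], iv, wrapped)


class ToyCluster:
    """Hypothetical stand-in for an HDFS encryption zone.

    `metadata` plays the NameNode (EDEK + IV per file); `blocks` plays the
    DataNodes. Neither ever stores a plaintext key or plaintext data.
    """

    def __init__(self, kms, zone_key):
        self.kms = kms
        self.zone_key = zone_key
        self.metadata = {}
        self.blocks = {}

    def write(self, path, data, user):
        edek = self.kms.generate_edek(self.zone_key)
        dek = self.kms.decrypt_edek(edek, user)  # client-side unwrap via the KMS
        file_iv = os.urandom(16)
        self.metadata[path] = (edek, file_iv)
        self.blocks[path] = toy_ctr_xor(dek, file_iv, data)

    def read(self, path, user):
        edek, file_iv = self.metadata[path]
        dek = self.kms.decrypt_edek(edek, user)  # fails if the KMS ACL denies the user
        return toy_ctr_xor(dek, file_iv, self.blocks[path])


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_key("finance_key", allowed_users={"alice"})
    zone = ToyCluster(kms, "finance_key")
    zone.write("/secure/ledger.csv", b"acct,amount\n42,100\n", user="alice")
    print(zone.read("/secure/ledger.csv", user="alice"))  # b'acct,amount\n42,100\n'
    try:
        zone.read("/secure/ledger.csv", user="mallory")
    except PermissionError as err:
        print(err)  # mallory may not use key finance_key
```

One simplification is worth flagging: in real HDFS the NameNode, not the client, requests a new EDEK from the KMS when a file is created in an encryption zone and stores it as an extended attribute; the sketch folds both roles into `write`. What it does preserve is that the storage side only ever holds ciphertext and wrapped keys.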
You’ll also discover how to manage KDC credentials in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.
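On the data side of a cross-realm setup, HDFS must also turn principals from the compute cluster's realm into local user names. Hadoop does this with `hadoop.security.auth_to_local` rules written as `RULE:[n:format](filter)s/pattern/replacement/`. The Python sketch below is a simplified, hypothetical emulation of that matching logic, assuming two example realms, `COMPUTE.EXAMPLE.COM` and `DATA.EXAMPLE.COM`. It handles only the basic rule form (no `DEFAULT` rule, no flags such as `/g`, and Python rather than Java replacement syntax), and it does not cover the cross-realm trust itself, which is configured in Kerberos rather than in Hadoop.

```python
import re
from typing import List, Optional

# Basic form only: RULE:[n:format](filter)s/pattern/replacement/
# (simplified for illustration; no DEFAULT keyword and no trailing flags).
RULE_RE = re.compile(r"RULE:\[(\d+):([^\]]+)\]\(([^)]*)\)s/([^/]*)/([^/]*)/")


def apply_rule(principal: str, rule: str) -> Optional[str]:
    """Return the short name produced by one rule, or None if it doesn't apply."""
    parsed = RULE_RE.fullmatch(rule)
    if parsed is None:
        raise ValueError(f"unsupported rule syntax: {rule}")
    n, fmt, filt, pattern, replacement = parsed.groups()
    name, _, realm = principal.partition("@")
    components = name.split("/")
    if len(components) != int(n):
        return None  # rule targets principals with a different component count

    def expand(m):
        # $0 is the realm; $1, $2, ... are the principal's name components.
        idx = int(m.group(1))
        return realm if idx == 0 else components[idx - 1]

    base = re.sub(r"\$(\d+)", expand, fmt)  # e.g. "$1@$0" -> "alice@COMPUTE.EXAMPLE.COM"
    if not re.fullmatch(filt, base):
        return None  # filter regex must match the whole expanded string
    return re.sub(pattern, replacement, base, count=1)


def map_principal(principal: str, rules: List[str]) -> Optional[str]:
    """Try rules in order; the first rule that produces a name wins."""
    for rule in rules:
        short = apply_rule(principal, rule)
        if short is not None:
            return short
    return None


if __name__ == "__main__":
    rules = [
        # Users from the (hypothetical) compute realm map to their bare names.
        r"RULE:[1:$1@$0](.*@COMPUTE\.EXAMPLE\.COM)s/@.*//",
        # The HDFS service principal in the (hypothetical) data realm maps to "hdfs".
        r"RULE:[2:$1@$0](hdfs@DATA\.EXAMPLE\.COM)s/.*/hdfs/",
    ]
    print(map_principal("alice@COMPUTE.EXAMPLE.COM", rules))  # alice
    print(map_principal("hdfs/nn1.data.example.com@DATA.EXAMPLE.COM", rules))  # hdfs
    print(map_principal("bob@UNTRUSTED.ORG", rules))  # None
```

Because rules are tried in order and the first match wins, more specific rules belong earlier in the list; a principal that no rule matches gets no local identity.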