Authorization in the cloud: Enforcing access control across compute engines
Li Li and Hao Hao describe the architecture of Apache Sentry and RecordService for Hadoop in the cloud, which together provide unified, fine-grained authorization via role- and attribute-based access control. They encourage attendees to adopt Apache Sentry and RecordService to protect sensitive data in the multitenant cloud across the Hadoop ecosystem.
Talk Title | Authorization in the cloud: Enforcing access control across compute engines |
Speakers | Li Li, Hao Hao |
Conference | Strata + Hadoop World |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 27-29, 2016 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Hadoop in the cloud is becoming an increasingly common use case, as the cloud provides rapid access to flexible and low-cost IT resources. As with traditional on-premises Hadoop clusters, data authorization is crucial, and even more so in the multitenant cloud. A simple and smooth transition requires a transparent solution that decouples compute from storage. And since the underlying data is shared across components, a unified authorization policy should be enforced in a way that accommodates the flexibility of the Hadoop ecosystem.

Li Li and Hao Hao explore Apache Sentry and RecordService as a solution to this problem. Apache Sentry is a framework that provides fine-grained authorization as a service, and RecordService is an abstraction layer between computing frameworks and data storage that can leverage and enforce Sentry's centralized authorization policies.

Li and Hao discuss the architecture of Apache Sentry and RecordService and how fine-grained access control policies are uniformly enforced, with no performance loss, across different Hadoop components in the cloud, such as Hive, Solr, Impala, Kafka, Sqoop2, Spark, Pig, and MapReduce. They also explain how Apache Sentry can leverage the benefits of both role-based access control (RBAC) and attribute-based access control (ABAC).
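The idea of one central policy combining RBAC and ABAC, consulted by every compute engine, can be illustrated with a minimal Python sketch. This is not the Apache Sentry or RecordService API: `Privilege`, `PolicyStore`, `readable_columns`, the `pii` tag, and the table and group names are all hypothetical, chosen only to show the shape of the check. It assumes roles are granted to groups (the RBAC part) and that a tagged column is readable only by users holding a matching clearance (the ABAC part).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """Hypothetical privilege: an action on a table, optionally limited to columns."""
    action: str                        # e.g. "select"
    table: str                         # e.g. "sales.orders"
    columns: frozenset = frozenset()   # empty set means every column


@dataclass
class PolicyStore:
    """Hypothetical central policy store shared by all engines."""
    role_privileges: dict = field(default_factory=dict)  # role -> set of Privilege
    group_roles: dict = field(default_factory=dict)      # group -> set of role names
    column_tags: dict = field(default_factory=dict)      # (table, column) -> set of tags

    def grant(self, role, privilege):
        self.role_privileges.setdefault(role, set()).add(privilege)

    def assign(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def tag(self, table, column, tag):
        self.column_tags.setdefault((table, column), set()).add(tag)

    def is_allowed(self, groups, clearances, action, table, column):
        # RBAC: collect the roles of the user's groups and look for a matching privilege.
        roles = set()
        for group in groups:
            roles |= self.group_roles.get(group, set())
        rbac_ok = any(
            p.action == action
            and p.table == table
            and (not p.columns or column in p.columns)
            for role in roles
            for p in self.role_privileges.get(role, set())
        )
        # ABAC: every tag on the column must be covered by the user's clearances.
        tags = self.column_tags.get((table, column), set())
        abac_ok = tags <= set(clearances)
        return rbac_ok and abac_ok


def readable_columns(store, groups, clearances, table, columns):
    """What any engine would call before scanning: keep only permitted columns."""
    return [c for c in columns
            if store.is_allowed(groups, clearances, "select", table, c)]


# Illustrative policy: analysts may select from sales.orders; card_number is PII.
store = PolicyStore()
store.grant("analyst", Privilege("select", "sales.orders"))
store.assign("analysts", "analyst")
store.tag("sales.orders", "card_number", "pii")

cols = ["order_id", "amount", "card_number"]
print(readable_columns(store, {"analysts"}, set(), "sales.orders", cols))
print(readable_columns(store, {"analysts"}, {"pii"}, "sales.orders", cols))
print(readable_columns(store, {"interns"}, {"pii"}, "sales.orders", cols))
```

In this sketch an analyst without the `pii` clearance sees only `order_id` and `amount`, an analyst with it sees all three columns, and a user with no granted role sees none. Because every caller goes through the same `PolicyStore`, the decision is the same regardless of which engine issues the read, which is the property the abstract attributes to enforcing Sentry policies through a shared layer.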