In today’s security-focused environment, many organizations are adopting a Zero Trust security framework. In this model, access to data does not depend solely on network location. Instead, users and systems must prove their identity and trustworthiness, and strict identity-based authorization rules are enforced before they can access applications and data.
For numerous organizations, third-party identity providers (IdPs) like Active Directory Federation Services (AD FS) play a crucial role in credential management and identity verification. Employees can use their AD FS credentials to authenticate across various systems, including the AWS Management Console (to learn more, refer to Enabling SAML 2.0 federated users to access the AWS Management Console).
In the analytics domain, organizations extend Zero Trust principles to data stored in data lakes, which includes the business intelligence (BI) tools employed for data access. A typical data lake configuration involves storing data in Amazon Simple Storage Service (Amazon S3) and querying it via Amazon Athena.
AWS Lake Formation enables users to define and enforce access policies at the database, table, and column levels when using Athena to read data from Amazon S3. Lake Formation is compatible with Active Directory and SAML identity providers such as Okta and Auth0. It also integrates with Amazon QuickSight, which supports Active Directory authentication and lets users create interactive BI dashboards. Users of other BI tools, such as Tableau, may likewise need to access Lake Formation data with their Active Directory credentials.
In this guide, we outline how to leverage AD FS credentials with Tableau to implement a Zero Trust architecture, enabling secure queries for data stored in Amazon S3 and Lake Formation.
Solution Overview
In this architecture, user credentials are managed through Active Directory rather than AWS Identity and Access Management (IAM). While Tableau provides a connector for Athena, it typically requires an AWS access key ID and secret access key for programmatic access. Creating an IAM user with programmatic access for Tableau is one option; however, some organizations prefer federated access through Active Directory instead of relying on IAM users.
We demonstrate how to utilize the Athena ODBC driver alongside AD FS credentials to query sample data in a newly established data lake. This walkthrough simulates an environment by enabling federation to AWS using AD FS 3.0 and SAML 2.0 and guides you in setting up a data lake with Lake Formation. Finally, we explain how to configure an ODBC driver for Tableau to securely access your data in the lake using AD FS credentials.
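As a rough illustration of what the ODBC configuration amounts to, the sketch below assembles a DSN-less connection string for the Athena ODBC driver with AD FS authentication. The driver name and key names (AuthenticationType, IdP_Host, IdP_Port, Preferred_Role, LakeformationEnabled) are assumptions based on the Simba Athena ODBC driver and should be checked against the documentation for your driver version; the function name and all example values are hypothetical.

```python
def build_athena_adfs_connection_string(
    region, s3_output, idp_host, preferred_role_arn, uid, pwd, idp_port=443
):
    """Assemble a DSN-less ODBC connection string for Athena with AD FS.

    Key names are assumptions based on the Simba Athena ODBC driver;
    verify them against the driver's documentation before relying on them.
    """
    settings = {
        "Driver": "{Simba Athena ODBC Driver}",
        "AwsRegion": region,
        "S3OutputLocation": s3_output,      # where Athena writes query results
        "AuthenticationType": "ADFS",       # authenticate against AD FS, not IAM keys
        "IdP_Host": idp_host,               # AD FS server host name
        "IdP_Port": str(idp_port),
        "Preferred_Role": preferred_role_arn,
        "UID": uid,                         # Active Directory user name
        "PWD": pwd,                         # Active Directory password
        "LakeformationEnabled": "1",        # assumed flag for Lake Formation access
    }
    return ";".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(
        build_athena_adfs_connection_string(
            region="us-east-1",
            s3_output="s3://example-bucket/athena-results/",
            idp_host="adfs.example.com",
            preferred_role_arn="arn:aws:iam::123456789012:role/ArunADFSTest",
            uid="EXAMPLE\\analyst",
            pwd="<password>",
        )
    )
```

In practice the same settings are usually entered in the Windows ODBC Data Source Administrator and referenced from Tableau by DSN name, so the password never has to be embedded in a string like this.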
Prerequisites
To complete this walkthrough, you should have the following:
- A foundational understanding of IAM roles and concepts.
- Basic knowledge of Lake Formation and Athena.
- Access to Tableau, either through a 14-day trial or a fully licensed version.
- Familiarity with Active Directory concepts, including joining a computer to an Active Directory domain.
- Knowledge of configuring ODBC components on a Windows machine.
Create Your Environment
To replicate a production environment, we set up a standard VPC within Amazon Virtual Private Cloud (Amazon VPC), which includes one private and one public subnet. Similarly, you can utilize the VPC wizard to create your setup. Our Amazon Elastic Compute Cloud (Amazon EC2) instance, which runs the Tableau client, is positioned in a private subnet and accessed via an EC2 bastion host. For simplicity, outbound connections to Amazon S3, AWS Glue, and Athena are routed through the NAT gateway and the internet gateway established by the VPC wizard. Optionally, you might substitute the NAT gateway with AWS PrivateLink endpoints (AWS Security Token Service (AWS STS), Amazon S3, Athena, and AWS Glue endpoints are required) to ensure that traffic remains contained within the AWS network.
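If you choose the PrivateLink option, the four required endpoints can be listed programmatically. The following is a minimal sketch that only builds the endpoint service names for a Region; the function name is hypothetical, and the naming pattern com.amazonaws.<region>.<service> is the standard one for AWS-managed endpoint services.

```python
def required_endpoint_services(region):
    """Return the VPC endpoint service names needed to replace the NAT gateway."""
    # Services named in the walkthrough: AWS STS, Amazon S3, Athena, AWS Glue.
    services = ["sts", "s3", "athena", "glue"]
    return [f"com.amazonaws.{region}.{name}" for name in services]


if __name__ == "__main__":
    for service_name in required_endpoint_services("us-east-1"):
        print(service_name)
```

Each name can then be supplied when creating an endpoint in the VPC console or through your infrastructure tooling. Amazon S3 is available as either a gateway or an interface endpoint, so pick the type that suits your routing.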
After constructing your VPC with its private and public subnets, you can proceed to establish the other components, including Active Directory and Lake Formation. Let’s start with Active Directory.
Enable Federation to AWS Using AD FS 3.0 and SAML 2.0
AD FS 3.0, a Windows Server component, supports SAML 2.0 and integrates with IAM, allowing Active Directory users to federate to AWS using their corporate credentials like usernames and passwords. Before continuing, ensure that AD FS is configured and operational.
To set up AD FS, follow the instructions in the guide on establishing trust between AD FS and AWS and using Active Directory credentials for Amazon Athena with the ODBC driver. The first part of that guide explains how to configure AD FS and establish trust between AD FS and AWS. The guide ends with ODBC driver setup for Athena, which you can skip. Following it creates a group named ArunADFSTest that maps to a role in your AWS account; this walkthrough refers to that role later.
Once you have confirmed successful login using your IdP, you can proceed to configure your ODBC driver on Windows to connect to Athena.
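One way to check that federation is wired up as expected is to inspect the SAML response that AD FS returns and confirm it carries the role you intend to assume. The sketch below, using only the Python standard library, extracts the role and SAML provider ARNs from a base64-encoded response. It is an inspection aid under stated assumptions, not part of the original walkthrough: the function name is hypothetical, and it does not validate the assertion's signature.

```python
import base64
import xml.etree.ElementTree as ET

SAML_ASSERTION_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
ROLE_ATTRIBUTE = "https://aws.amazon.com/SAML/Attributes/Role"


def roles_in_saml_response(encoded_response):
    """Return (role_arn, provider_arn) pairs found in a base64 SAML response."""
    root = ET.fromstring(base64.b64decode(encoded_response))
    pairs = []
    for attribute in root.iter(f"{{{SAML_ASSERTION_NS}}}Attribute"):
        if attribute.get("Name") != ROLE_ATTRIBUTE:
            continue
        for value in attribute.iter(f"{{{SAML_ASSERTION_NS}}}AttributeValue"):
            # Each value holds two comma-separated ARNs; accept either order.
            parts = [part.strip() for part in (value.text or "").split(",")]
            role = next((p for p in parts if ":role/" in p), None)
            provider = next((p for p in parts if ":saml-provider/" in p), None)
            if role and provider:
                pairs.append((role, provider))
    return pairs
```

If the role mapped to the ArunADFSTest group does not appear in the result, the AD FS claim rules are the first place to look.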
Set Up a Data Lake Using Lake Formation
Lake Formation is a fully managed service designed to simplify the creation, security, and management of data lakes. It features its own permissions model that enhances the IAM permissions framework, allowing for fine-grained access control through a straightforward grant/revoke mechanism. We will utilize this permissions model to provide access to the AD FS role created earlier.
Upon your first access to the Lake Formation console, a welcome prompt will appear, asking you to select the initial administrative user and roles. Choose “Add myself” and click “Get Started.” We will use the sample database provided by Lake Formation, though you may opt for your own dataset. For guidance on loading a custom dataset, see Getting Started with Lake Formation. Once configured, grant read access to the AD FS role (ArunADFSTest) established in the previous step.
In the navigation pane, select “Databases.”
Choose the database sampledb.
From the Actions menu, select “Grant.” We will grant the ArunADFSTest role access to sampledb.
For Principals, pick IAM users and roles.
From the IAM users and roles options, select the role ArunADFSTest.
Choose “Named data catalog resources.”
Under Databases, select sampledb.
For Tables, select “All tables.”
Set the table permissions to Select and Describe.
Select All data access for Data permissions.
Click “Grant.”
Our AD FS user will assume the ArunADFSTest role, which has been granted access to sampledb by Lake Formation. However, the ArunADFSTest role also needs access to Lake Formation, Athena, AWS Glue, and Amazon S3. Following the principle of least privilege, AWS defines specific policies for different Lake Formation personas. Our user aligns with the Data Analyst persona, requiring sufficient permissions to execute queries.
To give the ArunADFSTest role these permissions, add the AmazonAthenaFullAccess managed policy (for guidance, see Adding and removing IAM identity permissions) along with this inline policy:
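The console grant described in the steps above can also be expressed as a request to the Lake Formation GrantPermissions API. The sketch below builds the request in the shape that boto3's grant_permissions call accepts; the helper name and the account ID in the example are hypothetical, and the boto3 call itself is shown only in a comment so that the snippet runs without AWS credentials.

```python
def build_grant_request(role_arn, database="sampledb"):
    """Build a GrantPermissions request giving a role read access to all tables."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        # An empty TableWildcard selects every table in the database.
        "Resource": {"Table": {"DatabaseName": database, "TableWildcard": {}}},
        "Permissions": ["SELECT", "DESCRIBE"],
    }


if __name__ == "__main__":
    # Hypothetical account ID; substitute the ARN of your own role.
    request = build_grant_request("arn:aws:iam::123456789012:role/ArunADFSTest")
    print(request)
    # With credentials configured, the request could be sent like this:
    #   import boto3
    #   boto3.client("lakeformation").grant_permissions(**request)
```

Scripting the grant this way makes it easier to repeat the setup in another account or to review permissions in version control.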
{ "Version": "2012-10-17", "Statement": [ // policy statements here ] }
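Attaching the managed policy can be scripted as well. The following is a minimal sketch that attaches AmazonAthenaFullAccess to a role through an IAM client passed in by the caller, such as boto3.client("iam"). The function name is hypothetical, and the inline policy is deliberately not reproduced here because its statements are not shown above.

```python
ATHENA_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonAthenaFullAccess"


def attach_athena_full_access(iam_client, role_name):
    """Attach the AmazonAthenaFullAccess managed policy to the given role.

    iam_client is expected to behave like boto3.client("iam").
    """
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=ATHENA_FULL_ACCESS_ARN,
    )


# Example usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   attach_athena_full_access(boto3.client("iam"), "ArunADFSTest")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before running it against a real account.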