Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This lets you use your data to gain new insights for your business and customers.
As more enterprises choose Amazon Redshift as their data warehouse, they increasingly need robust fine-grained access controls that determine which rows of sensitive data each user can see. Many organizations also want users to access Amazon Redshift through their existing identity provider (IdP) while still meeting compliance and security requirements. Without built-in support for row-level security and federated authentication, companies often fall back on workarounds such as creating separate views or integrating third-party tools, which add development and maintenance overhead.
With row-level security (RLS) in Amazon Redshift, organizations can restrict user access on a per-row basis. In addition, native IdP support simplifies authentication and authorization when users connect through the business intelligence (BI) tools of their choice.
Row-level security in Amazon Redshift provides fine-grained access control over sensitive data. You define RLS policies that determine which rows a query returns for a given user or role.
In this post, we walk through an example of implementing row-level security in Amazon Redshift for users who sign in with their existing IdP credentials, which simplifies both authentication and permission management. You keep control over which rows each user can access, while user identities and group memberships continue to be managed in your current IdP.
Solution Overview
In our scenario, an organization needs row-level security to restrict access to sales performance data by state, based on the sales representative assigned to each state. The business rules are as follows:
- Mia, the salesperson for NY, should only have access to sales data from NY.
- Oliver, the salesperson for CA, should only be able to view sales data from CA.
- Sophia, the sales manager for the North America region, should have access to sales data from all states.
- Liam, who is part of the HR department, should not have access to any sales data.
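These rules amount to one row filter per user. The following plain-Python sketch models only the intended outcome; it is not Redshift code. The sample rows (including a hypothetical TX row) and the names `SaleRow`, `RULES`, and `visible_rows` are illustrative.

The sketch makes one assumption: a user with no rule sees no rows. In Amazon Redshift itself, Liam's restriction can also come from never granting his role SELECT on the table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SaleRow:
    state: str
    amount: int


# Hypothetical sample rows; the post does not show the Sales table contents.
SALES: List[SaleRow] = [SaleRow("NY", 100), SaleRow("CA", 250), SaleRow("TX", 75)]

# A policy is a predicate over a row, playing the part of an RLS policy's filter.
Policy = Callable[[SaleRow], bool]

# One rule per user, restating the business rules above.
RULES: Dict[str, Policy] = {
    "Mia": lambda row: row.state == "NY",     # NY salesperson
    "Oliver": lambda row: row.state == "CA",  # CA salesperson
    "Sophia": lambda row: True,               # regional manager sees every state
    # Liam (HR) deliberately has no entry.
}


def visible_rows(user: str, rows: List[SaleRow]) -> List[SaleRow]:
    """Return the rows the user's rule admits; a user with no rule gets nothing."""
    policy = RULES.get(user)
    if policy is None:
        return []  # modeled as default deny
    return [row for row in rows if policy(row)]


if __name__ == "__main__":
    for user in ("Mia", "Oliver", "Sophia", "Liam"):
        print(user, [row.state for row in visible_rows(user, SALES)])
```

Running it prints `['NY']` for Mia, `['CA']` for Oliver, all three states for Sophia, and an empty list for Liam.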
The following diagram illustrates the architecture of the solution, which combines Amazon Redshift row-level security with native IdP authentication.
The solution involves the following steps:
- Create RLS policies to enforce fine-grained access control on the Sales table.
- Create Amazon Redshift roles that correspond to the Azure AD groups, and grant those roles the necessary permissions on the table.
With native IdP, roles can be created automatically based on Azure AD groups. However, as a best practice, we recommend creating the Amazon Redshift roles ahead of time and granting them the relevant permissions.
- Attach the RLS policies to these roles.
- Configure the JDBC or ODBC driver in your SQL client to use Azure AD federation, so that users sign in with their Azure AD credentials.
- After successful authentication, Azure AD issues an OAuth token to the Amazon Redshift driver.
- The driver passes this token to Amazon Redshift to start a new database session, and Amazon Redshift validates the token.
- Amazon Redshift then queries the Microsoft Graph API to retrieve the user's group memberships.
- Amazon Redshift maps the signed-in Azure AD user to a corresponding Amazon Redshift user, and maps the user's Azure AD groups to Amazon Redshift roles.
- Through these pre-created roles and the RLS policies attached to them, authorized users can query only the rows they are permitted to see from their client.
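The user and group mapping in the last two steps can be modeled in a few lines. This sketch assumes native IdP names users and roles in the form namespace:name, with `aad` as the namespace. The e-mail-style user names, the group lookup table (a stand-in for real Microsoft Graph calls), and the set of pre-created roles are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed to match the NAMESPACE given when the IdP is registered later in this post.
NAMESPACE = "aad"

# Hypothetical group lookups standing in for what the Microsoft Graph API returns.
GRAPH_GROUPS: Dict[str, List[str]] = {
    "mia@example.com": ["sales_ny"],
    "oliver@example.com": ["sales_ca"],
    "sophia@example.com": ["sales_manager"],
    "liam@example.com": ["hr_group"],
}

# Roles an administrator created ahead of time (a hypothetical set for this sketch).
PRECREATED_ROLES = {f"{NAMESPACE}:sales_ny", f"{NAMESPACE}:sales_ca", f"{NAMESPACE}:sales_manager"}


def map_session(azure_user: str) -> Tuple[str, List[str]]:
    """Map an Azure AD login to a namespace-prefixed Redshift user and role names."""
    redshift_user = f"{NAMESPACE}:{azure_user}"
    roles = [f"{NAMESPACE}:{group}" for group in GRAPH_GROUPS.get(azure_user, [])]
    return redshift_user, roles


def roles_not_precreated(roles: List[str]) -> List[str]:
    """Roles nobody prepared, so no permissions or RLS policies are attached to them."""
    return [role for role in roles if role not in PRECREATED_ROLES]


if __name__ == "__main__":
    for user in GRAPH_GROUPS:
        rs_user, roles = map_session(user)
        print(rs_user, roles, "not pre-created:", roles_not_precreated(roles))
```

The sketch shows why pre-creating roles matters. A derived role name must exactly match a role you prepared; otherwise the grants and RLS policies you attached never apply to that session.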
Prerequisites
To implement this solution, you will need the following:
- An AWS account. If you don't have one, you can create one.
- An Amazon Redshift Serverless workgroup or provisioned cluster. For setup instructions, see the Amazon Redshift documentation.
- An active Microsoft Azure account with administrative privileges to register and configure an application.
- Power BI Desktop version 2.102.683.0 (64-bit) or later installed.
- The latest Amazon Redshift JDBC driver with its AWS SDK dependent libraries downloaded, and the Amazon Redshift JDBC JAR .zip file unzipped. JDBC driver versions earlier than 2.1.0.4 don't support native IdP.
- Any SQL client. In this post, we use SQL Workbench/J.
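If you script environment checks for these prerequisites, compare versions numerically rather than as strings. As text, 2.1.0.10 sorts before 2.1.0.4. The following is a small sketch using the two minimums from the list above; the helper names are our own. Pass only the dotted number, without suffixes such as 64-bit.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '2.1.0.4' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimums stated in the prerequisites list.
MIN_JDBC_NATIVE_IDP = "2.1.0.4"
MIN_POWER_BI_DESKTOP = "2.102.683.0"

if __name__ == "__main__":
    print(meets_minimum("2.1.0.10", MIN_JDBC_NATIVE_IDP))  # True: 10 > 4 numerically
    print(meets_minimum("2.1.0.3", MIN_JDBC_NATIVE_IDP))   # False
```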
Implementing Your Amazon Redshift Native IdP
To set up your Amazon Redshift native IdP, refer to the blog post Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client. Follow its steps to create your Azure application and gather the Azure AD information that the Amazon Redshift IdP needs.
In this example, we created four groups in Azure AD:
- sales_ny
- sales_ca
- sales_manager
- hr_group
We also created the following users in Azure AD:
- Mia – Salesperson for NY
- Oliver – Salesperson for CA
- Sophia – North America sales manager
- Liam – HR group member
Each user was added to the appropriate group as follows:
- Mia – sales_ny
- Oliver – sales_ca
- Sophia – sales_manager
- Liam – hr_group
Next, register the IdP in Amazon Redshift with the following command:
-- Replace the tenant ID in issuer, the audience values, client_id, and client_secret with your own Azure AD values.
CREATE IDENTITY PROVIDER rls_idp TYPE azure
NAMESPACE 'aad'
PARAMETERS '{
  "issuer": "https://sts.windows.net/87f4aa26-78b7-410e-bf29-57b39929ef9a/",
  "audience": ["https://analysis.windows.net/powerbi/connector/AmazonRedshift",
               "api://991abc78-78ab-4ad8-a123-zf123ab03612p"],
  "client_id": "123ab555-a321-666d-7890-11a123a44890",
  "client_secret": "KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB"
}';
In this statement, the type azure indicates that the provider communicates with Microsoft Azure AD. The namespace aad is the prefix that Amazon Redshift applies to user and role names coming from this IdP. The PARAMETERS JSON holds the Azure AD application details (for more information, see the Microsoft identity platform documentation):
- issuer: The ID of the token issuer that Amazon Redshift trusts when it receives a token.
- client_id: The public identifier of the application registered in Azure AD.
- client_secret: A secret known only to Azure AD and the registered application.
- audience: The application IDs that Amazon Redshift accepts as the token audience. Power BI Desktop and SQL Workbench/J require different values, so the example lists two: the Power BI connector URI for Power BI Desktop and the api:// application ID URI for SQL Workbench/J.
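If you generate this statement from a script, building PARAMETERS with a JSON serializer avoids hand-quoting mistakes in the nested JSON. The Python sketch below is our own illustrative helper, not an AWS tool.

- It reuses the issuer format from the example above.
- It refuses values containing single quotes or backslashes rather than guessing at SQL-literal escaping.
- It reads the secret from the environment variable AZURE_CLIENT_SECRET. That name is hypothetical; the point is to keep the secret out of source files.

```python
import json
import os
from typing import List


def create_idp_sql(name: str, namespace: str, tenant_id: str, audiences: List[str],
                   client_id: str, client_secret: str) -> str:
    """Build a CREATE IDENTITY PROVIDER statement for Azure AD (illustrative helper).

    name and namespace are inserted as-is, so pass only trusted values.
    """
    params = {
        # Same issuer format as the example above.
        "issuer": f"https://sts.windows.net/{tenant_id}/",
        "audience": audiences,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    params_json = json.dumps(params, ensure_ascii=False)
    # Quotes or backslashes would need SQL-literal escaping, which this sketch does not attempt.
    if "'" in params_json or "\\" in params_json:
        raise ValueError("values contain quotes or backslashes; escape them manually")
    return (
        f"CREATE IDENTITY PROVIDER {name} TYPE azure\n"
        f"NAMESPACE '{namespace}'\n"
        f"PARAMETERS '{params_json}';"
    )


if __name__ == "__main__":
    print(create_idp_sql(
        "rls_idp", "aad", "87f4aa26-78b7-410e-bf29-57b39929ef9a",
        ["https://analysis.windows.net/powerbi/connector/AmazonRedshift",
         "api://991abc78-78ab-4ad8-a123-zf123ab03612p"],
        "123ab555-a321-666d-7890-11a123a44890",
        os.environ.get("AZURE_CLIENT_SECRET", "<client-secret>"),
    ))
```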
To view the registered IdP in Amazon Redshift, use the following command:
DESC IDENTITY PROVIDER rls_idp;