In various industries, data service teams are creating centralized platforms that enable shared access to datasets across multiple business units and teams. This approach simplifies data governance, reduces redundancy, and enhances data integrity. Often, these centralized data platforms are built using Amazon Simple Storage Service (Amazon S3).
A prevalent method for granting access to this data involves setting up cross-account IAM Users and IAM Roles, which allow direct access to datasets stored in S3 buckets. Permissions are enforced through S3 Bucket Policies or S3 Access Point policies, enabling granular access control at the bucket, prefix, and object levels.
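To make the prefix-level grant concrete, here is a minimal sketch of a bucket policy that gives one cross-account role read access to a single prefix. The bucket and prefix names follow the example used later in this post; the account ID, role name, and the `build_prefix_read_policy` helper are hypothetical and only illustrate the shape of such a policy.

```python
import json


def build_prefix_read_policy(bucket, prefix, principal_arn):
    """Return a bucket policy dict granting read access to one prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it is narrowed
                # to the prefix with an s3:prefix condition.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads are narrowed through the resource ARN itself.
                "Sid": "ReadPrefixObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical subscriber role in another account.
policy = build_prefix_read_policy(
    "vendor-s3-bucket",
    "vendorA-prefix",
    "arn:aws:iam::111122223333:role/subscriber-role",
)
print(json.dumps(policy, indent=2))
```

The role in the other account still needs its own IAM policy allowing the same actions; the bucket policy alone is only one half of a cross-account grant.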
To mitigate the risk of unintended access, Access Analyzer for S3 can be used to identify S3 buckets shared with identities outside your zone of trust (your account or organization). Although Access Analyzer provides valuable insights at the bucket level, you may also need auditing at the S3 prefix level, since data is typically organized by prefix.
Common Use Cases
Many organizations ingest numerous third-party datasets and then distribute them internally via a subscription model. Regardless of how data is ingested, whether through the AWS Transfer Family service or other means, these datasets are generally stored in a single S3 bucket, with a distinct prefix for each vendor's dataset. The structure can be visualized as follows:
vendor-s3-bucket
    vendorA-prefix
        vendorA.dataset.csv
    vendorB-prefix
        vendorB.dataset.csv
Access is granted to data subscribers at the S3 prefix level. However, Access Analyzer for S3 does not provide visibility into S3 prefixes, necessitating the development of custom scripts to extract this information from S3 policy documents. Moreover, users will need the data in an accessible format, such as a CSV file that can be queried, filtered, and shared across the organization.
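As a rough illustration of what such a custom script has to do, the sketch below walks a bucket policy document and lists which prefixes are granted to principals outside a trusted account. It is not the code used by the solution; the helper names, the sample policy, and the account IDs are hypothetical, and it only handles bucket ARNs (access point ARNs have a different layout).

```python
def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _account_of(principal):
    """Return the account ID from an IAM ARN, or the value itself (bare ID or '*')."""
    return principal.split(":")[4] if principal.startswith("arn:") else principal


def _condition_prefixes(statement):
    """Collect s3:prefix values from any condition operator in the statement."""
    found = []
    for keys in statement.get("Condition", {}).values():
        found.extend(_as_list(keys.get("s3:prefix")))
    return found


def external_prefix_grants(policy, trusted_account):
    """Return (principal, action, prefix) tuples for external Allow grants."""
    rows = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            principals = ["*"]
        elif isinstance(principal, dict):
            principals = _as_list(principal.get("AWS"))
        else:
            principals = []
        external = [p for p in principals if _account_of(p) != trusted_account]
        prefixes = []
        for resource in _as_list(stmt.get("Resource")):
            # Everything after the first "/" in a bucket ARN is the key pattern.
            _, _, key_pattern = resource.partition("/")
            if key_pattern:
                prefixes.append(key_pattern)
            else:
                prefixes.extend(_condition_prefixes(stmt) or ["(entire bucket)"])
        for p in external:
            for action in _as_list(stmt.get("Action")):
                for prefix in prefixes:
                    rows.append((p, action, prefix))
    return rows


# Hypothetical policy: one external subscriber, one internal role.
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/subscriber-role"},
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::vendor-s3-bucket/vendorA-prefix/*"},
        {"Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/subscriber-role"},
         "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::vendor-s3-bucket",
         "Condition": {"StringLike": {"s3:prefix": ["vendorA-prefix/*"]}}},
        {"Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::999988887777:role/internal-analyst"},
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::vendor-s3-bucket/vendorB-prefix/*"},
    ],
}

grants = external_prefix_grants(SAMPLE_POLICY, trusted_account="999988887777")
for grant in grants:
    print(grant)
```

Treating a single trusted account as the zone of trust is a simplification; an organization-wide zone of trust would need a list of member accounts instead.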
To meet this requirement, we outline a solution that uses Access Analyzer for S3 findings to generate a CSV file on a preconfigured schedule. The CSV file includes:
- External Principals outside your trust zone that have access to your S3 buckets.
- Permissions granted to these external principals (read, write).
- A list of S3 prefixes with access granted to these external principals via S3 bucket policies and/or S3 access point policies.
Architecture Overview
The implementation of this solution involves the following steps:
- Pass the Access Analyzer ARN and S3 bucket parameters to an AWS Lambda function through environment variables.
- The Lambda function utilizes the Access Analyzer ARN to call the list-findings API, retrieving findings and storing them in the S3 bucket in JSON format.
- The Lambda function then parses the JSON file, extracting necessary fields and saving them as a CSV file in the same S3 bucket. It also examines the bucket policy and/or access point policies to gather S3 prefix level permissions granted to external identities, which are included in the CSV file.
- An AWS Glue crawler is set up during initial deployment to discover and create the schema of the CSV file in the AWS Glue Data Catalog.
- An Amazon Athena query is executed to create a downloadable spreadsheet of findings for auditing purposes.
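The first three steps can be sketched roughly as follows. This is an illustrative outline, not the Lambda code shipped with the template: the environment variable names (`ANALYZER_ARN`, `FINDINGS_BUCKET`), the object keys, and every CSV column except `externalaccount` (which appears in the Athena query later in this post) are assumptions. The prefix-level policy parsing from step three is omitted here.

```python
import csv
import io
import json
import os

FIELDS = ["resource", "externalaccount", "principal", "actions", "ispublic", "status"]


def list_s3_findings(client, analyzer_arn):
    """Page through list_findings, keeping only S3 bucket findings."""
    findings, token = [], None
    while True:
        kwargs = {
            "analyzerArn": analyzer_arn,
            "filter": {"resourceType": {"eq": ["AWS::S3::Bucket"]}},
        }
        if token:
            kwargs["nextToken"] = token
        page = client.list_findings(**kwargs)
        findings.extend(page.get("findings", []))
        token = page.get("nextToken")
        if not token:
            return findings


def findings_to_csv(findings):
    """Flatten the fields of interest into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for f in findings:
        principal = f.get("principal", {}).get("AWS", "")
        account = principal.split(":")[4] if principal.startswith("arn:") else principal
        writer.writerow({
            "resource": f.get("resource", ""),
            "externalaccount": account,
            "principal": principal,
            "actions": ";".join(f.get("action", [])),
            "ispublic": f.get("isPublic", False),
            "status": f.get("status", ""),
        })
    return buf.getvalue()


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be exercised without AWS.
    import boto3

    analyzer_arn = os.environ["ANALYZER_ARN"]   # assumed variable name
    bucket = os.environ["FINDINGS_BUCKET"]      # assumed variable name
    findings = list_s3_findings(boto3.client("accessanalyzer"), analyzer_arn)
    s3 = boto3.client("s3")
    # Raw findings as JSON (timestamps are converted with str), then the CSV report.
    s3.put_object(Bucket=bucket, Key="archive/findings.json",
                  Body=json.dumps(findings, default=str))
    s3.put_object(Bucket=bucket, Key="report/findings.csv",
                  Body=findings_to_csv(findings))
    return {"findings": len(findings)}


# Hypothetical finding, shaped like a list_findings result entry.
SAMPLE_FINDING = {
    "resource": "arn:aws:s3:::vendor-s3-bucket",
    "principal": {"AWS": "arn:aws:iam::111122223333:role/subscriber-role"},
    "action": ["s3:GetObject", "s3:ListBucket"],
    "isPublic": False,
    "status": "ACTIVE",
}
print(findings_to_csv([SAMPLE_FINDING]))
```

Passing the client into `list_s3_findings` rather than creating it inside keeps the pagination logic easy to test with a stub.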
Prerequisites
Before you begin, ensure you have the following:
- An AWS account.
- S3 buckets shared with external identities via cross-account IAM roles or IAM users. Follow the guidelines in this user guide to set up cross-account S3 bucket access.
- IAM Access Analyzer activated for your AWS account. Check the instructions to enable IAM Access Analyzer.
Once IAM Access Analyzer is enabled, you can view findings from the S3 console by selecting the bucket name and choosing ‘View findings’, or view them directly in the IAM console. Selecting a ‘Finding id’ for an S3 bucket opens the details of that finding.
Setup
With your Access Analyzer operational, open the link below to deploy the CloudFormation template. Launch the template in the same AWS Region where IAM Access Analyzer has been enabled.
Launch template
Specify a stack name and input the following parameters:
- ARN of the Access Analyzer from the IAM Console.
- A new S3 bucket for storing your findings. The CloudFormation template will append a suffix to ensure uniqueness.
Proceed by selecting “Next” twice, and on the final screen, check the box allowing CloudFormation to create IAM resources before choosing “Create Stack.” The resource creation and AWS Lambda function launch will take a few minutes.
Once the stack status shows as CREATE_COMPLETE, navigate to the Outputs tab and note the value for the DataS3BucketName key. This is the S3 bucket generated by the template, formatted as analyzer-findings-xxxxxxxxxxxx. Access the S3 console to view the bucket contents, which should include folders for archive/ and report/. The report folder will contain the CSV file with the findings report.
You can download the CSV directly and open it in Excel to view its contents. If you wish to query the CSV by its attributes, go to the AWS Glue console and choose Crawlers. A crawler named analyzer-crawler will have been created for you; select it and run it.
Once the crawler has run successfully, a new table named analyzer-report is created under the analyzerdb Glue database. Check the table properties and schema for details. To run queries, go to the Athena console, select the analyzerdb database, and run a query like “SELECT * FROM "analyzer-report" WHERE externalaccount = <>” to list all S3 buckets accessible to the external account. The table name is double-quoted in the query because it contains a hyphen.
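The same query can also be submitted programmatically. The sketch below is one possible approach, assuming the table and database names given above; the `build_report_query` and `run_report_query` helpers and the results location are hypothetical. The `CAST` is there because the crawler may infer either a numeric or a string type for the account column.

```python
def build_report_query(external_account, table="analyzer-report"):
    """Build the Athena SQL for one external account ID."""
    # Validate the input rather than interpolating arbitrary text into SQL.
    if not (external_account.isdigit() and len(external_account) == 12):
        raise ValueError("expected a 12-digit AWS account ID")
    return (
        f'SELECT * FROM "{table}" '
        f"WHERE CAST(externalaccount AS varchar) = '{external_account}'"
    )


def run_report_query(athena, external_account, output_location,
                     database="analyzerdb"):
    """Start the query with an Athena client and return its execution ID."""
    response = athena.start_query_execution(
        QueryString=build_report_query(external_account),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


print(build_report_query("111122223333"))
```

With real credentials, `athena` would be `boto3.client("athena")` and `output_location` an S3 URI you own, for example a results prefix in the findings bucket.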
This CloudFormation template also generates a CloudWatch event rule, testanalyzer-ScheduledRule-xxxxxxx, which triggers the Lambda function every Monday to produce a new version of the findings CSV file. You can modify the rule frequency as desired.
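If you prefer to change the schedule from code rather than the console, the following sketch builds a weekly cron expression and applies it with `put_rule`. The helper names and the 08:00 UTC default are assumptions (the post does not state what time the rule fires), and the rule name must be the full generated name from your stack.

```python
VALID_DAYS = {"SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"}


def weekly_cron(day="MON", hour=8, minute=0):
    """Return a schedule expression that fires once a week (times are UTC)."""
    day = day.upper()
    if day not in VALID_DAYS:
        raise ValueError(f"unknown day: {day}")
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour or minute out of range")
    # Field order: minutes hours day-of-month month day-of-week year.
    # Day-of-month must be "?" when day-of-week is specified.
    return f"cron({minute} {hour} ? * {day} *)"


def update_schedule(events, rule_name, expression):
    """Replace the schedule on an existing rule and return the rule ARN."""
    response = events.put_rule(Name=rule_name, ScheduleExpression=expression)
    return response["RuleArn"]


print(weekly_cron("MON", 8))
```

Because the rule is managed by CloudFormation, changing it outside the stack causes drift; updating the template is the more durable option.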
Clean Up
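To avoid incurring costs, delete the resources you created. First, manually remove the archive/ and report/ folders from the findings S3 bucket, because CloudFormation cannot delete a bucket that still contains objects. Then delete the CloudFormation stack.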
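Emptying the archive/ and report/ folders created during setup can also be scripted. This is a minimal sketch that assumes the bucket is not versioned; the `empty_prefixes` helper is hypothetical, and `s3` is expected to be an S3 client such as `boto3.client("s3")`.

```python
def empty_prefixes(s3, bucket, prefixes=("archive/", "report/")):
    """Delete every object under the given prefixes; return how many were removed."""
    deleted = 0
    for prefix in prefixes:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        while True:
            page = s3.list_objects_v2(**kwargs)
            keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
            if keys:
                # A list page holds at most 1,000 keys, which is also
                # the per-request limit for delete_objects.
                s3.delete_objects(Bucket=bucket,
                                  Delete={"Objects": keys, "Quiet": True})
                deleted += len(keys)
            if not page.get("IsTruncated"):
                break
            kwargs["ContinuationToken"] = page["NextContinuationToken"]
    return deleted
```

Pass the bucket name from the DataS3BucketName stack output as `bucket`.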
This process is essential for effective data governance and access management.