Harnessing AI Insights for Amazon Security Lake with Amazon SageMaker Studio and Amazon Bedrock


In the first part of our series, we explored how to use Amazon SageMaker Studio to analyze time-series data in Amazon Security Lake, enabling security teams to pinpoint critical issues and prioritize their efforts to enhance overall security. Security Lake enhances visibility by aggregating and standardizing security data from both AWS and external sources. By leveraging Amazon Athena, security professionals can efficiently query data stored in Security Lake to support security event investigations or perform proactive threat assessments. Reducing the mean time to detect and respond to security incidents can significantly lower an organization's exposure, minimize the risk of data breaches, and prevent operational disruptions.
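For example, an analyst can run an Athena query against a Security Lake table directly from a notebook using boto3. The following is a minimal sketch; the database, table, column, and workgroup names are placeholders that depend on your Region and the OCSF schema version of your Security Lake deployment.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Placeholder names: substitute your own Security Lake database, table, and workgroup.
DATABASE = "amazon_security_lake_glue_db_us_east_1"
QUERY = """
SELECT cloud.account_uid AS account_id, COUNT(*) AS finding_count
FROM amazon_security_lake_table_us_east_1_sh_findings_1_0
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
"""

# Start the query in the Athena workgroup used for Security Lake queries.
response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": DATABASE},
    WorkGroup="security_lake_workgroup",  # placeholder workgroup name
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```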

Even if your security team is well-versed in AWS security logs and employs SQL queries to analyze data, identifying the correct log sources and crafting tailored SQL queries can prolong investigations. Moreover, when security analysts rely on SQL queries for their analysis, the results are often limited to a specific point in time and do not automatically incorporate findings from previous inquiries.

This blog post demonstrates how to enhance the functionality of SageMaker Studio through Amazon Bedrock, a fully managed generative AI service that offers access to high-performing foundation models (FMs) from leading AI providers via a single API. By integrating Amazon Bedrock, security analysts can expedite investigations by using a natural language interface to automatically generate SQL queries, concentrate on pertinent data within Security Lake, and use previous SQL results to refine follow-up questions. We will illustrate this with a threat analysis exercise, showing how security analysts can use natural language to answer questions such as which AWS account has the highest number of AWS Security Hub findings, where irregular network activity originates from AWS resources, and which AWS Identity and Access Management (IAM) principals have exhibited suspicious behavior. Identifying potential vulnerabilities or misconfigurations in this way can significantly reduce detection times and help assess the impact on specific resources.
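As a preview of that flow, the sketch below calls Amazon Bedrock to turn a natural language question into an Athena SQL query. It assumes the Anthropic Claude v2 text-completions format and a hand-written schema summary as grounding context; the table, columns, and question shown are illustrative placeholders, not the exact prompt used by the sample solution.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# A short schema summary is passed as context so the model grounds its SQL in real columns.
# The schema text and question below are placeholders.
schema_context = """
Table: amazon_security_lake_table_us_east_1_sh_findings_1_0
Columns: time, severity, cloud.account_uid, finding.title
"""

question = "Which AWS account has the highest number of Security Hub findings?"

prompt = (
    f"\n\nHuman: You are a security analyst assistant. Using only the tables and "
    f"columns below, write a single Amazon Athena SQL query that answers the question.\n"
    f"{schema_context}\nQuestion: {question}\n\nAssistant:"
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps(
        {
            "prompt": prompt,
            "max_tokens_to_sample": 512,
            "temperature": 0,  # deterministic output keeps generated SQL consistent
        }
    ),
)

generated_sql = json.loads(response["body"].read())["completion"]
print(generated_sql)
```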

Additionally, we will discuss how to customize the integration of Amazon Bedrock with data from your Security Lake. While large language models (LLMs) can serve as useful conversational partners, it’s essential to acknowledge that their responses may include inaccuracies, or hallucinations, which do not necessarily reflect reality. We will address strategies for validating LLM outputs and mitigating such hallucinations. This article is especially tailored for technologists with a robust understanding of generative AI concepts and the AWS services involved in our example solution.
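As a preview of that validation discussion, one lightweight mitigation is to check model-generated SQL before it ever reaches Athena. The sketch below (table names are placeholders) accepts only a single read-only SELECT statement that references tables the model was explicitly told about, and rejects everything else.

```python
import re

# Tables the analyst has approved for querying; placeholder names.
ALLOWED_TABLES = {
    "amazon_security_lake_table_us_east_1_sh_findings_1_0",
    "amazon_security_lake_table_us_east_1_vpc_flow_1_0",
    "amazon_security_lake_table_us_east_1_cloud_trail_mgmt_1_0",
    "amazon_security_lake_table_us_east_1_route53_1_0",
}

def validate_generated_sql(sql: str) -> bool:
    """Basic guardrails for model-generated SQL before it is sent to Athena."""
    statement = sql.strip().rstrip(";")
    # Allow only a single read-only SELECT statement.
    if not statement.lower().startswith("select") or ";" in statement:
        return False
    # Every referenced table must be one the model was given in its context.
    referenced = re.findall(r"(?:from|join)\s+([\w.\"]+)", statement, flags=re.IGNORECASE)
    return all(t.strip('"').split(".")[-1] in ALLOWED_TABLES for t in referenced)

# If validation fails, ask the model to retry or fall back to a human-written query.
```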

Solution Overview

The architecture of the sample solution is depicted in Figure 1.

Before deploying this sample solution, please ensure the following prerequisites are completed:

  • Enable Security Lake in your AWS Organization and designate a delegated administrator account to manage the Security Lake configuration across all member accounts.
  • Configure Security Lake with the required log sources such as Amazon Virtual Private Cloud (VPC) Flow Logs, AWS Security Hub, AWS CloudTrail, and Amazon Route 53.
  • Create a subscriber with query access from the source Security Lake AWS account to the subscriber AWS account.
  • Accept a resource share request in the subscriber AWS account through AWS Resource Access Manager (AWS RAM).
  • Create a database resource link in AWS Lake Formation in the subscriber AWS account and grant access to the Athena tables shared from the Security Lake AWS account (a scripted sketch of this step follows the list).
  • Ensure model access for the Anthropic Claude v2 model is enabled in Amazon Bedrock in the subscriber AWS account where the solution will be deployed. Attempting to use a model before it has been enabled in your AWS account results in an error.
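If you prefer to script the Lake Formation prerequisite rather than use the console, the following sketch shows one way to create the resource link and grant table access with boto3. The account IDs, database name, and role ARN are placeholders, and your shared database name will differ by Region.

```python
import boto3

glue = boto3.client("glue")
lakeformation = boto3.client("lakeformation")

SECURITY_LAKE_ACCOUNT_ID = "111122223333"   # placeholder: Security Lake delegated admin account
SHARED_DATABASE = "amazon_security_lake_glue_db_us_east_1"  # placeholder shared database name
SUBSCRIBER_ROLE_ARN = "arn:aws:iam::444455556666:role/sagemaker-user-role"  # placeholder

# Create a resource link in the subscriber account that points at the shared database.
glue.create_database(
    DatabaseInput={
        "Name": "security_lake_resource_link",
        "TargetDatabase": {
            "CatalogId": SECURITY_LAKE_ACCOUNT_ID,
            "DatabaseName": SHARED_DATABASE,
        },
    }
)

# Grant the SageMaker execution role SELECT and DESCRIBE on all shared tables.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": SUBSCRIBER_ROLE_ARN},
    Resource={
        "Table": {
            "CatalogId": SECURITY_LAKE_ACCOUNT_ID,
            "DatabaseName": SHARED_DATABASE,
            "TableWildcard": {},
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)
```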

Once the prerequisites are set, the architecture provisions the following resources:

  • A VPC for SageMaker equipped with an internet gateway, a NAT gateway, and VPC endpoints for all AWS services within the solution. The internet gateway or NAT gateway is necessary for installing external open-source packages.
  • A SageMaker Studio domain is created in VPCOnly mode with a single user profile linked to an IAM role. An Amazon Elastic File System (Amazon EFS) volume is provisioned for the SageMaker domain as part of the deployment.
  • A dedicated IAM role is established to restrict the creation and retrieval of presigned URLs to a specific Classless Inter-Domain Routing (CIDR) range associated with the SageMaker notebook.
  • An AWS CodeCommit repository is set up to store Python notebooks utilized for the AI/ML workflow by the SageMaker user profile.
  • An Athena workgroup is configured for executing Security Lake queries, with a designated S3 bucket for output storage and access logging enabled on that bucket (a provisioning sketch for the workgroup follows this list).
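For illustration, the Athena workgroup the stack provisions is roughly equivalent to the following boto3 call; the workgroup name and output bucket shown here are placeholders, and the actual names are generated by the deployment.

```python
import boto3

athena = boto3.client("athena")

# Placeholder names: the deployed stack creates its own workgroup and output bucket.
athena.create_work_group(
    Name="security_lake_workgroup",
    Configuration={
        "ResultConfiguration": {
            "OutputLocation": "s3://example-athena-results-bucket/security-lake/",
            "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
        },
        "EnforceWorkGroupConfiguration": True,
        "PublishCloudWatchMetricsEnabled": True,
    },
    Description="Workgroup for Security Lake queries generated from SageMaker Studio",
)
```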

Cost Considerations

Before launching the sample solution, it’s crucial to grasp the cost implications associated with the primary AWS services employed. Costs will largely depend on the volume of data interacted with in Security Lake and the operating duration of resources in SageMaker Studio.

The SageMaker Studio domain is deployed with a default instance type of ml.t3.medium. For a more detailed cost breakdown, refer to SageMaker Studio pricing. Be sure to shut down applications when they are not in use, because you are billed for the hours an application remains active. An automated shutdown extension is available in the AWS samples repository.
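Separate from that extension, you can also stop idle Studio apps yourself from a script. The following is a minimal sketch with boto3; the domain ID and user profile name are placeholders.

```python
import boto3

sagemaker = boto3.client("sagemaker")

DOMAIN_ID = "d-xxxxxxxxxxxx"        # placeholder SageMaker Studio domain ID
USER_PROFILE = "security-analyst"   # placeholder user profile name

# Delete any running KernelGateway apps for the user so billing for them stops.
apps = sagemaker.list_apps(DomainIdEquals=DOMAIN_ID, UserProfileNameEquals=USER_PROFILE)["Apps"]
for app in apps:
    if app["AppType"] == "KernelGateway" and app["Status"] == "InService":
        sagemaker.delete_app(
            DomainId=DOMAIN_ID,
            UserProfileName=USER_PROFILE,
            AppType=app["AppType"],
            AppName=app["AppName"],
        )
```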

Amazon Bedrock pricing operates on-demand, based on the selected LLM and the count of input and output tokens. A token is a basic unit of text that the model utilizes to interpret user input and prompts. For a more detailed breakdown, see Amazon Bedrock pricing.
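To reason about this cost model, you can estimate a single invocation as shown below. The per-1,000-token rates are placeholders only; substitute the current rates for your chosen model from the pricing page.

```python
# Illustrative only: the per-1,000-token rates below are placeholders, not published prices.
INPUT_RATE_PER_1K = 0.008    # placeholder USD per 1,000 input tokens
OUTPUT_RATE_PER_1K = 0.024   # placeholder USD per 1,000 output tokens

def estimate_invocation_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough on-demand cost estimate for a single model invocation."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K + (output_tokens / 1000) * OUTPUT_RATE_PER_1K

# Example: a prompt of roughly 2,000 tokens that returns roughly 500 tokens.
print(f"${estimate_invocation_cost(2000, 500):.4f}")
```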

The SQL queries generated by Amazon Bedrock are executed via Athena. Athena costs are based on the data scanned within Security Lake during queries. For further details, refer to Athena pricing.
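After a query completes, you can check how much data it scanned, which is what drives the Athena charge. A small helper, assuming you already have the QueryExecutionId returned by start_query_execution:

```python
import boto3

athena = boto3.client("athena")

def data_scanned_gb(query_id: str) -> float:
    """Return the amount of data an Athena query scanned, in gigabytes."""
    stats = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Statistics"]
    return stats["DataScannedInBytes"] / (1024 ** 3)
```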

Deploy the Sample Solution

You can deploy the sample solution either through the AWS Management Console or the AWS Cloud Development Kit (CDK). For comprehensive instructions on using the AWS CDK, see the guidelines on getting started with AWS CDK.

Option 1: Deploy Using AWS CloudFormation via the Console

Log in to your subscriber AWS account and select the Launch Stack button to access the AWS CloudFormation console pre-loaded with the template for this solution. The CloudFormation stack will take approximately 10 minutes to complete.

Option 2: Deploy Using AWS CDK

  1. Clone the Security Lake generative AI sample repository.
  2. Navigate to the project’s source folder (…/amazon-security-lake-generative-ai/source).
  3. Install the project dependencies with the following commands:
     npm install -g aws-cdk
     npm install
  4. During deployment, provide the following required parameters:
    • IAMroleassumptionforsagemakerpresignedurl – this is the existing IAM role intended for accessing the AWS console to create presigned URLs for the SageMaker Studio domain.
    • securitylakeawsacc



