Gaining Insights from Amazon Managed Service for Prometheus Utilizing Natural Language with Amazon Bedrock


As applications expand, businesses increasingly require automated strategies to ensure application uptime and minimize the time and resources spent on identifying, debugging, and resolving operational problems. Companies invest significant funds and developer hours into the deployment and management of various monitoring tools, all the while dedicating substantial efforts to educating teams on their use. When issues arise, operators often find themselves sifting through a multitude of data sources—such as dashboards, documentation, runbooks, alerts, and logs. This lengthy process of pinpointing root causes can slow down troubleshooting and remediation efforts, adversely affecting application reliability and the customer experience.

Generative AI offers a solution to these challenges by harnessing its capacity to process and analyze extensive data from various monitoring tools, generating insights and automating responses. Amazon Bedrock, a fully managed service, provides access to high-performing foundation models (FMs) from esteemed AI organizations, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, through a single API. This service empowers customers to experiment with and evaluate leading foundation models, customize them using their data via fine-tuning and Retrieval Augmented Generation (RAG), and create agents that perform tasks using enterprise systems and data sources.

Organizations utilize Amazon Managed Service for Prometheus to securely and durably store application and infrastructure metrics collected from cloud, on-premises, and hybrid environments. To extract insights from these metrics, customers typically write PromQL queries or use Grafana. While PromQL enables complex queries on time-series data, providing valuable information about application health by filtering, aggregating, and manipulating metrics data, its intricate syntax and the need to comprehend the Prometheus data model can be daunting for newcomers.
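To give a feel for the syntax involved, the sketch below builds a couple of illustrative PromQL expressions in Python. The metric names and label selectors are hypothetical examples chosen for illustration, not queries used later in this walkthrough.

```python
# Illustrative PromQL construction (hypothetical example metrics/labels).
# PromQL combines a metric selector, a time window, and aggregation
# operators such as rate() and avg by (...).

def rate_query(metric: str, selector: str, window: str = "5m") -> str:
    """Build a per-second rate expression for a counter metric."""
    return f'rate({metric}{{{selector}}}[{window}])'

def avg_by(expr: str, label: str) -> str:
    """Average an expression across a label dimension."""
    return f'avg by ({label}) ({expr})'

# Average CPU usage per node over the last 5 minutes:
cpu_query = avg_by(
    rate_query("container_cpu_usage_seconds_total", 'namespace="default"'),
    "instance",
)
print(cpu_query)
# avg by (instance) (rate(container_cpu_usage_seconds_total{namespace="default"}[5m]))
```

Writing expressions like these by hand is exactly the skill barrier that the natural-language approach in this article removes.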

In this article, we will explore how Amazon Bedrock can help you obtain insights from the metrics stored in Amazon Managed Service for Prometheus without requiring knowledge of PromQL. By following the example provided, customers can generate PromQL queries from natural language descriptions of what they wish to monitor or analyze. Organizations can also review existing queries and receive suggestions for optimization and improvement.

Solution Overview

The diagram below illustrates how the Amazon Bedrock agent derives insights from Amazon Managed Service for Prometheus.

At a high level, the steps can be summarized as follows:

  1. The AWS managed collector scrapes metrics from workloads operating on the Amazon EKS cluster and ingests them into Amazon Managed Service for Prometheus.
  2. Users interact with the Amazon Bedrock agent’s interface to inquire about the health of applications, such as CPU and memory utilization.
  3. The Amazon Bedrock agent generates the necessary PromQL query based on the user’s request and forwards it to the action group.
  4. An action group defines the actions that the agent can help users perform. In this article, we demonstrate a Lambda function that receives the PromQL query generated by the agent, authenticates with Amazon Managed Service for Prometheus, and runs the query.
  5. The action group will relay the results back to the agent, which will enhance the information using the knowledge base.
  6. Knowledge bases for Amazon Bedrock allow you to integrate proprietary information into generative-AI applications. Using the Retrieval Augmented Generation (RAG) technique, a knowledge base searches your data for the most relevant information and uses it to answer natural language inquiries. The agent then processes the results, adds appropriate context, and presents them in a user-friendly natural language format.
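To make step 3 concrete, the snippet below shows the kind of translation the foundation model performs. The hard-coded mapping is a stand-in used purely for illustration; in the actual solution the Amazon Bedrock agent generates the PromQL from the user's question, and the metric names shown are assumptions.

```python
# Hypothetical illustration of step 3: a natural-language question is
# translated into a PromQL query. In the real solution the foundation
# model produces this translation; here it is hard-coded for clarity.

EXAMPLE_TRANSLATIONS = {
    "What is the CPU utilization of my pods?":
        'sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))',
    "How much memory are my nodes using?":
        'sum by (instance) (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)',
}

question = "What is the CPU utilization of my pods?"
promql = EXAMPLE_TRANSLATIONS[question]
print(promql)
# sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))
```

The generated query string is what the agent forwards to the action group in step 4.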

Prerequisites

For this walkthrough, you will need:

  • AWS Command Line Interface (AWS CLI) version 2
  • Amazon EKS cluster
  • Amazon Managed Service for Prometheus workspace
  • Amazon Managed Grafana workspace
  • Access to Claude 3 Sonnet Model in Amazon Bedrock
  • awscurl
  • Amazon S3 bucket

Note: Although Amazon Managed Grafana is deployed as part of this walkthrough, it is optional and is not used by the solution itself.

Solution Walk-through

Step 1: Setting Up Monitoring for Amazon EKS Cluster Using AWS Managed Collector & Amazon Managed Service for Prometheus

To begin, set up monitoring for your Amazon EKS cluster. You will leverage the Solution for Monitoring Amazon EKS infrastructure with Amazon Managed Grafana project, which establishes the Amazon EKS cluster along with an AWS managed collector. This collector will scrape metrics and ingest them into the pre-configured Amazon Managed Service for Prometheus workspace, providing insights into the health and performance of the Kubernetes control and data plane. You will gain a comprehensive view of your Amazon EKS cluster, from the node level down to pods and containers, including detailed resource usage monitoring.

Let’s start by setting a few environment variables:

export AMG_WORKSPACE_ID=<Your grafana workspace id usually starts with g->
export AMG_API_KEY=$(aws grafana create-workspace-api-key --key-name "grafana-operator-key" --key-role "ADMIN" --seconds-to-live 432000 --workspace-id $AMG_WORKSPACE_ID --query key --output text)

After creating the API key, make it available to the AWS CDK by storing it in AWS Systems Manager Parameter Store with the following command. $AMG_API_KEY is the key created above; set $AWS_REGION to the region where your solution will run.

aws ssm put-parameter --name "/observability-aws-solution-eks-infra/grafana-api-key" --type "SecureString" --value $AMG_API_KEY --region $AWS_REGION --overwrite

Next, you will deploy the observability stack using AWS CDK.

git clone https://github.com/aws-observability/observability-best-practices.git
cd observability-best-practices/solutions/oss/eks-infra/v3.0.0/iac/
export AWS_REGION=<Your region>
export AMG_ENDPOINT=<AMG_ENDPOINT>
export EKS_CLUSTER_NAME=<EKS_CLUSTER_NAME>
export AMP_WS_ARN=<ARN of Amazon Prometheus workspace>
make deps
make build && make pattern aws-observability-solution-eks-infra-$EKS_CLUSTER_NAME deploy

This solution creates a scraper that collects metrics from your Amazon EKS cluster. Those metrics are stored in Amazon Managed Service for Prometheus and subsequently displayed in Amazon Managed Grafana dashboards.

To verify the stack is deployed successfully, you can use awscurl to query the Amazon Prometheus workspace and confirm that metrics are being ingested:

export AMP_QUERY_ENDPOINT=<AMP Query Endpoint>
awscurl -X POST --region <Your region> --service aps "${AMP_QUERY_ENDPOINT}" -d 'query=up' --header 'Content-Type: application/x-www-form-urlencoded'

You should see a result such as:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "instance": "localhost:9090",
          "job": "prometheus",
          "monitor": "monitor"
        },
        "value": [
          1652452637.636,
          "1"
        ]
      }
    ]
  }
}
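The response follows the Prometheus HTTP API instant-query format: a status field plus a vector of series, each carrying a metric label set and a [timestamp, value] pair. A minimal Python sketch for pulling the values out of a payload like the one above:

```python
import json

# The sample instant-query response shown above.
response_text = '''{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "up", "instance": "localhost:9090",
                   "job": "prometheus", "monitor": "monitor"},
        "value": [1652452637.636, "1"]
      }
    ]
  }
}'''

def extract_values(payload: str) -> list:
    """Return (labels, value) pairs from a Prometheus instant-query result.
    Values arrive as strings in the API response, so convert to float."""
    data = json.loads(payload)
    if data["status"] != "success":
        raise RuntimeError(f"query failed: {data}")
    return [
        (series["metric"], float(series["value"][1]))
        for series in data["data"]["result"]
    ]

for labels, value in extract_values(response_text):
    print(labels["job"], value)
# prometheus 1.0
```

This same parsing logic is useful inside the action-group Lambda in the next step, which relays query results back to the agent.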

Step 2: Configure a Lambda Function as an Action Group for the Amazon Bedrock Agent

Next, create a Lambda function to serve as an action group for the Amazon Bedrock agent. This will allow the agent to translate user queries into actionable tasks.
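A minimal sketch of such a handler is shown below, assuming the action group is defined with function details and a single `query` parameter (both assumptions to verify against your agent configuration). `execute_query` is a stub: in production it would send a SigV4-signed request to the workspace's query endpoint, for example using botocore's SigV4Auth.

```python
import json

def execute_query(promql: str) -> str:
    """Stub for the SigV4-signed HTTP call to the Amazon Managed Service
    for Prometheus query endpoint. Replace with a real signed request
    (e.g., botocore SigV4Auth) in production."""
    return json.dumps({"status": "success", "query": promql})

def lambda_handler(event, context):
    # Bedrock agents pass parameters as a list of {name, type, value} dicts.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    promql = params.get("query", "up")

    body = execute_query(promql)

    # Response shape for a function-details action group; field names
    # follow the Bedrock agent Lambda contract and should be verified
    # against the current API documentation.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "function": event.get("function"),
            "functionResponse": {
                "responseBody": {"TEXT": {"body": body}}
            },
        },
    }
```

The agent invokes this function with the PromQL it generated, and the text body returned here is what the agent enriches with knowledge-base context before answering the user.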
