Today, I’m excited to introduce Strands Agents, an open-source SDK designed to streamline the creation and operation of AI agents with minimal coding. It is suitable for a wide range of agent applications, from simple to complex use cases, and supports both local development and cloud deployment. Several teams at AWS are already using Strands for their AI agents in production, including Amazon Q Developer, AWS Glue, and VPC Reachability Analyzer. Now, I am delighted to share Strands with you for building your own AI agents.
Unlike frameworks that require developers to construct complicated workflows for their agents, Strands simplifies development by relying on advanced models to plan, chain thoughts, call tools, and reflect. With Strands, you only need to define a prompt and a list of tools in code to build an agent, test it locally, and deploy it to the cloud. Much like the two strands of DNA, Strands intertwines the two essential components of an agent: the model and the tools. Strands uses the model’s sophisticated reasoning abilities to plan the agent’s next steps and execute tools. For more advanced agent scenarios, developers can tailor their agent’s behavior within Strands, including specifying how tools are selected, customizing context management, deciding where session state is stored, and building multi-agent applications. Strands can run anywhere and supports any model with reasoning and tool-use capabilities, including models from Amazon Bedrock, Anthropic, Ollama, Meta, and other providers via LiteLLM.
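To give a sense of how little code that involves, here is a minimal sketch. It assumes the strands-agents and strands-agents-tools packages are installed and that credentials for a default model provider (such as Amazon Bedrock) are configured; the calculator tool is one of the SDK’s pre-built example tools.

```python
from strands import Agent
from strands_tools import calculator

# An agent is defined by a model (the default provider is used here),
# a list of tools, and a prompt passed when the agent is invoked.
agent = Agent(tools=[calculator])

agent("What is the square root of 1764?")
```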
Strands Agents is supported by an open community, and we’re thrilled to have several companies joining us in supporting and contributing to the project, including Accenture, Anthropic, Langfuse, mem0.ai, Meta, PwC, Ragas.io, and Tavily. For example, Anthropic has contributed support for using models through the Anthropic API, while Meta has added support for Llama models via the Llama API. You can find more information in the related blog post.
Our Journey in Agent Development
I primarily work on Amazon Q Developer, a generative AI-powered assistant for software development. My team and I began developing AI agents in early 2023, around the time the original ReAct (Reasoning and Acting) paper was published. That research demonstrated that large language models could reason, plan, and take actions within their environments. For instance, LLMs could deduce the necessity of making an API call to complete a task and then generate the required inputs for that API call. We quickly recognized that large language models could be employed as agents to accomplish various tasks, from complex software development to operational troubleshooting.
At that time, LLMs were typically not trained for agent-like behavior; they were primarily geared towards natural language processing. Effectively using an LLM for reasoning and acting necessitated complicated prompt instructions, parsers for the model’s outputs, and orchestration logic. Achieving syntactically correct JSON from LLMs was a significant challenge back then! To prototype and deploy agents, my team and I relied on several intricate agent framework libraries to manage the necessary scaffolding and orchestration for the agents to succeed with these earlier models. Even with these frameworks, it could take us months of adjustments to prepare an agent for production.
Since then, we have witnessed a remarkable enhancement in the capabilities of large language models to reason and utilize tools for task completion. We realized that the complexity of orchestration was no longer needed because models now inherently possess tool-use and reasoning abilities. In fact, some of the agent framework libraries we had been utilizing began to hinder our efforts to fully exploit the capabilities of newer LLMs. Despite the significant improvements in LLMs, those advancements did not expedite the process of building and iterating on agents with the frameworks we were using; it still took months to ready an agent for production.
We initiated the development of Strands Agents to alleviate these complexities for our teams working on Q Developer. We found that letting the latest models drive agents significantly shortened our time to market and improved the user experience compared to maintaining convoluted orchestration logic. Where it used to take months for our teams to move from prototype to production, we can now deploy new agents in days or weeks with Strands.
Core Concepts of Strands Agents
At its core, an agent comprises three components: 1) a model, 2) tools, and 3) a prompt. The agent utilizes these components to autonomously complete tasks, such as answering questions, generating code, planning vacations, or optimizing financial portfolios. In a model-driven approach, the agent dynamically directs its actions and employs tools to achieve its goals.
To create an agent with the Strands Agents SDK, you define these three components in code (a sketch combining all three follows the list):
- Model: Strands offers flexible model support, allowing you to use any model in Amazon Bedrock that supports tool use and streaming, models from Anthropic’s Claude family through their API, Llama models via the Llama API, Ollama for local development, and many other providers like OpenAI through LiteLLM. You can even create your own custom model provider using Strands.
- Tools: You can choose from thousands of available Model Context Protocol (MCP) servers to utilize as tools for your agent. Strands also provides over 20 pre-built example tools, including options for file manipulation, API requests, and AWS API interactions. You can easily designate any Python function as a tool using the Strands @tool decorator.
- Prompt: You will provide a natural language prompt that specifies the agent’s task, such as responding to a user’s question. You can also include a system prompt that outlines general instructions and expected behavior for the agent.
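Putting the three components together, here is a hedged sketch that combines a custom tool, an explicit model, and a system prompt. The @tool decorator and the Agent constructor follow the SDK’s documented usage; the BedrockModel import path, the model ID, and the word_count helper are illustrative assumptions rather than required names.

```python
from strands import Agent, tool
from strands.models import BedrockModel  # assumed import path for the Bedrock provider

# Tools: any Python function can be exposed to the agent with the @tool decorator.
@tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

# Model: any Amazon Bedrock model with tool use and streaming support (ID is illustrative).
model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")

# Prompt: the system prompt sets general behavior; the task prompt is passed at call time.
agent = Agent(
    model=model,
    tools=[word_count],
    system_prompt="You are a concise writing assistant.",
)

agent("How many words are in the phrase 'the model and the tools'?")
```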
An agent interacts with its model and tools in a loop until it completes the task specified by the prompt. This agentic loop is fundamental to Strands’ functionality. The Strands agentic loop maximizes the impressive capabilities of LLMs, allowing them to reason, plan, and select tools effectively. In each iteration, Strands invokes the LLM with the prompt and agent context, along with a description of the tools available. The LLM can respond in natural language for the end user, outline a sequence of steps, reflect on prior actions, and/or select one or more tools to utilize. When the LLM makes a tool selection, Strands manages the execution and returns the result to the LLM. Once the LLM completes its task, Strands provides the final result.
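To make the loop concrete, here is an illustrative sketch of that control flow. It is not the SDK’s actual implementation; the function, attribute, and message-format names are assumptions made for readability.

```python
def agentic_loop(model, tools, prompt, system_prompt):
    """Illustrative sketch of a model-driven agent loop (not the actual Strands internals)."""
    tools_by_name = {t.name: t for t in tools}
    messages = [{"role": "user", "content": prompt}]
    while True:
        # Each iteration sends the prompt, the accumulated context, and tool descriptions.
        response = model.invoke(
            system=system_prompt,
            messages=messages,
            tool_specs=[t.spec for t in tools],
        )
        messages.append({"role": "assistant", "content": response.content})
        if not response.tool_calls:
            # The model produced a final answer instead of selecting a tool.
            return response.content
        for call in response.tool_calls:
            # The framework executes each selected tool and returns the result to the model.
            result = tools_by_name[call.name](**call.arguments)
            messages.append({"role": "tool", "name": call.name, "content": result})
```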
In Strands’ model-driven approach, tools are the key to customizing agent behavior. For example, tools can retrieve relevant documents from a knowledge base, call APIs, execute Python logic, or return static strings that contain additional model instructions. Tools also unlock complex use cases. For instance, the retrieve tool performs semantic search using Amazon Bedrock Knowledge Bases; beyond fetching documents, it can improve the model’s planning and reasoning by retrieving the most relevant tools for the task at hand, which matters when the tool catalog is too large to describe up front. One internal agent at AWS, for example, has over 6,000 tools to select from!
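As a hedged sketch of wiring up semantic retrieval as a tool, the example below assumes the pre-built retrieve tool from the strands-agents-tools package and an existing Amazon Bedrock knowledge base; the environment variable name and the placeholder knowledge base ID are assumptions.

```python
import os

from strands import Agent
from strands_tools import retrieve

# The pre-built retrieve tool performs semantic search over an Amazon Bedrock
# Knowledge Base; the environment variable and ID below are placeholders.
os.environ["KNOWLEDGE_BASE_ID"] = "YOUR_KNOWLEDGE_BASE_ID"

agent = Agent(
    tools=[retrieve],
    system_prompt="Answer questions using documents retrieved from the knowledge base.",
)

agent("Summarize our internal guidance on deploying agents to production.")
```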
If you’re interested in exploring this topic further, check out the Strands Agents documentation and the samples in the project’s GitHub repositories.