On November 4, 2024, Anthropic’s Claude 3.5 Haiku became available in Amazon Bedrock, following our earlier announcement that it was on the way. This launch builds on the introduction of Claude 3.5 Sonnet four months earlier, which raised the bar for AI model intelligence while keeping the speed and cost of its predecessor, Claude 3 Sonnet.
Today, I’m thrilled to unveil three innovative capabilities for the Claude 3.5 model family within Amazon Bedrock:
Enhanced Claude 3.5 Sonnet
The upgraded Claude 3.5 Sonnet model is now available. It builds on the strengths of its predecessor and delivers greater intelligence at the same price. The model excels at real-world software engineering tasks and can follow complex, multi-step workflows, with improved capabilities across the entire software development lifecycle, from initial design to debugging and maintenance. It can also help build more sophisticated chatbots with a friendly, human-like tone, and it performs well in applications such as knowledge Q&A platforms, visual data extraction from charts and diagrams, and the automation of repetitive tasks.
Computer Interaction
Claude 3.5 Sonnet introduces computer interaction capabilities in a public beta, enabling it to perceive and manipulate computer interfaces. Developers can instruct Claude to operate computers the way people do: by observing the screen, moving the cursor, clicking buttons, and typing text. This works through integrated tools for computer actions such as keystrokes and mouse clicks, editing text files, and running shell commands. By building an action-execution layer and granting screen access to Claude 3.5 Sonnet, software developers can create applications that perform computer operations, follow multi-step processes, and verify results. This opens up new avenues for AI-enhanced applications, such as automating software testing and back-office duties, as well as more advanced software assistants that can interact with a range of applications. Because this technology is still at an early stage, we recommend that developers experiment with lower-risk tasks in a sandbox environment.
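To make the idea of an action-execution layer concrete, here is a minimal sketch of a dispatcher that receives an action requested by the model and applies it to a desktop. Everything in it is illustrative: `FakeDesktop` and `execute_action` are hypothetical names, the desktop only records what it was asked to do, and the action names and input fields (`mouse_move`, `left_click`, `type`, `screenshot`, `coordinate`, `text`) are assumptions modeled on the beta tool's schema, so check them against the current documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a sandboxed desktop: it logs actions instead of performing them."""
    cursor: Tuple[int, int] = (0, 0)
    log: List[str] = field(default_factory=list)

    def mouse_move(self, coordinate: List[int]) -> str:
        self.cursor = (coordinate[0], coordinate[1])
        self.log.append(f"move to {self.cursor}")
        return "moved"

    def left_click(self) -> str:
        self.log.append(f"left click at {self.cursor}")
        return "clicked"

    def type_text(self, text: str) -> str:
        self.log.append(f"type {text!r}")
        return "typed"

    def screenshot(self) -> str:
        self.log.append("screenshot")
        return "screenshot captured"


def execute_action(desktop: FakeDesktop, tool_input: Dict[str, Any]) -> str:
    """Map one requested action (assumed input shape) onto the desktop."""
    action = tool_input.get("action")
    if action == "mouse_move":
        return desktop.mouse_move(tool_input["coordinate"])
    if action == "left_click":
        return desktop.left_click()
    if action == "type":
        return desktop.type_text(tool_input["text"])
    if action == "screenshot":
        return desktop.screenshot()
    # Anything not explicitly supported is reported back rather than executed.
    return f"unsupported action: {action}"
```

In a real application the string returned by `execute_action` would be sent back to the model as the tool result, so that it can decide on the next step. Rejecting unknown actions by default is one way to keep early experiments low-risk.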
Claude 3.5 Haiku
The new Claude 3.5 Haiku model merges swift response times with enhanced reasoning capabilities, making it ideal for tasks that demand both speed and intelligence. Claude 3.5 Haiku not only improves upon its predecessor but also matches the performance of Claude 3 Opus, which was previously the largest model in the lineup. This model is well-suited for use cases like rapid and accurate code suggestions, highly interactive customer service chatbots, e-commerce solutions, and educational platforms. For customers handling extensive unstructured data in sectors such as finance and healthcare, Claude 3.5 Haiku can efficiently process and categorize vast amounts of information.
Anthropic reports that the upgraded Claude 3.5 Sonnet improves substantially on its predecessor, especially in coding, a domain in which it already excelled. It shows significant gains across industry benchmarks, including a jump on SWE-bench Verified from 33% to 49%, outperforming all publicly available models. It also improved on the TAU-bench tool use task, with scores rising from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain.
Computer interaction, a new frontier in AI, expands beyond traditional API usage. Claude has been trained in general computer skills, enabling it to operate diverse standard tools and software programs. Developers can leverage this capability to translate prompts—like “find me a hotel in Rome”—into specific computer commands (e.g., opening a browser and navigating a website).
Developers now have access to three new integrated tools that give Claude a virtual set of hands for computer operation:
- Computer Tool – This tool takes a screenshot and a goal as input and returns a description of the mouse and keyboard actions needed to achieve that goal. For instance, it can ask the application to move the cursor, click, type, or take a screenshot.
- Text Editor Tool – This enables the model to perform actions such as viewing file contents, creating new files, replacing text, and undoing edits.
- Bash Tool – This tool issues commands to run on a computer system, interacting with it at the user level as if typing in a terminal.
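As a rough illustration of how the three tools might be requested through the Bedrock Converse API, the sketch below builds the request as a plain dictionary and only contacts AWS in a separate function. The tool type strings (`computer_20241022`, `text_editor_20241022`, `bash_20241022`), the tool names, the `anthropic_beta` flag, and the display size fields reflect my understanding of the public beta and should be treated as assumptions. `build_computer_use_request` and `send` are hypothetical helper names, and the API may require further fields (for example a `toolConfig` entry), so verify against the current Bedrock documentation.

```python
from typing import Any, Dict

# Model ID taken from the CLI sample later in this post.
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"


def build_computer_use_request(goal: str, width: int = 1024, height: int = 768) -> Dict[str, Any]:
    """Build Converse keyword arguments that enable the three beta tools (assumed field names)."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": goal}]}],
        "additionalModelRequestFields": {
            "tools": [
                {
                    "type": "computer_20241022",
                    "name": "computer",
                    "display_width_px": width,
                    "display_height_px": height,
                },
                {"type": "text_editor_20241022", "name": "str_replace_editor"},
                {"type": "bash_20241022", "name": "bash"},
            ],
            "anthropic_beta": ["computer-use-2024-10-22"],
        },
    }


def send(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request; needs boto3, AWS credentials, and model access."""
    import boto3  # imported here so the builder can be inspected without AWS access

    client = boto3.client("bedrock-runtime")
    return client.converse(**request)
```

Calling `build_computer_use_request("find me a hotel in Rome")` returns the dictionary without touching the network, which makes it easy to inspect or log the request before passing it to `send`.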
These tools open up opportunities for automating intricate tasks, from data analysis and software testing to content creation and system administration. Imagine an application powered by Claude 3.5 Sonnet that interacts with a computer just like a human, moving between desktop tools such as terminals, text editors, and internet browsers to fill in forms and debug code.
We are eager to help software developers explore these new capabilities through Amazon Bedrock. We expect this capability to improve rapidly over the coming months, but Claude’s current ability to use computers has limitations. Some actions, such as scrolling, dragging, or zooming, remain challenging, so we encourage developers to start with lower-risk tasks.
On OSWorld, a benchmark for multimodal agents in real computer environments, the upgraded Claude 3.5 Sonnet currently scores 14.9%. Human-level performance is far higher, at around 70-75%, but this score is well ahead of the 7.7% achieved by the next-best model in the same category.
To start using the upgraded Claude 3.5 Sonnet, open the Amazon Bedrock console and choose Model access from the navigation pane. There, you can request access to the new Claude 3.5 Sonnet V2 model.
To test the latest vision capabilities, I opened a new browser tab and downloaded the Wind Power Generation chart in PNG format from Our World in Data.
Back in the Amazon Bedrock console, I selected Chat/text under Playgrounds, chose Anthropic as the model provider, and picked Claude 3.5 Sonnet V2. I then uploaded the image file from my computer using the three vertical dots in the input section and entered the prompt: “Which are the top countries for wind power generation? Please respond only in JSON.” The output followed my instructions, returning the information extracted from the image as JSON.
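The same experiment can be approximated in code. The sketch below builds a Converse user message that carries the image bytes and the prompt, and parses a JSON reply. The helper names `build_image_message` and `parse_json_reply` are hypothetical, the image file name in the usage note is a placeholder, and the message layout follows my understanding of the Converse API's image content block, so confirm it against the API reference.

```python
import json
from typing import Any, Dict


def build_image_message(image_bytes: bytes, prompt: str, image_format: str = "png") -> Dict[str, Any]:
    """Build one user message containing an image block followed by a text block."""
    return {
        "role": "user",
        "content": [
            {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
            {"text": prompt},
        ],
    }


def parse_json_reply(reply_text: str) -> Any:
    """Parse the first JSON object or array in the reply, ignoring any text around it."""
    candidates = [i for i in (reply_text.find("{"), reply_text.find("[")) if i != -1]
    end = max(reply_text.rfind("}"), reply_text.rfind("]"))
    if not candidates or end < min(candidates):
        raise ValueError("no JSON found in reply")
    return json.loads(reply_text[min(candidates):end + 1])
```

To use it, read the downloaded chart with something like `open("wind-generation.png", "rb").read()` (a placeholder file name), pass the resulting message in the `messages` list of a `converse` call, and hand the reply text to `parse_json_reply`. Trimming to the outermost braces is a defensive choice for cases where the model adds text around the JSON despite the instruction.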
To use the upgraded Claude 3.5 Sonnet with the AWS CLI and SDKs, here is a sample command that calls the Amazon Bedrock Converse API. The command uses the --query parameter to filter the output so that only the text content of the response message is displayed:
aws bedrock-runtime converse \
--model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
--messages '[{ "role": "user", "content": [ { "text": "
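For readers who prefer the AWS SDK for Python (boto3), a comparable call might look like the sketch below. It is not a translation of the CLI sample: the prompt is whatever you pass in, `extract_text` and `converse_text` are hypothetical helper names, and the response path `output.message.content` reflects my understanding of the Converse response shape. `extract_text` plays the role of the CLI's output filter by keeping only the text blocks.

```python
from typing import Any, Dict, List

# Model ID taken from the CLI sample in this post.
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"


def extract_text(response: Dict[str, Any]) -> List[str]:
    """Keep only the text blocks of the response message (assumed response shape)."""
    blocks = response["output"]["message"]["content"]
    return [block["text"] for block in blocks if "text" in block]


def converse_text(prompt: str, model_id: str = MODEL_ID) -> str:
    """Send one user prompt and return the reply text; needs boto3 and AWS credentials."""
    import boto3  # imported here so extract_text can be used without AWS access

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return "\n".join(extract_text(response))
```

Keeping the extraction step separate from the network call makes it possible to check the filtering logic against a saved response before spending any tokens.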