Claude 3.5 Sonnet (June) (non-reasoning)

The fastest, most capable Sonnet yet, balancing speed, cost, and next-generation features.

The new flagship of the Claude 3.5 family, offering twice the speed of Opus, superior vision capabilities, and the innovative Artifacts feature for interactive content generation.

200k Context Window · Vision/Image Input · Artifacts Feature · Fastest Sonnet Model · Knowledge Cutoff: Mar 2024 · Proprietary License

Anthropic's Claude 3.5 Sonnet marks a significant step forward in the Claude model family, positioning itself as the new mid-tier standard. Released in June 2024, it's not just an incremental update; it's a substantial architectural leap, designed to be the fastest and most intelligent Sonnet model to date. It operates at twice the speed of the previous high-end model, Claude 3 Opus, while being offered at a fraction of the cost. This blend of performance and price makes it a compelling choice for a wide range of applications, from complex, multi-step agentic workflows and code generation to real-time customer support and data interpretation.

The model's core proposition is near-Opus intelligence at markedly higher speed. On internal benchmarks, it sets new standards for graduate-level reasoning, undergraduate-level knowledge, and coding proficiency, outperforming Claude 3 Opus in some evaluations. Its score of 25 on the Artificial Analysis Intelligence Index sits below its class average of 30 (see the scoreboard below), but that is a strong showing for a model not positioned as a top-tier reasoner. Taken together, this makes 3.5 Sonnet a workhorse capable of handling sophisticated tasks that previously required more expensive, slower models.

Perhaps the most groundbreaking feature introduced with Claude 3.5 Sonnet is 'Artifacts.' This new capability transforms the user experience by creating a dedicated workspace alongside the chat interface. When the model generates content like code snippets, text documents, or even website designs, these 'artifacts' appear in a separate window. Users can then interact with, edit, and iterate on this content in real-time, seamlessly integrating the AI's output into their projects and workflows. This feature moves beyond simple text generation, creating a collaborative and dynamic environment for development and content creation.

From a practical standpoint, Claude 3.5 Sonnet maintains a generous 200,000-token context window, equivalent to about 150,000 words, allowing it to process and reason over vast amounts of information. Its pricing is set at $3.00 per million input tokens and $15.00 per million output tokens, a strategic point that makes it significantly cheaper than Opus. This pricing, combined with its enhanced vision capabilities—which surpass Opus in interpreting charts and transcribing text from imperfect images—solidifies its role as a powerful, cost-effective tool for scaling AI applications.

Scoreboard

Intelligence: 25 (37 / 54)
Scores below the class average of 30 on the Artificial Analysis Intelligence Index, but sets new internal benchmarks for Anthropic, outperforming Claude 3 Sonnet and even Claude 3 Opus on some evaluations.

Output speed: up to 60 tokens/s
Performance varies by provider: Google Vertex leads at 60 tokens/second, while Amazon Bedrock offers a more moderate 35 tokens/second.

Input price: $3.00 / 1M tokens
Somewhat expensive for its class, where the average input price is around $2.00; this reflects its premium performance characteristics.

Output price: $15.00 / 1M tokens
Also on the higher end, against a class average of $10.00. The 5:1 output-to-input price ratio is a critical factor for cost management.

Verbosity signal: N/A
Verbosity metrics are not yet available for this model, though its cost structure incentivizes concise outputs.

Provider latency: 0.72s TTFT
Google Vertex provides the best responsiveness with a time-to-first-token of 0.72s; Amazon Bedrock trails slightly at 0.99s.

Technical specifications

Model Owner: Anthropic
License: Proprietary
Context Window: 200,000 tokens
Knowledge Cutoff: March 2024
Modalities: Text, Vision (Image Input)
Key Feature: Artifacts for interactive content generation
Input Pricing: $3.00 per 1M tokens
Output Pricing: $15.00 per 1M tokens
API Providers: Anthropic API, Amazon Bedrock, Google Cloud Vertex AI
Intended Use: Data analysis, code generation, complex workflows, content creation
Predecessor: Claude 3 Sonnet
Key Improvement: 2x the speed of Claude 3 Opus, enhanced intelligence

What stands out beyond the scoreboard

Where this model wins
  • Unmatched Speed: Operates at twice the speed of Claude 3 Opus, making it exceptionally well-suited for latency-sensitive and interactive applications like chatbots and agentic systems.
  • Cost-Performance Ratio: Delivers intelligence and capabilities that rival or exceed top-tier models like Opus but at a fifth of the price, offering outstanding value for complex tasks.
  • Advanced Vision Capabilities: Sets a new benchmark for vision tasks, outperforming previous models in accurately interpreting charts, graphs, and transcribing text from low-quality or distorted images.
  • Interactive Artifacts Feature: The unique Artifacts functionality creates a live workspace, allowing users to edit and build upon AI-generated content in real-time, fundamentally improving development and creative workflows.
  • Large and Usable Context: The 200k token context window enables deep analysis of large documents, extensive codebases, and long conversation histories, making it powerful for complex reasoning tasks.
Where costs sneak up
  • High Output-to-Input Cost Ratio: With output tokens costing five times more than input tokens, applications that generate lengthy, verbose responses can become unexpectedly expensive.
  • The 'Artifacts' Iteration Loop: The ease of iterating with the Artifacts feature can encourage more frequent regeneration and modification, leading to a higher volume of token usage than anticipated.
  • Vision Token Consumption: Analyzing images, especially high-resolution ones, consumes a variable and sometimes large number of tokens, which can quickly drive up costs in vision-heavy applications.
  • Full Context Window Temptation: Consistently utilizing the entire 200k token context window for every request is a costly habit. It's often unnecessary and can lead to significant budget overruns if not managed carefully.
  • Agentic Workflow Chaining: Building complex, multi-step agents that call the model repeatedly can accumulate costs rapidly, as each step involves both input and output tokens.

Provider pick

When selecting a provider for Claude 3.5 Sonnet, the decision is refreshingly straightforward. The list price for the model is identical across both major cloud platforms, Amazon Bedrock and Google Cloud Vertex AI. This removes price as a variable, allowing you to focus entirely on performance.

The benchmarks reveal a clear winner in speed and latency, making the choice less about trade-offs and more about your specific ecosystem needs and performance requirements.

Priority | Pick | Why | Tradeoff to accept
Lowest Latency & Highest Throughput | Google Vertex AI | Vertex is the clear performance leader, offering nearly double the output speed (60 t/s vs 35 t/s) and significantly lower latency (0.72s vs 0.99s TTFT), for a much snappier, more responsive user experience. | Teams heavily invested in the AWS ecosystem must manage integrations with the Google Cloud platform.
Best for AWS-Native Teams | Amazon Bedrock | For organizations already operating on AWS, Bedrock offers seamless integration, unified billing, and easy access alongside other AWS services; convenience is its primary advantage. | A significant performance hit, with slower generation speeds and higher latency than the Vertex AI offering.
Best All-Around Experience | Google Vertex AI | Given the identical pricing, the superior performance of the Vertex AI implementation makes it the best choice for most new projects and for platform-agnostic teams. | None, unless deep integration with AWS is a non-negotiable requirement for your project.
Cost-Conscious (at scale) | Tie | Both providers offer the same base price, though Vertex AI's higher speed can translate to lower costs in compute-time-sensitive architectures or faster batch processing. | The primary cost driver will be your application's token usage, not the choice of provider.

Note: Performance metrics are based on public benchmarks from June 2024. Provider offerings and performance can change. Always verify current data before making a final decision. Pricing is identical at $3.00/1M input and $15.00/1M output tokens on both platforms.

Real workloads cost table

Understanding the cost of Claude 3.5 Sonnet in practice requires looking at its 5:1 output-to-input price ratio. Tasks heavy on generation will be more expensive than those focused on analysis. Let's explore some common scenarios to see how costs break down based on the standard pricing of $3.00 (input) and $15.00 (output) per million tokens.

Scenario | Input | Output | What it represents | Estimated cost
Customer Support Chatbot | 1,500 tokens | 3,000 tokens | A typical support interaction involving user history and a detailed, multi-part answer. | ~$0.05
Code Generation & Explanation | 5,000 tokens (existing file + prompt) | 8,000 tokens (new code + comments) | Refactoring a Python script and providing a detailed explanation of the changes. | ~$0.135
Summarize a Long Article | 15,000 tokens (article text) | 750 tokens (bulleted summary) | A common RAG task where a large input produces a concise output. | ~$0.056
Vision: Analyze a Dashboard | 2,500 tokens (image + detailed prompt) | 1,000 tokens (insights and data points) | Interpreting a complex business intelligence dashboard screenshot. | ~$0.022
Drafting a Blog Post | 300 tokens (outline and prompt) | 2,500 tokens (first draft) | A content creation task where output tokens heavily outweigh input. | ~$0.038
Multi-Turn Agentic Task | 10,000 tokens (total input over 5 steps) | 15,000 tokens (total output over 5 steps) | A workflow that researches a topic, drafts an email, and revises it based on feedback. | ~$0.255

The cost estimates highlight a clear pattern: the financial impact of output tokens is paramount. Workflows that generate extensive text, code, or conversational replies are significantly more expensive than analytical tasks that condense large inputs into short summaries. Optimizing for output conciseness is the single most effective strategy for managing costs with this model.
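
These estimates are straightforward to reproduce. Below is a minimal sketch of the arithmetic, using the list prices quoted in this article and the illustrative token counts from the table above.

```python
# Estimate Claude 3.5 Sonnet API costs from token counts, using the
# list prices quoted above: $3.00 / 1M input tokens, $15.00 / 1M output tokens.

INPUT_PRICE = 3.00 / 1_000_000    # USD per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single API call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# The scenarios from the table above:
scenarios = {
    "Customer Support Chatbot": (1_500, 3_000),
    "Code Generation & Explanation": (5_000, 8_000),
    "Summarize a Long Article": (15_000, 750),
    "Vision: Analyze a Dashboard": (2_500, 1_000),
    "Drafting a Blog Post": (300, 2_500),
    "Multi-Turn Agentic Task": (10_000, 15_000),
}

for name, (input_tokens, output_tokens) in scenarios.items():
    print(f"{name}: ~${estimate_cost(input_tokens, output_tokens):.3f}")
```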

How to control cost (a practical playbook)

Effectively managing Claude 3.5 Sonnet costs means focusing on its asymmetrical pricing. The goal is to minimize expensive output tokens while using input tokens efficiently. Implementing a few key strategies can lead to substantial savings, especially at scale.

Enforce Output Conciseness

The most direct way to control costs is to control the number of output tokens. Be explicit in your prompts about the desired length and format of the response; a minimal API sketch follows the list below.

  • Use formatting instructions: Ask for responses in bullet points, a JSON object, or a specific word count. For example: "Summarize the key findings in three bullet points."
  • Request brevity: Add phrases like "Be concise," "Provide a brief answer," or "Do not explain your reasoning unless asked."
  • Chain prompts for complex output: Instead of asking for a 5,000-word article in one go, generate an outline first (low output), then generate each section individually. This gives you more control and prevents costly over-generation.
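
To make this concrete, here is a minimal sketch using the Anthropic Python SDK (assuming the `anthropic` package is installed and `ANTHROPIC_API_KEY` is set in the environment; the model ID shown is the published identifier for the June snapshot). The `max_tokens` cap bounds billable output directly, while the system prompt requests the concise format so replies are short by design rather than truncated.

```python
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the environment

client = anthropic.Anthropic()

# max_tokens is a hard ceiling on billable output tokens; the system prompt
# requests the concise format so the reply is short by design, not cut off.
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=300,
    system=(
        "Be concise. Summarize the key findings in at most three bullet points. "
        "Do not explain your reasoning unless asked."
    ),
    messages=[{"role": "user", "content": "Summarize the key findings of this report: ..."}],
)
print(response.content[0].text)
```
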
Optimize Context Window Usage

While the 200k context window is powerful, it is also a potential cost trap: sending large amounts of text as input on every call is inefficient. Use context strategically; a toy retrieval sketch follows the list below.

  • Implement RAG: For question-answering over documents, use a Retrieval-Augmented Generation (RAG) system. First, use a cheaper embedding model to find the most relevant text chunks, then pass only those chunks to Sonnet as context.
  • Summarize conversation history: In chatbot applications, instead of sending the entire chat history, create a running summary of the conversation and pass that along with the last few user messages.
  • Prune irrelevant data: Before sending large data files or codebases, programmatically remove comments, irrelevant sections, or boilerplate code to reduce the input token count.
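
The retrieval step can be sketched in a few lines. Note that the relevance scorer below is a deliberately crude keyword-overlap stand-in so the example stays self-contained; a real pipeline would score chunks with an embedding model instead.

```python
# Toy retrieval step for a RAG pipeline: score pre-split document chunks
# against the question and send only the top-k chunks to the model as context.

def relevance(chunk: str, question: str) -> int:
    """Crude stand-in for embedding similarity: count shared words."""
    question_words = set(question.lower().split())
    return sum(1 for word in set(chunk.lower().split()) if word in question_words)

def build_context(chunks: list[str], question: str, k: int = 3) -> str:
    """Keep only the k most relevant chunks instead of the whole document."""
    top = sorted(chunks, key=lambda c: relevance(c, question), reverse=True)[:k]
    return "\n\n".join(top)

chunks = [
    "Q2 revenue grew 14 percent year over year, driven by new enterprise contracts.",
    "The board approved a share buyback program at the April meeting.",
    "Cloud infrastructure costs fell 8 percent while headcount stayed flat.",
]
context = build_context(chunks, "What happened to revenue in Q2?", k=1)
# Only `context` is sent to Sonnet as input, not the full document.
```
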
Leverage Caching Aggressively

Many applications receive identical or similar requests repeatedly, and calling the API for each one is wasteful. A caching layer can eliminate redundant API calls entirely; a minimal sketch follows the list below.

  • Cache identical prompts: For common questions in a customer support scenario (e.g., "What are your business hours?"), store the initial response and serve it from your cache for subsequent identical requests.
  • Cache by intent: Use a simpler, cheaper model to classify the user's intent. If the intent has been seen before and the entities are the same, you may be able to serve a cached response.
  • Set appropriate TTLs: Ensure your cache has a reasonable Time-To-Live (TTL) so that information that can become outdated (like product availability) is refreshed periodically.
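
A minimal in-process version of this idea, assuming exact-match prompts and a single TTL for all entries, looks like the following. A production deployment would more likely use a shared store such as Redis, but a dict is enough to show the mechanics.

```python
import hashlib
import time
from typing import Callable

# Maps a prompt hash to (timestamp, response). Swap in a shared store
# like Redis for multi-process deployments.
_cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 3600  # refresh hourly so answers that can go stale get regenerated

def cached_completion(prompt: str, call_model: Callable[[str], str]) -> str:
    """Serve a fresh cached answer if one exists; otherwise call the API."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    entry = _cache.get(key)
    if entry is not None:
        stored_at, answer = entry
        if time.time() - stored_at < TTL_SECONDS:
            return answer  # cache hit: zero input and output tokens billed
    answer = call_model(prompt)  # your actual Claude API call goes here
    _cache[key] = (time.time(), answer)
    return answer
```
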
Choose the Right Model for the Job

Claude 3.5 Sonnet is powerful, but not every task requires its level of sophistication. A multi-model strategy is often the most cost-effective approach; a routing sketch follows the list below.

  • Use a router or classifier: Implement a preliminary step that analyzes the user's prompt. For simple tasks like sentiment analysis, basic classification, or simple data extraction, route the request to a much cheaper model like Claude 3 Haiku or an open-source alternative.
  • Reserve Sonnet for complex tasks: Use 3.5 Sonnet for the tasks it excels at: complex reasoning, nuanced content creation, difficult code generation, and advanced visual analysis. This ensures you're not overpaying for simple jobs.
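
A heuristic version of such a router is sketched below. The keyword rules are a stand-in for a real intent classifier; the model identifiers are Anthropic's published IDs for Claude 3 Haiku and Claude 3.5 Sonnet (June), and the cost multiple reflects Haiku's list pricing of $0.25/$1.25 per million tokens.

```python
# Route cheap, well-defined tasks to Haiku and reserve 3.5 Sonnet for the rest.
# The keyword heuristic is a stand-in for a proper intent classifier.

SIMPLE_TASK_HINTS = ("classify", "sentiment", "extract", "label", "yes or no")

def pick_model(prompt: str) -> str:
    text = prompt.lower()
    if any(hint in text for hint in SIMPLE_TASK_HINTS):
        return "claude-3-haiku-20240307"   # roughly 12x cheaper per token
    return "claude-3-5-sonnet-20240620"    # complex reasoning, code, vision

print(pick_model("Classify the sentiment of this review: ..."))     # -> Haiku
print(pick_model("Refactor this module and explain the changes."))  # -> 3.5 Sonnet
```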

FAQ

How does 3.5 Sonnet compare to Claude 3 Sonnet and Opus?

Claude 3.5 Sonnet is a major upgrade over Claude 3 Sonnet in both speed and intelligence. It's positioned as a new, more powerful mid-tier model. Compared to Claude 3 Opus, it is twice as fast and significantly cheaper, while matching or even exceeding it on several key benchmarks, including coding and vision. It effectively replaces Claude 3 Sonnet and serves as a faster, more cost-effective alternative to Opus for many use cases.

What is the 'Artifacts' feature?

Artifacts is a new user interface feature that provides a dedicated workspace next to the conversational chat. When you ask the model to generate content like code, a document, or a web design, it appears in this Artifacts window. You can then edit the content directly, and the model will intelligently update it based on your changes. It creates a dynamic, collaborative environment, moving beyond static text generation to a more interactive workflow.

What are the best use cases for Claude 3.5 Sonnet?

Its combination of speed, intelligence, and cost-effectiveness makes it ideal for a wide range of tasks, including:

  • Agentic Systems: Its speed is critical for building responsive, multi-step AI agents that can execute complex workflows.
  • Code Generation and Development: Strong coding abilities combined with the Artifacts feature make it a powerful tool for writing, debugging, and refactoring code.
  • Data Analysis and Visualization: Excellent vision capabilities allow it to interpret charts and graphs, extract data, and provide insights.
  • Content Creation: It can quickly draft high-quality articles, reports, and other documents, with the Artifacts feature allowing for easy editing and iteration.
  • Real-time Customer Support: High speed and strong comprehension make it suitable for powering sophisticated, responsive customer service chatbots.

Why is the output price 5x higher than the input price?

This pricing strategy is common among frontier models and reflects the differing computational costs of processing input versus generating output. Generating novel, coherent text (output) is a more computationally intensive process for the model than reading and understanding existing text (input). The 5:1 ratio incentivizes developers to be efficient with their prompts and to design workflows that favor analysis (large input, small output) over verbose generation (small input, large output).
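
As a worked example: a call that reads 10,000 tokens and generates 10,000 tokens costs $0.03 on the input side but $0.15 on the output side, so the reply accounts for over 80% of the bill even though the token counts are equal.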

How does its vision capability compare to other models?

Anthropic's internal benchmarks show that Claude 3.5 Sonnet's vision capabilities are best-in-class, surpassing even Claude 3 Opus. It excels at tasks that require complex visual reasoning, such as interpreting detailed charts and graphs. It is also more accurate at transcribing text from images that are distorted or have visual artifacts, a common challenge for many vision models.

Are there any known limitations?

Like all current AI models, Claude 3.5 Sonnet is not infallible. It can still 'hallucinate' or generate incorrect information. Its knowledge is limited to information available up to March 2024, so it will not be aware of more recent events. While its intelligence is high, for the most complex, abstract reasoning tasks, a top-tier model like GPT-4o or the forthcoming Claude 3.5 Opus may still have an edge. Finally, its high output cost requires careful application design to remain economical.

