GPT-5 (medium)

A highly intelligent model with impressive speed and a massive context window.

OpenAI's latest medium-sized model delivers top-tier intelligence and speed, making it a powerful choice for complex, multimodal tasks.

Multimodal · 400k Context · High Intelligence · Fast · Proprietary

GPT-5 (medium) represents a significant leap forward in the GPT series from OpenAI, positioning itself as a premium, high-performance model for demanding applications. It combines top-tier reasoning capabilities with impressive speed and a vast 400,000-token context window. This combination makes it a formidable tool for tasks that require understanding long, complex documents, analyzing images, and generating nuanced, high-quality text.

On the Artificial Analysis Intelligence Index, GPT-5 (medium) achieves a score of 66, placing it firmly in the elite tier of models. This is substantially higher than the class average of 44, demonstrating its advanced cognitive abilities. This intelligence comes with a tendency towards verbosity; during our evaluation, it generated 45 million tokens, well above the average of 28 million. While this can lead to more detailed and comprehensive outputs, it's a critical factor to manage for cost control, given the model's pricing structure.

In terms of performance, GPT-5 (medium) is notably quick. The official OpenAI API clocks in at 111.3 tokens per second, which is significantly faster than the average of 71 t/s for comparable models. For developers prioritizing raw speed, the Microsoft Azure endpoint offers a blistering 234 t/s, more than double the speed of the direct OpenAI offering. This performance advantage on Azure, coupled with its lower latency, makes it the clear choice for real-time applications.

Pricing is competitive but requires careful consideration. The input cost of $1.25 per million tokens is quite reasonable, falling below the market average of $1.60. However, the output cost of $10.00 per million tokens, while matching the market average, can quickly accumulate, especially given the model's verbose nature. The total cost to run our full intelligence benchmark on GPT-5 (medium) was $512.51, a testament to its capabilities but also a warning of its potential expense at scale. Developers must balance the need for its powerful outputs against the relatively high cost of generating them.

Scoreboard

  • Intelligence: 66 (9 / 101). Scores significantly above the class average of 44, placing it in the top 10% of models benchmarked for intelligence.
  • Output speed: 111.3 tokens/s. Faster than the class average of 71 tokens/s; the Azure endpoint is even faster at 234 t/s.
  • Input price: $1.25 / 1M tokens. More affordable than the class average of $1.60, making it cost-effective for context-heavy tasks.
  • Output price: $10.00 / 1M tokens. Priced exactly at the class average; this is the primary cost driver for the model.
  • Verbosity signal: 45M tokens. Significantly more verbose than the class average of 28M tokens during intelligence testing.
  • Provider latency: 24.04 seconds. Based on Azure's excellent time-to-first-token; the OpenAI API is slower to start at 50.64s.

Technical specifications

  • Owner: OpenAI
  • License: Proprietary
  • Model Family: GPT-5
  • Context Window: 400,000 tokens
  • Knowledge Cutoff: September 2024
  • Input Modalities: Text, Image
  • Output Modalities: Text
  • API Providers: OpenAI, Microsoft Azure
  • Intelligence Score: 66 (Artificial Analysis Index)
  • Base Speed (OpenAI): 111.3 tokens/sec
  • Max Speed (Azure): 234 tokens/sec

What stands out beyond the scoreboard

Where this model wins
  • Elite Intelligence: With a score of 66, it's one of the smartest models available, capable of handling complex reasoning, instruction following, and creative tasks.
  • Massive Context Window: The 400k token context window allows for deep analysis of very large documents, extensive conversation history, or complex Retrieval-Augmented Generation (RAG) without chunking.
  • Blazing Speed on Azure: The Microsoft Azure endpoint delivers exceptional throughput (234 t/s) and low latency, making it suitable for interactive and real-time applications.
  • Multimodal Capabilities: The ability to process and understand images alongside text opens up a wide range of use cases, from analyzing charts to describing visual scenes.
  • Affordable Input Pricing: At $1.25 per million input tokens, it's cheaper than average to feed the model large amounts of context, synergizing well with its large context window.
Where costs sneak up
  • High Output Token Cost: At $10.00 per million tokens, the cost of generating text is 8 times higher than the cost of processing it. This is the single biggest cost factor.
  • Inherent Verbosity: The model's tendency to be verbose (45M tokens vs. 28M average in tests) directly multiplies the high output token cost, leading to unexpectedly high bills if not managed.
  • Large Context is a Double-Edged Sword: While powerful, filling the 400k context window for a single query can be expensive. A single full-context prompt costs $0.50 in input tokens alone.
  • Proprietary Lock-in: As a closed-source, proprietary model, you are dependent on OpenAI and Microsoft for pricing, availability, and updates, with no option to self-host or fine-tune the base model.
  • No Fine-Tuning Available: Lack of fine-tuning means you cannot specialize the model for a specific task to improve efficiency or reduce verbosity, relying solely on prompt engineering.

Provider pick

GPT-5 (medium) is available through its creator, OpenAI, and via Microsoft Azure. While the underlying model and pricing are identical, the infrastructure differences create a clear performance gap. Your choice of provider should be dictated by your application's sensitivity to speed and latency.

  • Highest Performance: Microsoft Azure. Why: Azure offers more than double the output speed (234 vs 111 t/s) and half the latency (24 vs 51s TTFT), making it the unequivocal winner for any performance-critical workload. Tradeoff: may involve more complex setup and integration with the broader Azure ecosystem compared to the straightforward OpenAI API.
  • Lowest Cost: Tie. Why: both Azure and OpenAI offer the exact same pricing of $1.25 per 1M input tokens and $10.00 per 1M output tokens, so cost will be determined by your usage pattern, not the provider. However, Azure's speed could enable higher throughput, potentially lowering costs per task if you can process more in less time.
  • Simplicity / Ease of Use: OpenAI. Why: the direct OpenAI API is famously easy to get started with, requiring minimal setup and offering a simple, model-focused integration path. Tradeoff: you sacrifice significant performance; the latency and lower throughput are not suitable for real-time user-facing applications.
  • Enterprise Integration: Microsoft Azure. Why: Azure provides robust enterprise features, including private networking, stricter security compliance, and integration with other Azure services. Tradeoff: overkill for smaller projects or teams without existing investment in the Microsoft cloud ecosystem.

Provider performance metrics are based on benchmarks conducted by Artificial Analysis and represent a snapshot in time. Performance can vary based on region, time of day, and specific API configurations.

Real workloads cost table

The cost of using GPT-5 (medium) is heavily influenced by the ratio of input to output tokens. Its cheap input and expensive output make it ideal for 'needle in a haystack' tasks but costly for generative ones. Here are some estimated costs for common scenarios, using the standard $1.25 input / $10.00 output pricing.

  • Customer Support Bot Response: 2,500 input tokens, 300 output tokens. Answering a user query with conversation history as context. Estimated cost: ~$0.0061.
  • Summarize a Long Report: 100,000 input tokens, 1,000 output tokens. Ingesting a 75-page document and generating a one-page summary. Estimated cost: ~$0.135.
  • Complex RAG Query: 350,000 input tokens, 500 output tokens. Answering a specific question using a large knowledge base provided in-context. Estimated cost: ~$0.4425.
  • Code Generation & Explanation: 1,500 input tokens, 2,000 output tokens. Generating a complex function and explaining how it works. Estimated cost: ~$0.0219.
  • Multimodal Analysis: 5,000 text tokens + 1 image input, 400 output tokens. Describing a complex chart and providing a summary of its findings. Estimated cost: ~$0.0103.

The takeaway is clear: tasks that require extensive context but produce concise answers (like RAG and summarization) are cost-effective. Conversely, tasks that generate large amounts of text, like writing long articles or extensive code, can become expensive quickly due to the 8x price multiplier on output tokens.
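
To make these estimates easy to reproduce, the sketch below computes per-request cost as a simple linear function of token counts, assuming the quoted $1.25 / $10.00 per-million pricing. The scenario token counts come from the table above; the helper function is our own illustration, and image pricing is ignored.

```python
# Per-request cost estimate, assuming $1.25 per 1M input tokens
# and $10.00 per 1M output tokens (image inputs not included).
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

scenarios = {
    "Customer support bot response": (2_500, 300),
    "Summarize a long report": (100_000, 1_000),
    "Complex RAG query": (350_000, 500),
    "Code generation & explanation": (1_500, 2_000),
}

for name, (input_tokens, output_tokens) in scenarios.items():
    print(f"{name}: ~${estimate_cost(input_tokens, output_tokens):.4f}")
# Matches the table above: ~$0.0061, ~$0.135, ~$0.4425, ~$0.0219
```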

How to control cost (a practical playbook)

Managing the operational cost of a powerful model like GPT-5 (medium) is crucial for building a sustainable application. Its pricing model, with cheap inputs and expensive outputs, requires specific strategies. Focus on controlling the number of generated tokens and being intentional with context.

Control Output Verbosity

Given the model's tendency to be verbose and the high cost of output tokens, controlling generation length is the most important cost-saving measure; a minimal request sketch follows the list below.

  • Use `max_tokens`: Always set a reasonable `max_tokens` limit in your API calls to prevent runaway generation and cap the cost of any single request.
  • Prompt Engineering: Explicitly instruct the model to be concise. Use phrases like "Be brief," "Answer in one sentence," or "Provide a bulleted list of 3 items."
  • Structure the Output: Ask the model to return a structured format like JSON. This often results in less conversational, more compact output that is also easier to parse programmatically.
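
Here is a minimal sketch using the OpenAI Python SDK's chat completions interface that combines all three controls: a hard `max_tokens` cap, an explicit brevity instruction, and JSON output. The model identifier, token limit, and prompts are illustrative assumptions, not confirmed parameters for GPT-5 (medium).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Cap output tokens and instruct the model to be concise and structured.
response = client.chat.completions.create(
    model="gpt-5-medium",          # assumed identifier, for illustration only
    max_tokens=300,                # hard cap on billable output tokens
    response_format={"type": "json_object"},  # compact, parseable output
    messages=[
        {"role": "system", "content": "Be brief. Answer as JSON with keys "
                                      "'answer' (one sentence) and 'confidence'."},
        {"role": "user", "content": "Summarize the refund policy in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

With a 300-token cap, the output side of any single call is bounded at roughly $0.003, regardless of how verbose the model would otherwise be.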
Optimize Input-to-Output Ratio

Your goal is to minimize expensive output tokens relative to cheap input tokens. Design your application logic around this principle.

  • Favor 'Extraction' over 'Generation': Frame tasks as information extraction from a large context whenever possible. For example, instead of asking "Write a summary of this document," ask "Extract the 3 key findings from this document" (see the prompt sketch after this list).
  • Use Few-Shot Prompting for Formatting: If you need a specific output format, provide examples in the prompt (input tokens) rather than asking the model to generate a lengthy explanation of the format before providing the answer (output tokens).
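
A sketch of the extraction framing combined with a few-shot formatting example: the format guidance lives in the cheap input tokens, while the expensive output is constrained to a short, fixed shape. The document text, example answers, and model identifier are placeholders.

```python
from openai import OpenAI

client = OpenAI()

document = "...full report text goes here (input tokens are the cheap side)..."

# Few-shot example in the (cheap) input; short, fixed-shape (expensive) output.
messages = [
    {"role": "system", "content": "Extract information. Reply with exactly three bullet points."},
    {"role": "user", "content": "Document: The Q1 pilot reduced churn by 4%...\n"
                                "Extract the 3 key findings."},
    {"role": "assistant", "content": "- Churn fell 4% during the Q1 pilot\n"
                                     "- ...\n"
                                     "- ..."},
    {"role": "user", "content": f"Document: {document}\nExtract the 3 key findings."},
]

response = client.chat.completions.create(
    model="gpt-5-medium",  # assumed identifier
    max_tokens=150,
    messages=messages,
)
print(response.choices[0].message.content)
```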
Be Strategic with the 400k Context Window

The massive context window is a powerful feature, but using it inefficiently can be costly and slow, even with cheap input tokens.

  • Don't Send Redundant Context: For session-based applications (like chatbots), implement a summarization strategy for the conversation history instead of passing the entire transcript with every turn.
  • Batch Queries: If you need to ask multiple questions about the same large document, pass the document once and ask all questions in a single, structured API call (as sketched after this list) to avoid repeatedly paying the input cost for the same context.
  • Consider a RAG Pipeline: For very large knowledge bases, a traditional RAG approach (embedding and retrieving relevant chunks) may still be more cost-effective than stuffing hundreds of thousands of tokens into the context for every query.
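
For the batching point, here is a sketch of paying the large input cost once and collecting several answers in one structured call, rather than re-sending the same document per question. The document, questions, and model identifier are placeholders.

```python
from openai import OpenAI

client = OpenAI()

large_document = "...hundreds of thousands of tokens of source material..."

questions = [
    "What is the total contract value?",
    "Which clauses mention termination?",
    "Who are the named counterparties?",
]

# Send the document once; ask every question in the same request.
prompt = (
    "Answer each numbered question about the document below. "
    "Return a JSON object mapping question numbers to short answers.\n\n"
    + "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    + f"\n\nDocument:\n{large_document}"
)

response = client.chat.completions.create(
    model="gpt-5-medium",  # assumed identifier
    max_tokens=500,
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```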
Choose the Right Provider for Your Needs

While pricing is identical, the performance difference between OpenAI and Azure is significant; a minimal client-setup sketch follows the list below.

  • Default to Azure for Performance: If your application is user-facing or requires low latency, the speed and responsiveness of the Azure endpoint are worth the potentially more complex setup.
  • Use OpenAI for Prototyping: The direct OpenAI API is perfect for quick experiments, internal tools, and applications where a few seconds of extra latency don't matter.
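
Because both endpoints speak the same chat interface through the official OpenAI Python SDK, switching providers is mostly a client-construction change. The endpoint URL, API version, and deployment name below are placeholders; check your Azure resource for the real values, and treat the model identifier as an assumption.

```python
import os
from openai import OpenAI, AzureOpenAI

# Direct OpenAI API: simplest setup, good for prototyping.
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Azure OpenAI: same model, served from Azure infrastructure.
# Endpoint, API version, and deployment name are placeholders.
azure_client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def ask(client, model: str, question: str) -> str:
    """Send the same request shape to either provider."""
    response = client.chat.completions.create(
        model=model,  # OpenAI model name, or your Azure deployment name
        max_tokens=200,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# ask(openai_client, "gpt-5-medium", "...")         # assumed model id
# ask(azure_client, "your-gpt5-deployment", "...")  # your Azure deployment
```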

FAQ

What is GPT-5 (medium)?

GPT-5 (medium) is a large language model from OpenAI. It is 'multimodal,' meaning it can process both text and images as input. It is distinguished by its high intelligence score, a very large 400,000-token context window, and strong performance, particularly when served via Microsoft Azure.

How does it compare to older models like GPT-4 Turbo?

GPT-5 (medium) is a generational leap over GPT-4 Turbo. It demonstrates significantly higher intelligence and reasoning capabilities, as shown by its score of 66 on the Intelligence Index. It also features a much larger context window (400k vs. 128k) and is substantially faster, especially on optimized infrastructure like Azure.

What does the 'medium' in the name signify?

The 'medium' designation likely refers to its position within the broader GPT-5 family of models. It suggests that OpenAI may also offer other versions, such as a smaller, faster 'light' model or a larger, even more powerful 'large' or 'ultra' model. 'Medium' implies a balance between cutting-edge capability and manageable operational cost and speed.

Why is Azure so much faster than OpenAI for the same model?

The performance difference stems from the underlying infrastructure. Microsoft has heavily invested in optimizing its Azure data centers specifically for serving large OpenAI models at scale. This includes specialized hardware, networking, and software configurations that allow them to achieve higher throughput and lower latency than the general-purpose OpenAI API infrastructure.

Is the 400k context window always useful?

Not always. While incredibly powerful for analyzing large documents or maintaining long conversations, it comes with trade-offs. Filling the context window increases cost (a full 400k context costs $0.50 in input tokens) and can increase the model's response time. For many tasks that don't require that much context, using a smaller amount is more efficient and cost-effective.

What are the best use cases for GPT-5 (medium)?

GPT-5 (medium) excels at tasks that require a combination of deep reasoning, large context, and speed. Top use cases include: advanced Retrieval-Augmented Generation (RAG) over large document sets, complex legal or financial document analysis, multimodal applications that analyze charts or diagrams, and sophisticated, low-latency customer-facing chatbots that require extensive conversation history.

