OpenAI's latest medium-sized model delivers top-tier intelligence and speed, making it a powerful choice for complex, multimodal tasks.
GPT-5 (medium) represents a significant leap forward in the GPT series from OpenAI, positioning itself as a premium, high-performance model for demanding applications. It combines top-tier reasoning capabilities with impressive speed and a vast 400,000-token context window. This combination makes it a formidable tool for tasks that require understanding long, complex documents, analyzing images, and generating nuanced, high-quality text.
On the Artificial Analysis Intelligence Index, GPT-5 (medium) achieves a score of 66, placing it firmly in the elite tier of models. This is substantially higher than the class average of 44, demonstrating its advanced cognitive abilities. This intelligence comes with a tendency towards verbosity; during our evaluation, it generated 45 million tokens, well above the average of 28 million. While this can lead to more detailed and comprehensive outputs, it's a critical factor to manage for cost control, given the model's pricing structure.
In terms of performance, GPT-5 (medium) is notably quick. The official OpenAI API clocks in at 111.3 tokens per second, which is significantly faster than the average of 71 t/s for comparable models. For developers prioritizing raw speed, the Microsoft Azure endpoint offers a blistering 234 t/s, more than double the speed of the direct OpenAI offering. This performance advantage on Azure, coupled with its lower latency, makes it the clear choice for real-time applications.
Pricing is competitive but requires careful consideration. The input cost of $1.25 per million tokens is quite reasonable, falling below the market average of $1.60. However, the output cost of $10.00 per million tokens, while matching the market average, can accumulate quickly given the model's verbose nature. Running our full intelligence benchmark on GPT-5 (medium) cost $512.51, a figure that reflects both its capabilities and its potential expense at scale. Developers must weigh the quality of its outputs against the relatively high cost of generating them.
- Intelligence Index: 66 (ranked 9 of 101)
- Output speed: 111.3 tokens/s
- Input price: $1.25 / 1M tokens
- Output price: $10.00 / 1M tokens
- Tokens generated during evaluation: 45M
- Latency (TTFT): 24.04 seconds
| Spec | Details |
|---|---|
| Owner | OpenAI |
| License | Proprietary |
| Model Family | GPT-5 |
| Context Window | 400,000 tokens |
| Knowledge Cutoff | September 2024 |
| Input Modalities | Text, Image |
| Output Modalities | Text |
| API Providers | OpenAI, Microsoft Azure |
| Intelligence Score | 66 (Artificial Analysis Index) |
| Base Speed (OpenAI) | 111.3 tokens/sec |
| Max Speed (Azure) | 234 tokens/sec |
GPT-5 (medium) is available through its creator, OpenAI, and via Microsoft Azure. While the underlying model and pricing are identical, the infrastructure differences create a clear performance gap. Your choice of provider should be dictated by your application's sensitivity to speed and latency.
| Priority | Pick | Why | Tradeoff to accept |
|---|---|---|---|
| Highest Performance | Microsoft Azure | Azure offers more than double the output speed (234 vs 111 t/s) and half the latency (24 vs 51s TTFT). It is the unequivocal winner for any performance-critical workload. | May involve more complex setup and integration with the broader Azure ecosystem compared to the straightforward OpenAI API. |
| Lowest Cost | Tie | Both Azure and OpenAI offer the exact same pricing: $1.25 per 1M input tokens and $10.00 per 1M output tokens. | Cost will be determined by your usage pattern, not the provider. However, Azure's speed could enable higher throughput, potentially lowering costs per task if you can process more in less time. |
| Simplicity / Ease of Use | OpenAI | The direct OpenAI API is famously easy to get started with, requiring minimal setup and offering a simple, model-focused integration path. | You sacrifice significant performance: the higher latency and lower throughput make it unsuitable for real-time, user-facing applications. |
| Enterprise Integration | Microsoft Azure | Azure provides robust enterprise features, including private networking, stricter security compliance, and integration with other Azure services. | Overkill for smaller projects or teams without existing investment in the Microsoft cloud ecosystem. |
Provider performance metrics are based on benchmarks conducted by Artificial Analysis and represent a snapshot in time. Performance can vary based on region, time of day, and specific API configurations.
The cost of using GPT-5 (medium) is heavily influenced by the ratio of input to output tokens. Its cheap input and expensive output make it ideal for 'needle in a haystack' tasks but costly for generative ones. Here are some estimated costs for common scenarios, using the standard $1.25 input / $10.00 output pricing.
| Scenario | Input | Output | What it represents | Estimated cost |
|---|---|---|---|---|
| Customer Support Bot Response | 2,500 tokens | 300 tokens | Answering a user query with conversation history as context. | ~$0.0061 |
| Summarize a Long Report | 100,000 tokens | 1,000 tokens | Ingesting a 75-page document and generating a one-page summary. | ~$0.135 |
| Complex RAG Query | 350,000 tokens | 500 tokens | Answering a specific question using a large knowledge base provided in-context. | ~$0.4425 |
| Code Generation & Explanation | 1,500 tokens | 2,000 tokens | Generating a complex function and explaining how it works. | ~$0.0219 |
| Multimodal Analysis | 5,000 text tokens + 1 image | 400 tokens | Describing a complex chart and providing a summary of its findings. | ~$0.0103 |
The takeaway is clear: tasks that require extensive context but produce concise answers (like RAG and summarization) are cost-effective. Conversely, tasks that generate large amounts of text, like writing long articles or extensive code, can become expensive quickly due to the 8x price multiplier on output tokens.
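The arithmetic behind the table above is simple enough to script. A minimal sketch using the $1.25 / $10.00 per-million-token rates from this page (image-input pricing is ignored, so the multimodal row would cost slightly more in practice):

```python
# Estimate GPT-5 (medium) request cost from token counts.
# Rates are per 1M tokens, taken from the pricing discussed above.
INPUT_RATE = 1.25 / 1_000_000    # $ per input token
OUTPUT_RATE = 10.00 / 1_000_000  # $ per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

scenarios = {
    "support bot reply": (2_500, 300),
    "summarize long report": (100_000, 1_000),
    "complex RAG query": (350_000, 500),
    "code gen + explanation": (1_500, 2_000),
}

for name, (inp, out) in scenarios.items():
    print(f"{name}: ${estimate_cost(inp, out):.4f}")
```

Plugging in the scenario token counts reproduces the estimates in the table, and makes it easy to see how the 8x output premium dominates as generation length grows.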
Managing the operational cost of a powerful model like GPT-5 (medium) is crucial for building a sustainable application. Its pricing model, with cheap inputs and expensive outputs, requires specific strategies. Focus on controlling the number of generated tokens and being intentional with context.
Given the model's tendency to be verbose and the high cost of output tokens, controlling generation length is the most important cost-saving measure.
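One concrete lever is a hard cap on generated tokens combined with an explicit brevity instruction. The sketch below only assembles a request body; the `max_output_tokens`, `instructions`, and `input` fields follow OpenAI's Responses API, and the model id is illustrative rather than confirmed:

```python
# Build a request payload that bounds expensive output tokens.
# Field names follow OpenAI's Responses API; adjust for your SDK.

def build_request(question: str, max_output_tokens: int = 300) -> dict:
    """Return a request payload that caps generation length."""
    return {
        "model": "gpt-5",                        # illustrative model id
        "max_output_tokens": max_output_tokens,  # hard cap on generated tokens
        "instructions": "Answer concisely. Do not restate the question.",
        "input": question,
    }

req = build_request("What is our refund policy for annual plans?")
print(req["max_output_tokens"])
```

A cap of 300 tokens bounds the worst-case output cost of a request at $0.003, regardless of how verbose the model would otherwise be.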
Your goal is to minimize expensive output tokens relative to cheap input tokens. Design your application logic around this principle.
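For example, an extraction task can ask the model to return character offsets into the (cheap) input rather than re-emitting the (expensive) text itself. A hypothetical sketch of the client-side half, assuming the model is prompted to reply with a small `{"start": ..., "end": ...}` JSON object:

```python
import json

def resolve_span(document: str, model_reply: str) -> str:
    """Turn a {"start", "end"} JSON reply into the quoted text locally,
    so the model never spends output tokens regenerating it."""
    span = json.loads(model_reply)
    return document[span["start"]:span["end"]]

doc = "Refunds are available within 30 days of purchase for annual plans."
# A span reply costs a handful of output tokens; quoting the passage costs many.
reply = '{"start": 0, "end": 36}'
print(resolve_span(doc, reply))
```

The same principle applies to classification (return a label, not an explanation) and editing (return a diff, not the full rewritten document).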
The massive context window is a powerful feature, but using it inefficiently can be costly and slow, even with cheap input tokens.
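A common remedy is to select only the chunks relevant to the query instead of stuffing the full window. The toy relevance score below (word overlap) is a stand-in; production systems would typically use embeddings:

```python
# Send only the most relevant chunks instead of the full 400k window.
# Word-overlap scoring is a toy stand-in for embedding-based retrieval.

def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = [
    "Invoices are issued on the first business day of each month.",
    "Refund requests for annual plans are processed within 5 days.",
    "Our office is closed on public holidays.",
]
context = top_chunks("How fast are refund requests processed?", chunks)
print(context[0])
```

Even a crude filter like this can cut a 350,000-token context down by an order of magnitude, directly reducing both input cost and response time.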
While pricing is identical, the performance difference between OpenAI and Azure is significant.
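Using the benchmark figures quoted on this page (Azure: 234 t/s at 24 s TTFT; OpenAI: 111.3 t/s at 51 s TTFT), a simple end-to-end time model makes the gap concrete; these are snapshot numbers and will drift:

```python
# End-to-end response time ≈ time-to-first-token + tokens / throughput.
# Figures are the benchmark snapshot quoted above and will change over time.

def completion_time(tokens: int, ttft_s: float, tokens_per_s: float) -> float:
    """Estimate seconds to receive a full response of `tokens` length."""
    return ttft_s + tokens / tokens_per_s

for name, ttft, speed in [("OpenAI", 51.0, 111.3), ("Azure", 24.0, 234.0)]:
    t = completion_time(2_000, ttft, speed)
    print(f"{name}: {t:.1f} s for a 2,000-token answer")
```

For a 2,000-token answer, Azure finishes in roughly half the time of the direct OpenAI endpoint, which is why it is recommended above for latency-sensitive workloads.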
GPT-5 (medium) is a large language model from OpenAI. It is 'multimodal,' meaning it can process both text and images as input. It is distinguished by its high intelligence score, a very large 400,000-token context window, and strong performance, particularly when served via Microsoft Azure.
GPT-5 (medium) is a generational leap over GPT-4 Turbo. It demonstrates significantly higher intelligence and reasoning capabilities, as shown by its score of 66 on the Intelligence Index. It also features a much larger context window (400k vs. 128k) and is substantially faster, especially on optimized infrastructure like Azure.
The 'medium' designation most likely refers to the configuration at which the model was benchmarked rather than to a separate model: OpenAI's API exposes a reasoning-effort setting (with levels such as low, medium, and high) that trades answer quality against speed and token usage. 'Medium' therefore represents a balance between cutting-edge capability and manageable operational cost and latency, with lighter and heavier configurations available on the same underlying model.
The performance difference stems from the underlying infrastructure. Microsoft has heavily invested in optimizing its Azure data centers specifically for serving large OpenAI models at scale. This includes specialized hardware, networking, and software configurations that allow them to achieve higher throughput and lower latency than the general-purpose OpenAI API infrastructure.
Not always. While incredibly powerful for analyzing large documents or maintaining long conversations, it comes with trade-offs. Filling the context window increases cost (a full 400k context costs $0.50 in input tokens) and can increase the model's response time. For many tasks that don't require that much context, using a smaller amount is more efficient and cost-effective.
GPT-5 (medium) excels at tasks that require a combination of deep reasoning, large context, and speed. Top use cases include: advanced Retrieval-Augmented Generation (RAG) over large document sets, complex legal or financial document analysis, multimodal applications that analyze charts or diagrams, and sophisticated, low-latency customer-facing chatbots that require extensive conversation history.