An open-source 20B parameter model from OpenAI that delivers exceptional speed and top-tier intelligence at a competitive price point, making it a strong all-rounder.
The gpt-oss-20B (low) model emerges as a formidable contender in the open-source landscape, offering a compelling blend of performance, intelligence, and cost-effectiveness. With approximately 20 billion parameters, it occupies a sweet spot, providing sophisticated reasoning and generation capabilities without the overhead of much larger models. Its performance metrics reveal a model that is not just a jack-of-all-trades but a master of several, particularly in speed and raw intelligence, where it ranks among the top performers in its class.
Scoring an impressive 44 on the Artificial Analysis Intelligence Index, gpt-oss-20B (low) significantly outperforms the average score of 26 for comparable models, placing it in the top echelon (#8 out of 84). This indicates a strong aptitude for complex tasks like reasoning, instruction following, and creative generation. Interestingly, it achieves this high score with relative conciseness, generating 15 million tokens during the evaluation compared to the class average of 23 million. This suggests an efficient model that can deliver high-quality responses without unnecessary verbosity—a crucial factor for managing output costs and improving user experience.
Perhaps its most eye-catching feature is its speed. When served by optimized inference providers like Together.ai and Groq, it achieves output speeds exceeding 900 tokens per second, a rate that rivals or even surpasses many smaller, specialized models. This makes it exceptionally well-suited for real-time applications such as interactive chatbots, live coding assistants, and rapid content creation. This speed, combined with a massive 131,000-token context window, unlocks new possibilities for processing and analyzing long documents, maintaining coherent, extended conversations, and performing complex, context-aware tasks that were previously impractical.
From a financial perspective, gpt-oss-20B (low) is positioned as a high-value option. While not the absolute cheapest on the market, its pricing is moderate and highly competitive given its performance profile. With input costs around $0.07 per million tokens and output at $0.20, it provides access to top-tier capabilities at a fraction of the cost of leading proprietary models. This balance of power, speed, and price makes gpt-oss-20B (low) a strategic choice for developers and businesses looking to build advanced AI features without committing to the high costs and closed ecosystems of flagship commercial offerings.
**Key stats at a glance:**

- Intelligence Index: 44 (#8 / 84)
- Output Speed: 245.9 tokens/s
- Input Price: $0.07 / 1M tokens
- Output Price: $0.20 / 1M tokens
- Evaluation Output: 15M tokens
- Latency (TTFT): 0.15 seconds
| Spec | Details |
|---|---|
| Model Name | gpt-oss-20B (low) |
| Owner | OpenAI |
| License | Open (Apache 2.0) |
| Parameters | ~20 Billion |
| Context Window | 131,000 tokens |
| Input Modalities | Text |
| Output Modalities | Text |
| Knowledge Cutoff | June 2024 |
| Intelligence Index Score | 44 |
| Intelligence Index Rank | #8 / 84 |
| Default Output Speed | 245.9 tokens/s |
| Default Input Price | $0.07 / 1M tokens |
| Default Output Price | $0.20 / 1M tokens |
Choosing the right API provider for gpt-oss-20B (low) is critical, as performance and cost can vary dramatically. Your ideal choice depends entirely on whether your primary goal is minimizing cost, maximizing throughput, achieving the lowest possible latency, or finding a balanced, all-around option. We've benchmarked the leading providers to help you make an informed decision.
| Priority | Pick | Why | Tradeoff to accept |
|---|---|---|---|
| Lowest Blended Price | Novita | At a blended price of just $0.07 per million tokens, Novita is the undisputed cost leader for running this model at scale. | Its output speed (246 t/s) and latency are solid but fall short of the top-tier speed specialists. |
| Maximum Speed | Together.ai | Delivers the highest output throughput at a blistering 975 tokens/second, making it the top choice for bulk processing and high-volume generation. | Slightly more expensive than the absolute cheapest options, and its latency isn't the lowest available. |
| Lowest Latency | Groq | Tied for the lowest time-to-first-token (TTFT) at an incredible 0.15 seconds. This is the pick for the most responsive, real-time user experiences. | While extremely fast in output (933 t/s), its pricing is not as competitive as budget-focused providers. |
| Balanced Performance | Lightning AI | Offers a fantastic middle ground: very low price ($0.09/M), strong speed (312 t/s), and low latency (0.41s). It's a great default choice for many use cases. | It is not the absolute number one in any single category, but excels as a versatile all-rounder. |
| Enterprise Choice | Google Vertex AI | Provides the reliability, security, and support of a major cloud platform. It matches Groq for the lowest latency (0.15s), making it a premium, high-performance option. | It is one of the more expensive providers, reflecting the cost of the enterprise-grade ecosystem and support. |
Note: Performance and pricing data are subject to change. Benchmarks reflect point-in-time analysis. The 'Blended Price' is a weighted average and may not reflect your exact costs.
To understand the real-world cost implications of using gpt-oss-20B (low), let's model a few common scenarios. These estimates are based on the average pricing of $0.07 per 1M input tokens and $0.20 per 1M output tokens. Note how the cost balance shifts depending on whether the task is input-heavy or output-heavy.
| Scenario | Input | Output | What it represents | Estimated cost |
|---|---|---|---|---|
| RAG Chatbot Query | 15,000 tokens | 500 tokens | A single user query where extensive documentation is injected as context. | ~$0.00115 |
| Long Document Summary | 100,000 tokens | 2,000 tokens | Summarizing a large PDF report or legal document into key takeaways. | ~$0.00740 |
| Code Generation Task | 1,000 tokens | 3,000 tokens | Generating a Python script or a complex SQL query from a detailed prompt. | ~$0.00067 |
| Content Creation | 500 tokens | 8,000 tokens | Writing a draft for a blog post or marketing email based on a short outline. | ~$0.00164 |
| Data Extraction (JSON) | 20,000 tokens | 1,000 tokens | Parsing an unstructured text document to extract structured data. | ~$0.00160 |
The key takeaway is the significant impact of the input-to-output price ratio. Tasks that generate a lot of text, like content creation and code generation, see their costs driven primarily by the $0.20/M output price. Conversely, input-heavy tasks like RAG are more sensitive to the $0.07/M input price. Optimizing for either input length or output verbosity is the most direct path to cost management.
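The arithmetic behind the table is simple enough to script for your own workloads. A minimal sketch in Python, using the benchmark average prices from this page:

```python
# Estimate per-request cost from token counts and per-million-token prices.
INPUT_PRICE_PER_M = 0.07   # $ per 1M input tokens (benchmark average)
OUTPUT_PRICE_PER_M = 0.20  # $ per 1M output tokens (benchmark average)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# The scenarios from the table above:
scenarios = {
    "RAG Chatbot Query": (15_000, 500),
    "Long Document Summary": (100_000, 2_000),
    "Code Generation Task": (1_000, 3_000),
    "Content Creation": (500, 8_000),
    "Data Extraction (JSON)": (20_000, 1_000),
}

for name, (inp, out) in scenarios.items():
    print(f"{name}: ~${request_cost(inp, out):.5f}")
```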
Effectively managing the cost of gpt-oss-20B (low) takes more than one lever. While the model offers great value, its powerful features, like the large context window and high speed, can lead to unexpected expenses if not handled carefully. The strategies below are the most effective ways to keep your operational costs in check.
Your choice of API provider is the single biggest lever on your cost and performance. Don't default to one provider for all tasks.
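A practical pattern is to route each workload type to the provider that best fits its priority, per the comparison table above. A minimal sketch (the routing keys are illustrative; endpoint configuration is left abstract):

```python
# Route each workload to the benchmark-recommended provider.
# Endpoint setup is omitted; consult each provider's documentation.
PROVIDER_BY_PRIORITY = {
    "cheap_batch": "Novita",           # lowest blended price ($0.07/M)
    "bulk_throughput": "Together.ai",  # highest output speed (975 t/s)
    "realtime": "Groq",                # lowest TTFT (0.15 s)
    "balanced": "Lightning AI",        # strong all-rounder
}

def pick_provider(workload: str) -> str:
    """Return the recommended provider for a workload type."""
    return PROVIDER_BY_PRIORITY.get(workload, PROVIDER_BY_PRIORITY["balanced"])

print(pick_provider("realtime"))  # -> "Groq"
```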
The 131k context window is a powerful tool but also a significant cost driver. Sending 100k tokens on every call is rarely necessary and always expensive.
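A simple guard is to cap how much context you inject per call, keeping only the most relevant chunks. A minimal sketch, with a placeholder relevance scorer (swap in embedding similarity for real use):

```python
def score_relevance(query: str, chunk: str) -> int:
    """Placeholder scorer: count of shared words between query and chunk.
    Replace with embedding similarity in production."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def trim_context(chunks: list[str], query: str, budget_tokens: int = 8_000) -> str:
    """Keep only the most relevant chunks that fit under a token budget,
    instead of sending the full 131k window on every call."""
    ranked = sorted(chunks, key=lambda c: score_relevance(query, c), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        approx_tokens = len(chunk) // 4  # rough heuristic: ~4 characters per token
        if used + approx_tokens > budget_tokens:
            break
        kept.append(chunk)
        used += approx_tokens
    return "\n\n".join(kept)
```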
With output tokens costing nearly three times as much as input tokens, controlling the model's verbosity is crucial for managing expenses, especially in generative tasks.
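The most direct control is a hard `max_tokens` cap on the completion, combined with prompt instructions that ask for brevity. A minimal sketch using the OpenAI-compatible chat completions API that most of these providers expose (the base URL and model identifier are placeholders; check your provider's docs):

```python
from openai import OpenAI

# Placeholder endpoint; most providers in the comparison are OpenAI-compatible.
client = OpenAI(base_url="https://example-provider/v1", api_key="...")

response = client.chat.completions.create(
    model="gpt-oss-20b",  # exact model identifier varies by provider
    messages=[
        {"role": "system", "content": "Answer in at most three sentences."},
        {"role": "user", "content": "Summarize the key risks in this contract: ..."},
    ],
    max_tokens=256,  # hard cap: at $0.20/M output, this bounds cost per call
)
print(response.choices[0].message.content)
```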
Reduce redundant API calls and improve throughput with smart architectural choices.
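For repeated, deterministic prompts (e.g., re-summarizing the same document), a small response cache avoids paying twice for identical calls. A minimal sketch, assuming a `complete` callable that wraps your provider request, such as the one shown above:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_complete(prompt: str, complete) -> str:
    """Return a cached response for an identical prompt; call the API otherwise.
    `complete` is your provider call, e.g. a wrapper around chat completions."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = complete(prompt)
    return _cache[key]
```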
gpt-oss-20B (low) is an open-source large language model from OpenAI with approximately 20 billion parameters. It is designed to provide a strong balance of high intelligence, extremely fast inference speed, and moderate cost. It features a large 131,000-token context window and is proficient in a wide range of text-based tasks, from generation and summarization to complex reasoning.
While benchmark listings rarely spell it out, the "(low)" designation refers to the model's configured reasoning effort. The gpt-oss models support an adjustable reasoning level (low, medium, or high) that controls how many tokens the model spends on internal reasoning before producing an answer. At the low setting, the model:

- Spends far fewer tokens on internal chain-of-thought
- Responds faster and at a lower total cost per request
- Gives up some accuracy on the hardest reasoning-heavy tasks

This low-effort configuration is what enables the remarkable speed and cost-efficiency observed in the benchmarks.
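If your provider passes the setting through, you can select the effort level per request. A minimal sketch, assuming an OpenAI-compatible endpoint that accepts the `reasoning_effort` parameter (some providers instead expect a `Reasoning: low` directive in the system prompt; the base URL and model name are placeholders):

```python
from openai import OpenAI

# Placeholder endpoint; most providers serving gpt-oss are OpenAI-compatible.
client = OpenAI(base_url="https://example-provider/v1", api_key="...")

response = client.chat.completions.create(
    model="gpt-oss-20b",     # exact identifier varies by provider
    reasoning_effort="low",  # assumption: provider passes this through; others
                             # expect "Reasoning: low" in the system prompt
    messages=[{"role": "user", "content": "Classify the sentiment of: 'Great value!'"}],
)
print(response.choices[0].message.content)
```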
gpt-oss-20B (low) positions itself very competitively. Compared to other models in the 15-30B parameter range, it stands out for its combination of top-tier intelligence (ranking #8/84) and elite speed on optimized hardware. While some models might be slightly cheaper, they often don't match its intelligence score or throughput. Conversely, models that are more intelligent are typically much larger, slower, and more expensive to run. Its large context window is also a significant advantage over many other models in its class.
Given its profile, gpt-oss-20B (low) excels in a variety of applications:

- Real-time interactive chatbots and assistants, thanks to its very high output speed and low latency
- Live coding assistants and code generation, where fast feedback matters
- Long-document summarization and analysis, leveraging the 131,000-token context window
- RAG pipelines that inject large amounts of retrieved context per query
- High-volume content creation and structured data extraction, where its low per-token cost keeps batch jobs affordable
Based on our benchmarks, Together.ai offers the highest output speed (throughput) at 975 tokens per second. For the lowest latency (time to first token), Groq and Google Vertex AI are tied for the lead at just 0.15 seconds. Your choice depends on whether you need to generate a lot of text quickly (throughput) or get the first word back as fast as possible (latency).
The most cost-effective provider is Novita, with a blended price of $0.07 per million tokens. They also offer the cheapest input token price at $0.04/M. This makes them an excellent choice for cost-sensitive applications, especially background tasks where maximum speed is not a requirement.