EXAONE 4.0 32B (Reasoning)

High intelligence meets high speed, but at a premium price.

An open-weight model from LG AI Research, offering top-tier reasoning capabilities and impressive speed, albeit with a high cost and notable verbosity.

Open Weight · 131k Context · High Intelligence · Fast · Expensive · Verbose

EXAONE 4.0 32B (Reasoning) is a formidable entry into the open-weight model space from LG AI Research. As its name suggests, this 32-billion parameter model is specifically tuned for complex reasoning, logical deduction, and multi-step problem-solving. It distinguishes itself with a potent combination of high intelligence, ranking in the top tier of our benchmarks, and impressive generation speed, making it a compelling option for demanding, real-time applications.

Scoring an impressive 43 on the Artificial Analysis Intelligence Index, EXAONE 4.0 32B places 10th out of 84 models, significantly outperforming the class average of 26. This intellectual prowess is complemented by a generation speed of over 109 tokens per second, which is faster than the average model in its class. This pairing of smarts and speed is rare and positions the model as a premium tool for developers who need both high-quality output and a responsive user experience.

However, this premium performance comes with a premium price tag. At $0.60 per million input tokens and $1.00 per million output tokens, it is substantially more expensive than its open-weight peers. The model also exhibits a strong tendency towards verbosity, generating over four times the average number of tokens during our intelligence evaluation. This combination of high per-token cost and high token output means that operational expenses can accumulate quickly. Developers must weigh the model's exceptional capabilities—including its massive 131k context window—against a total cost of ownership that rivals some proprietary, closed-source models.

Ultimately, EXAONE 4.0 32B is a specialist model. It's not a cost-effective choice for simple, high-volume tasks. Instead, it excels in scenarios where its deep reasoning, large context handling, and rapid response times are critical requirements that justify the higher operational cost. Use cases like in-depth legal document analysis, complex scientific research, or sophisticated multi-turn conversational AI are where this model is designed to shine.

Scoreboard

Intelligence

43 (ranked 10th of 84)

Scores 43 on the Artificial Analysis Intelligence Index, placing it in the top tier for reasoning and comprehension among 84 benchmarked models.
Output speed

109.3 tokens/s

Faster than the class average of 93 tokens/s, making it well-suited for interactive and real-time applications.
Input price

$0.60 / 1M tokens

Significantly more expensive than the average input price of $0.12 for comparable models.
Output price

$1.00 / 1M tokens

Four times the average output price of $0.25, making it a premium-priced option for generation.
Verbosity signal

100M tokens

Generated 100M tokens during intelligence testing, far exceeding the average of 23M, indicating a highly verbose nature.
Provider latency

0.33 seconds

A low time-to-first-token (TTFT) ensures a responsive user experience, with text appearing almost instantly.

Technical specifications

Owner: LG AI Research
License: Open
Model Size: 32 billion parameters
Context Window: 131,072 tokens
Input Modality: Text
Output Modality: Text
Variant: Reasoning (fine-tuned)
Benchmarked Provider: FriendliAI
Latency (TTFT): 0.33 seconds
Output Speed: 109.3 tokens/second
Input Token Price: $0.60 / 1M tokens
Output Token Price: $1.00 / 1M tokens
Blended Price (3:1 input:output): $0.70 / 1M tokens
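The blended price uses the common 3:1 input-to-output token weighting. A quick check of the arithmetic:

```python
# Blended price assumes a 3:1 ratio of input to output tokens,
# a common convention for benchmark pricing.
INPUT_PRICE = 0.60   # USD per 1M input tokens
OUTPUT_PRICE = 1.00  # USD per 1M output tokens

def blended_price(input_price: float, output_price: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average price per 1M tokens."""
    total = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total

print(round(blended_price(INPUT_PRICE, OUTPUT_PRICE), 4))  # 0.7
```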

What stands out beyond the scoreboard

Where this model wins
  • Top-Tier Intelligence: With an intelligence score of 43, it ranks among the smartest models available, making it highly capable for complex reasoning, analysis, and problem-solving tasks.
  • High-Speed Generation: At over 109 tokens per second, it delivers answers quickly, enabling fluid conversational AI and other real-time applications without sacrificing quality.
  • Massive Context Window: The 131k token context window allows it to process and analyze extremely large documents, such as entire codebases, lengthy legal contracts, or extensive research papers in a single pass.
  • Open and Accessible: As an open-weight model, it offers greater transparency and flexibility compared to closed-source alternatives, allowing for more customization and a better understanding of its architecture.
Where costs sneak up
  • High Base Price: Both input ($0.60/1M) and output ($1.00/1M) token prices are at the top end of the market for open-weight models, establishing a high cost floor for any application.
  • Extreme Verbosity: The model's tendency to produce lengthy, detailed responses can dramatically increase costs, as the expensive output tokens accumulate rapidly even with simple prompts.
  • The Large Context Trap: While the 131k context window is a powerful feature, fully utilizing it with large inputs leads to significant per-query costs due to the high input token price.
  • Cost of Complexity: Tasks requiring deep reasoning often involve longer, more detailed prompts and generate more comprehensive outputs, a combination that makes costs escalate quickly with this model.
  • No Cheaper Tiers: Unlike some model families, there isn't a smaller, less expensive version of EXAONE 4.0 to handle simpler tasks, forcing developers to pay premium prices for all queries.

Provider pick

Currently, EXAONE 4.0 32B is available through a limited number of API providers. Our benchmarks focus on FriendliAI, which offers a performant and reliable endpoint for accessing the model's capabilities. As the ecosystem matures, we expect to see more providers offering this model.

  • Best Overall (FriendliAI): As the sole benchmarked provider, FriendliAI is the default choice, delivering the model's full potential with excellent speed (109.3 tokens/s) and low latency (0.33s TTFT). Tradeoff to accept: the high cost is a function of the model itself, not the provider.
  • Fastest (FriendliAI): With an output speed well above the class average, FriendliAI's serving infrastructure proves highly effective for this model, making it the fastest option available. Tradeoff to accept: speed comes at the model's set price; there is no slower, cheaper alternative for this specific model.
  • Cheapest (FriendliAI): By default, FriendliAI is also the most cost-effective option; $0.60 (input) and $1.00 (output) per 1M tokens is the current market rate for this model. Tradeoff to accept: "cheapest" is relative; the model remains one of the most expensive open-weight options on the market.

Provider analysis is based on public pricing and performance benchmarks conducted by Artificial Analysis. Performance can vary based on workload, concurrency, and region. Prices are subject to change. This is not a sponsored placement.

Real workloads cost table

The true cost of an AI model emerges in real-world application. The following scenarios illustrate the estimated cost of using EXAONE 4.0 32B for various tasks, based on its pricing on FriendliAI. Note how the interplay between input size, output verbosity, and per-token price affects the final cost.

  • Legal Contract Review: 50,000 input / 2,000 output tokens. Summarizing a long document, leveraging the large context window. Estimated cost: ~$0.032
  • Complex Code Scaffolding: 1,000 input / 4,000 output tokens. Generating a functional application skeleton from a detailed prompt. Estimated cost: ~$0.0046
  • Multi-Turn RAG Session: 12,500 input / 7,500 output tokens (totals across 5 turns). A chat using retrieved documents for context in each turn. Estimated cost: ~$0.015
  • Creative Story Generation: 200 input / 5,000 output tokens. A simple prompt yielding a long, verbose creative output. Estimated cost: ~$0.0051
  • Email Classification (Batch): 250,000 input / 5,000 output tokens (1,000 emails and labels). A simple task where the model's high cost and power are overkill. Estimated cost: ~$0.155

These examples highlight that EXAONE 4.0 32B is most cost-effective when its reasoning power is essential and the output is controlled. For simple, high-volume tasks like classification, its cost is prohibitive compared to smaller, cheaper models. The key is to reserve its use for high-value problems that justify the expense.
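The estimates above are simple per-token arithmetic. A minimal sketch that reproduces them:

```python
# Rough per-request cost estimator using the listed FriendliAI prices.
INPUT_PRICE_PER_M = 0.60   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-1M-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Reproduce two rows from the table above:
print(round(estimate_cost(50_000, 2_000), 4))    # legal contract review, ~$0.032
print(round(estimate_cost(250_000, 5_000), 4))   # email batch, ~$0.155
```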

How to control cost (a practical playbook)

Given its high price and verbosity, managing the cost of EXAONE 4.0 32B is crucial for building a sustainable application. Proactive strategies can help you leverage its power without incurring runaway expenses. Below are several tactics to consider.

Tame the Model's Verbosity

This model's default behavior is to be verbose, which directly increases costs due to the high output token price. You can mitigate this through careful prompt engineering.

  • Set Explicit Constraints: Include phrases like "Be concise," "Respond in three sentences," or "Use bullet points" in your prompts.
  • Use the max_tokens Parameter: Set a hard limit on the length of the generated output to prevent unexpectedly long and expensive responses.
  • Request Structured Output: Ask for the output in a specific format like JSON. This often forces the model to be more direct and less conversational, reducing token count.
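The first two tactics can be combined in the request itself. A minimal sketch, assuming an OpenAI-compatible chat endpoint; the model identifier and payload shape are illustrative, not confirmed:

```python
# Build a chat request that caps output length and asks for brevity.
# "exaone-4.0-32b-reasoning" is a hypothetical model identifier.
def build_request(user_prompt: str, max_tokens: int = 512) -> dict:
    return {
        "model": "exaone-4.0-32b-reasoning",
        "max_tokens": max_tokens,  # hard ceiling on billable output tokens
        "messages": [
            {"role": "system",
             "content": "Be concise. Answer in at most three sentences."},
            {"role": "user", "content": user_prompt},
        ],
    }

req = build_request("Summarize the indemnification clause.", max_tokens=256)
```

Capping `max_tokens` bounds the worst-case output cost of every call, while the system prompt nudges the model away from its verbose default.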
Be Strategic with the Context Window

The 131k context window is a powerful but expensive feature. Filling it unnecessarily will lead to high costs on every call.

  • Don't Default to Full Context: Only provide the information that is absolutely necessary for the task at hand. Avoid passing entire documents or chat histories if a summary will suffice.
  • Use a Two-Step Summary Process: For very large documents, consider using a cheaper model to first summarize the text into a more compact form before passing it to EXAONE 4.0 for deep analysis.
  • Leverage RAG: For question-answering over large knowledge bases, use a vector database to retrieve only the most relevant chunks of text to include in the prompt, rather than stuffing the entire document into the context.
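The RAG idea can be sketched with a toy relevance score; a real system would use embeddings and a vector store instead of word overlap:

```python
import re

# Score chunks by word overlap with the query and keep only the top-k,
# rather than stuffing every document into the 131k context window.
def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    query_words = set(re.findall(r"\w+", query.lower()))
    def overlap(chunk: str) -> int:
        return len(query_words & set(re.findall(r"\w+", chunk.lower())))
    return sorted(chunks, key=overlap, reverse=True)[:k]

chunks = [
    "The termination clause allows either party to exit with 30 days notice.",
    "Quarterly revenue grew 12 percent year over year.",
    "Either party may terminate this agreement for material breach.",
]
relevant = top_k_chunks("When can a party terminate the agreement?", chunks)
```

Only the retrieved chunks are billed as input, so per-query cost scales with relevance rather than corpus size.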
Implement Smart Caching

Many applications receive repetitive user queries. Re-calculating the same answer is a waste of money and compute.

  • Cache Identical Prompts: Store the results of common or identical prompts in a database like Redis or a simple key-value store. Before calling the API, check if the answer already exists in your cache.
  • Cache Embeddings: In RAG applications, cache the embeddings of your documents so you don't have to re-calculate them frequently.
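A minimal in-process version of prompt caching; a production deployment would back this with Redis or another shared store, and `call_model` here is a stub:

```python
import hashlib

# In-memory prompt cache keyed by a hash of the prompt text.
_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_model) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:          # only pay for the API call on a miss
        _cache[key] = call_model(prompt)
    return _cache[key]

calls = []
def fake_model(prompt: str) -> str:
    calls.append(prompt)           # track how often the "API" is hit
    return f"answer to: {prompt}"

cached_completion("What is EXAONE?", fake_model)
cached_completion("What is EXAONE?", fake_model)  # served from cache
print(len(calls))  # the model was only called once
```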
Use the Right Model for the Job

EXAONE 4.0 32B is a specialist tool. Using it for simple tasks is like using a sledgehammer to crack a nut—inefficient and expensive.

  • Build a Model Router: Create a system that analyzes the user's prompt and routes it to the most appropriate model. A simple classification task can go to a cheap, fast model, while a complex reasoning query is routed to EXAONE.
  • Establish Cost Tiers: Reserve EXAONE for a 'premium' or 'pro' tier of your application, where users might pay more for higher-quality answers, while using a cheaper model for the free or standard tier.
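A naive keyword-based router illustrates the idea; the model names and routing heuristics are illustrative, and a production router might use a small classifier instead:

```python
# Route reasoning-heavy prompts to EXAONE and everything else to a
# cheaper model. Both model identifiers are hypothetical.
REASONING_HINTS = ("prove", "analyze", "derive", "step by step", "debug")

def route(prompt: str) -> str:
    text = prompt.lower()
    # Long prompts and reasoning keywords go to the premium tier.
    if len(prompt) > 2000 or any(hint in text for hint in REASONING_HINTS):
        return "exaone-4.0-32b-reasoning"
    return "small-cheap-model"

print(route("Classify this email as spam or not."))
print(route("Analyze this contract clause step by step."))
```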

FAQ

What is EXAONE 4.0 32B?

EXAONE 4.0 32B is a 32-billion parameter large language model developed by LG AI Research. This specific version, designated "Reasoning," has been fine-tuned to excel at tasks requiring logical deduction, complex instruction following, and multi-step problem-solving. It is part of LG's broader EXAONE family of multimodal foundation models.

Who is LG AI Research?

LG AI Research is the central artificial intelligence research hub for the South Korean conglomerate LG Group. Their mission is to advance AI technology and apply it across LG's various industries, from electronics to chemicals. The development of the EXAONE model series is one of their flagship initiatives.

What does the "(Reasoning)" tag signify?

The "(Reasoning)" tag indicates that this is a specialized variant of the base EXAONE 4.0 model. It has undergone additional training (fine-tuning) on datasets specifically designed to enhance its abilities in logic, mathematics, code generation, and following complex, multi-part instructions. This makes it more powerful for analytical tasks than a general-purpose base model.

How does its performance compare to other models?

EXAONE 4.0 32B (Reasoning) is a top performer. Its intelligence score of 43 places it well above the average and in the same league as many leading proprietary models. It is faster than the average model in its class. However, it is also significantly more expensive and more verbose than most other open-weight models of a similar size.

What are the best use cases for this model?

This model is best suited for high-value tasks where its specific strengths can justify its cost. Ideal use cases include:

  • Legal and Financial Document Analysis: Using the large context window to analyze dense contracts or reports.
  • Scientific Research: Assisting researchers by summarizing papers, generating hypotheses, and analyzing data from text.
  • Complex Code Generation: Scaffolding entire applications or debugging complex algorithms based on detailed specifications.
  • Advanced Conversational Agents: Building chatbots that can maintain context over long conversations and perform complex tasks for the user.
What does its "Open" license mean for developers?

An "Open" license generally means the model weights are publicly available, allowing for greater flexibility than closed, API-only models. Developers can potentially self-host the model for privacy and control, or fine-tune it on their own proprietary data to create a more specialized version. However, developers should always consult the specific license agreement for EXAONE 4.0 to understand the precise terms, conditions, and any restrictions on commercial use.
