AI Hardware Benchmarking & Performance Analysis
Price per GPU Hour
Section guide: What this shows
Purpose: Compares on-demand hourly rental costs across major cloud providers for key AI accelerator chips. This establishes the baseline "unit cost" of compute.
- Compare providers for the same chip to find arbitrage opportunities.
- Consider committed use discounts (1-year) for long-term deployments.
Common mistake: Ignoring data egress fees or spot-instance availability, both of which can materially affect total cost of ownership.
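The provider comparison above can be sketched as a small lookup. The prices and provider names below are illustrative placeholders, not values from the table; the discount factor stands in for a 1-year committed-use rate.

```python
# Hypothetical on-demand prices (USD per GPU-hour). Real rates vary by
# region and change frequently -- consult the pricing table for current data.
PRICES = {
    "H100": {"ProviderA": 2.49, "ProviderB": 2.99, "ProviderC": 3.35},
    "MI300X": {"ProviderA": 2.10, "ProviderB": 2.60},
}

def cheapest(chip: str, commit_discount: float = 0.0) -> tuple[str, float]:
    """Return (provider, effective hourly rate) for a chip, applying an
    optional committed-use discount (e.g. 0.30 for an assumed 1-year 30% off)."""
    provider, rate = min(PRICES[chip].items(), key=lambda kv: kv[1])
    return provider, rate * (1.0 - commit_discount)
```

For example, `cheapest("H100", commit_discount=0.30)` picks the lowest on-demand rate and then applies the discount; comparing the discounted committed rate of one provider against the spot rate of another is exactly the arbitrage check described above.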
| Provider | H100 | H200 | B200 | MI300X | TPU v6e |
|---|---|---|---|---|---|
Performance Benchmarks
Section guide: What this shows
Purpose: Measures real-world inference performance. Throughput indicates total system capacity (concurrent users), while speed per query indicates the latency for a single user.
- Use Throughput for batch processing or high-traffic serving.
- Use Speed per Query for interactive chat applications.
About Speed
Speed per query measures the token generation rate for a single stream and is crucial for user experience in chatbots; >50 tokens/s is generally faster than human reading speed.
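The reading-speed comparison can be made concrete with a rough conversion. The words-per-token ratio and reading speed below are common heuristics, not measured values from this document.

```python
# Rough conversion: assume ~0.75 English words per token (a common heuristic)
# and ~250 words/min silent reading speed. Both are assumptions.
WORDS_PER_TOKEN = 0.75
READING_WPM = 250

def outpaces_reading(tokens_per_sec: float) -> bool:
    """True if generation is faster than typical human reading speed."""
    words_per_min = tokens_per_sec * WORDS_PER_TOKEN * 60
    return words_per_min > READING_WPM

# At 50 t/s: 50 * 0.75 * 60 = 2250 words/min, comfortably above reading speed.
```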
Understanding the Tradeoff
Systems often trade single-user speed for total system throughput. The ideal hardware sits in the top-right corner, offering both high capacity and fast individual responses.
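Finding the "top-right corner" is a Pareto-frontier problem: keep only systems that no other system beats on both axes at once. The data points below are illustrative placeholders, not benchmark results.

```python
# Toy (system throughput tokens/s, per-query tokens/s) points -- placeholders.
systems = {
    "sys_a": (12000, 45),
    "sys_b": (9000, 90),
    "sys_c": (8000, 40),   # dominated by sys_a on both axes
    "sys_d": (15000, 30),
}

def pareto_front(points: dict[str, tuple[float, float]]) -> set[str]:
    """Return systems not dominated on both throughput and per-query speed."""
    front = set()
    for name, (tp, sp) in points.items():
        dominated = any(
            o_tp >= tp and o_sp >= sp and (o_tp, o_sp) != (tp, sp)
            for o_name, (o_tp, o_sp) in points.items()
            if o_name != name
        )
        if not dominated:
            front.add(name)
    return front
```

Everything on the frontier is a defensible choice; the rest (here, `sys_c`) is strictly worse hardware for both batch and interactive workloads.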
Cost Calculation
Derived by dividing the hourly rental price by the system's token throughput at the reference load, yielding a cost per token.
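That derivation is a one-liner. The $3.00/hr rate and 10,000 t/s throughput in the comment are assumed round numbers for illustration, not entries from the tables.

```python
def cost_per_million_tokens(hourly_usd: float, throughput_tps: float) -> float:
    """USD per 1M tokens at the reference load.

    cost/M = hourly price / (tokens generated per hour) * 1e6
    """
    tokens_per_hour = throughput_tps * 3600
    return hourly_usd / tokens_per_hour * 1e6

# E.g. an assumed $3.00/hr system sustaining 10,000 t/s:
# 3.00 / 36,000,000 * 1e6 ≈ $0.083 per million tokens.
```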
End-to-End Latency
Total time to receive a full response. As concurrency rises (x-axis), requests queue up, increasing wait times.
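The queueing effect can be sketched with a deliberately simplified wave model: once concurrent requests exceed the server's parallel decode slots, later requests wait for earlier batches to drain. Slot count and service time are illustrative assumptions, not measured parameters.

```python
import math

def e2e_latency(concurrency: int, slots: int = 8, service_s: float = 2.0) -> float:
    """Approximate end-to-end latency (s) for the last request admitted,
    under an assumed FIFO batch model with fixed per-batch service time."""
    if concurrency <= slots:
        return service_s           # everyone fits in one batch
    waves = math.ceil(concurrency / slots)
    return waves * service_s       # later waves wait for earlier ones
```

This reproduces the shape of the latency curve: flat until capacity, then stepping upward as concurrency (the x-axis) grows.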
System & Benchmark Specifications
Section guide: Data Manifest
Purpose: Detailed configuration logs for every benchmark run to ensure reproducibility and transparency.
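A manifest row maps naturally onto a small record type. The field names mirror the table's columns; the example values are hypothetical, not an actual logged run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkRun:
    """One row of the specifications table; fields mirror its columns."""
    model_name: str
    system: str
    provider: str
    precision: str   # e.g. "FP8", "BF16"
    tp: int          # tensor parallel degree
    pp: int          # pipeline parallel degree
    dp: int          # data parallel degree
    framework: str   # serving stack used for the run
    date: str        # ISO 8601 date of the run

# Hypothetical example entry (not a real run from the table):
run = BenchmarkRun("example-70b", "8x H100 SXM", "ProviderA",
                   precision="FP8", tp=8, pp=1, dp=1,
                   framework="vLLM", date="2025-01-15")
```

Freezing the dataclass keeps logged runs immutable, which is what a reproducibility manifest wants.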
| Model Name | System | Provider | Precision | TP/PP/DP | Framework | Date |
|---|---|---|---|---|---|---|
