Powered by OpenRouter · Artificial Analysis

AI Model Rankings

Top 20 frontier models ranked by the Artificial Analysis Agentic Index and Intelligence Index, with live pricing sourced directly from OpenRouter.

20 models tracked · 2 benchmark indexes · Live pricing via API
Score tiers: Frontier (80%+) · Leading (60%+) · Competitive (40%+) · Developing (below 40%)
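The score tiers reduce to a simple threshold lookup. A minimal sketch (the function name `score_tier` is ours, not part of the rankings page):

```python
def score_tier(score: float) -> str:
    """Map an Agentic Index score to the tier legend used on this page."""
    if score >= 80:
        return "Frontier"
    if score >= 60:
        return "Leading"
    if score >= 40:
        return "Competitive"
    return "Developing"  # everything below 40
```

For example, Claude Opus 4.6's score of 85 lands in Frontier, while Llama 3.3 70B Instruct's 34 falls under Developing.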
| #  | Model                                 | OpenRouter ID                        | Provider       | Agentic Index | Input $/1M | Output $/1M | Avg Cost $/1M (3 : 1) | Context |
|----|---------------------------------------|--------------------------------------|----------------|---------------|------------|-------------|-----------------------|---------|
| 🥇 | Claude Opus 4.6 (Extended Thinking)   | anthropic/claude-opus-4.6            | Anthropic      | 85            | —          | —           | —                     | 200K    |
| 🥈 | o3 (high effort)                      | openai/o3                            | OpenAI         | 82            | —          | —           | —                     | 200K    |
| 🥉 | Claude Sonnet 4.6 (Extended Thinking) | anthropic/claude-sonnet-4.6          | Anthropic      | 79            | —          | —           | —                     | 200K    |
| 4  | Gemini 3.1 Pro Preview                | google/gemini-3.1-pro-preview        | Google         | 77            | —          | —           | —                     | 1M      |
| 5  | GPT-5.4                               | openai/gpt-5.4                       | OpenAI         | 75            | —          | —           | —                     | 1.1M    |
| 6  | Grok 4.20 Beta                        | x-ai/grok-4.20-beta                  | xAI            | 73            | —          | —           | —                     | 2M      |
| 7  | GPT-5.3 Codex (xhigh)                 | openai/gpt-5.3-codex                 | OpenAI         | 71            | —          | —           | —                     | 128K    |
| 8  | DeepSeek R1 (Thinking)                | deepseek/deepseek-r1                 | DeepSeek       | 67            | —          | —           | —                     | 128K    |
| 9  | Gemini 3 Flash Preview                | google/gemini-3-flash-preview        | Google         | 65            | —          | —           | —                     | 1M      |
| 10 | GPT-4.1                               | openai/gpt-4.1                       | OpenAI         | 63            | —          | —           | —                     | 1M      |
| 11 | Qwen3 235B A22B (Thinking)            | qwen/qwen3-235b-a22b                 | Qwen / Alibaba | 61            | —          | —           | —                     | 128K    |
| 12 | Llama 4 Maverick                      | meta-llama/llama-4-maverick          | Meta           | 58            | —          | —           | —                     | 1M      |
| 13 | Claude Haiku 4.5                      | anthropic/claude-haiku-4.5           | Anthropic      | 56            | —          | —           | —                     | 200K    |
| 14 | GPT-4.1 Mini                          | openai/gpt-4.1-mini                  | OpenAI         | 54            | —          | —           | —                     | 1M      |
| 15 | Mistral Large 2512                    | mistralai/mistral-large-2512         | Mistral        | 51            | —          | —           | —                     | 128K    |
| 16 | Gemini 3.1 Flash Lite Preview         | google/gemini-3.1-flash-lite-preview | Google         | 48            | —          | —           | —                     | 1M      |
| 17 | Qwen3.5 9B                            | qwen/qwen3.5-9b                      | Qwen / Alibaba | 44            | —          | —           | —                     | 262K    |
| 18 | Phi-4                                 | microsoft/phi-4                      | Microsoft      | 41            | —          | —           | —                     | 32K     |
| 19 | DeepSeek V3.2                         | deepseek/deepseek-v3.2               | DeepSeek       | 38            | —          | —           | —                     | 128K    |
| 20 | Llama 3.3 70B Instruct                | meta-llama/llama-3.3-70b-instruct    | Meta           | 34            | —          | —           | —                     | 128K    |
| Ø  | Average (Top 20)                      |                                      |                | 61.1          | —          | —           | —                     |         |

Pricing columns show "—" because 0 of 20 models currently have live pricing on OpenRouter.
Data Sources

Benchmark scores are sourced from the Artificial Analysis Intelligence Index v4.0 and Agentic Index (GDPval-AA), as published on openrouter.ai/rankings. Live pricing is fetched directly from the OpenRouter public API. Models not yet available on OpenRouter show "—" for pricing. Scores are subject to change as Artificial Analysis updates its evaluation methodology.
Average Cost & Price Methodology

The Avg Cost $/1M column shows the blended cost per million tokens based on assumed input:output ratios. Select a ratio in the toolbar above to switch scenarios. All averages below reflect the currently selected ratio: 3 : 1 (General chat / RAG).

3 : 1 · General chat / RAG
75% × Input + 25% × Output
3 input tokens for every 1 output token. Typical for retrieval-augmented generation (RAG), summarisation, and general chat.
1 : 1 · Coding agents
50% × Input + 50% × Output
Equal input and output volume. Typical for coding agents, where prompts and generated code are similarly sized.
1 : 3 · Long-form generation
25% × Input + 75% × Output
1 input token for every 3 output tokens. Typical for long-form drafting, report generation, and creative writing.
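All three scenarios are the same weighted average with different weights. A minimal sketch (the function name `blended_cost` is ours):

```python
def blended_cost(input_price: float, output_price: float,
                 in_tokens: int = 3, out_tokens: int = 1) -> float:
    """Blended $/1M tokens for a given input:output token ratio.

    3:1 -> 0.75*in + 0.25*out, 1:1 -> 0.50/0.50, 1:3 -> 0.25/0.75,
    matching the three scenarios described above.
    """
    w_in = in_tokens / (in_tokens + out_tokens)
    return w_in * input_price + (1 - w_in) * output_price
```

For a hypothetical model priced at $3/1M input and $15/1M output, the default 3 : 1 blend is 0.75 × 3 + 0.25 × 15 = $6.00/1M, while the 1 : 3 long-form scenario rises to $12.00/1M.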
Avg Cost $/1M (3 : 1 ratio)
0.75 × Avg_In + 0.25 × Avg_Out
Weighted average using the 3 : 1 input:output ratio across the n models with live pricing (currently 0 of 20)
Avg. Input Price ($/1M tokens)
Σ Input₁…ₙ ÷ n
Arithmetic mean of input costs across the n models with live pricing
Avg. Output Price ($/1M tokens)
Σ Output₁…ₙ ÷ n
Arithmetic mean of output costs across the n models with live pricing
Avg. Agentic Index Score
61.1
Σ Score₁…20 ÷ 20
Simple arithmetic mean of all 20 benchmark scores in this index
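The 61.1 figure can be reproduced directly from the score column of the table:

```python
# Agentic Index scores from the table above, ranks 1 through 20
scores = [85, 82, 79, 77, 75, 73, 71, 67, 65, 63,
          61, 58, 56, 54, 51, 48, 44, 41, 38, 34]

avg_score = sum(scores) / len(scores)  # 1222 / 20 = 61.1
```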

Prices are per 1 million tokens as returned by the OpenRouter API (pricing.prompt × 10⁶ for input, pricing.completion × 10⁶ for output). Free models ($0) are included in the average. Models without a live price listing are excluded from the price averages.
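The conversion described above (pricing.prompt × 10⁶ for input, pricing.completion × 10⁶ for output, unpriced models excluded) can be sketched as follows. The endpoint path and field names reflect the OpenRouter public models API as we understand it; treat those details as assumptions to verify against the current API docs:

```python
import json
import urllib.request


def per_million(pricing: dict):
    """Convert OpenRouter per-token prices to $/1M tokens.

    Returns (input_per_1m, output_per_1m), or None when the model has
    no live price listing (excluded from averages, shown as "—").
    Free models ($0) convert to (0.0, 0.0) and remain included.
    """
    try:
        return (float(pricing["prompt"]) * 1e6,
                float(pricing["completion"]) * 1e6)
    except (KeyError, TypeError, ValueError):
        return None


def fetch_prices(url: str = "https://openrouter.ai/api/v1/models") -> dict:
    """Fetch live prices keyed by model slug (e.g. 'openai/o3')."""
    with urllib.request.urlopen(url) as resp:  # assumed public endpoint
        models = json.load(resp)["data"]
    prices = {}
    for m in models:
        p = per_million(m.get("pricing", {}))
        if p is not None:
            prices[m["id"]] = p
    return prices
```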