Powered by OpenRouter · Artificial Analysis

AI Model Rankings

Top 20 frontier models ranked by the Artificial Analysis Agentic Index and Intelligence Index, with live pricing sourced directly from OpenRouter.

20 models tracked · 2 benchmark indexes · Live pricing via API
Score tiers: Frontier (80%+) · Leading (60%+) · Competitive (40%+) · Developing (below 40%)
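The score tiers reduce to a simple threshold lookup. A minimal sketch (the function name `score_tier` is ours, not part of the rankings page):

```python
def score_tier(score: float) -> str:
    """Map an Agentic Index score to the tier legend used on this page."""
    if score >= 80:
        return "Frontier"
    if score >= 60:
        return "Leading"
    if score >= 40:
        return "Competitive"
    return "Developing"  # everything below 40
```

For example, Claude Opus 4.6's score of 85 lands in Frontier, while Llama 3.3 70B Instruct's 34 falls under Developing.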
| #  | Model                                 | OpenRouter ID                        | Provider       | Agentic Index | Input $/1M | Output $/1M | Avg Cost $/1M (3 : 1) | Context |
|----|---------------------------------------|--------------------------------------|----------------|---------------|------------|-------------|-----------------------|---------|
| 🥇 | Claude Opus 4.6 (Extended Thinking)   | anthropic/claude-opus-4.6            | Anthropic      | 85            | —          | —           | —                     | 200K    |
| 🥈 | o3 (high effort)                      | openai/o3                            | OpenAI         | 82            | —          | —           | —                     | 200K    |
| 🥉 | Claude Sonnet 4.6 (Extended Thinking) | anthropic/claude-sonnet-4.6          | Anthropic      | 79            | —          | —           | —                     | 200K    |
| 4  | Gemini 3.1 Pro Preview                | google/gemini-3.1-pro-preview        | Google         | 77            | —          | —           | —                     | 1M      |
| 5  | GPT-5.4                               | openai/gpt-5.4                       | OpenAI         | 75            | —          | —           | —                     | 1.1M    |
| 6  | Grok 4.20 Beta                        | x-ai/grok-4.20-beta                  | xAI            | 73            | —          | —           | —                     | 2M      |
| 7  | GPT-5.3 Codex (xhigh)                 | openai/gpt-5.3-codex                 | OpenAI         | 71            | —          | —           | —                     | 128K    |
| 8  | DeepSeek R1 (Thinking)                | deepseek/deepseek-r1                 | DeepSeek       | 67            | —          | —           | —                     | 128K    |
| 9  | Gemini 3 Flash Preview                | google/gemini-3-flash-preview        | Google         | 65            | —          | —           | —                     | 1M      |
| 10 | GPT-4.1                               | openai/gpt-4.1                       | OpenAI         | 63            | —          | —           | —                     | 1M      |
| 11 | Qwen3 235B A22B (Thinking)            | qwen/qwen3-235b-a22b                 | Qwen / Alibaba | 61            | —          | —           | —                     | 128K    |
| 12 | Llama 4 Maverick                      | meta-llama/llama-4-maverick          | Meta           | 58            | —          | —           | —                     | 1M      |
| 13 | Claude Haiku 4.5                      | anthropic/claude-haiku-4.5           | Anthropic      | 56            | —          | —           | —                     | 200K    |
| 14 | GPT-4.1 Mini                          | openai/gpt-4.1-mini                  | OpenAI         | 54            | —          | —           | —                     | 1M      |
| 15 | Mistral Large 2512                    | mistralai/mistral-large-2512         | Mistral        | 51            | —          | —           | —                     | 128K    |
| 16 | Gemini 3.1 Flash Lite Preview         | google/gemini-3.1-flash-lite-preview | Google         | 48            | —          | —           | —                     | 1M      |
| 17 | Qwen3.5 9B                            | qwen/qwen3.5-9b                      | Qwen / Alibaba | 44            | —          | —           | —                     | 262K    |
| 18 | Phi-4                                 | microsoft/phi-4                      | Microsoft      | 41            | —          | —           | —                     | 32K     |
| 19 | DeepSeek V3.2                         | deepseek/deepseek-v3.2               | DeepSeek       | 38            | —          | —           | —                     | 128K    |
| 20 | Llama 3.3 70B Instruct                | meta-llama/llama-3.3-70b-instruct    | Meta           | 34            | —          | —           | —                     | 128K    |
| Ø  | Average (Top 20)                      |                                      |                | 61.1          | —          | —           | —                     |         |

Pricing columns show "—" because 0 of 20 models currently have live pricing on OpenRouter.
Data Sources

Benchmark scores are sourced from the Artificial Analysis Intelligence Index v4.0 and Agentic Index (GDPval-AA), as published on openrouter.ai/rankings. Live pricing is fetched directly from the OpenRouter public API. Models not yet available on OpenRouter show "—" for pricing. Scores are subject to change as Artificial Analysis updates its evaluation methodology.
Average Cost & Price Methodology

The Avg Cost $/1M column shows the blended cost per million tokens based on assumed input:output ratios. Select a ratio in the toolbar above to switch scenarios. All averages below reflect the currently selected ratio: 3 : 1 (General chat / RAG).

3 : 1 · General chat / RAG
75% × Input + 25% × Output
3 input tokens for every 1 output token. Typical for retrieval-augmented generation (RAG), summarisation, and general chat.
1 : 1 · Coding agents
50% × Input + 50% × Output
Equal input and output volume. Typical for coding agents, where prompts and generated code are similarly sized.
1 : 3 · Long-form generation
25% × Input + 75% × Output
1 input token for every 3 output tokens. Typical for long-form drafting, report generation, and creative writing.
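All three scenarios are the same weighted average with different weights. A minimal sketch (the function name `blended_cost` is ours):

```python
def blended_cost(input_price: float, output_price: float,
                 in_tokens: int = 3, out_tokens: int = 1) -> float:
    """Blended $/1M tokens for a given input:output token ratio.

    3:1 -> 0.75*in + 0.25*out, 1:1 -> 0.50/0.50, 1:3 -> 0.25/0.75,
    matching the three scenarios described above.
    """
    w_in = in_tokens / (in_tokens + out_tokens)
    return w_in * input_price + (1 - w_in) * output_price
```

For a hypothetical model priced at $3/1M input and $15/1M output, the default 3 : 1 blend is 0.75 × 3 + 0.25 × 15 = $6.00/1M, while the 1 : 3 long-form scenario rises to $12.00/1M.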
Avg Cost $/1M (3 : 1 ratio)
0.75 × Avg_In + 0.25 × Avg_Out
Weighted average using the 3 : 1 input:output ratio across the n models with live pricing (currently 0 of 20)
Avg. Input Price ($/1M tokens)
Σ Input₁…ₙ ÷ n
Arithmetic mean of input costs across the n models with live pricing
Avg. Output Price ($/1M tokens)
Σ Output₁…ₙ ÷ n
Arithmetic mean of output costs across the n models with live pricing
Avg. Agentic Index Score
61.1
Σ Score₁…20 ÷ 20
Simple arithmetic mean of all 20 benchmark scores in this index
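The 61.1 figure can be reproduced directly from the score column of the table:

```python
# Agentic Index scores from the table above, ranks 1 through 20
scores = [85, 82, 79, 77, 75, 73, 71, 67, 65, 63,
          61, 58, 56, 54, 51, 48, 44, 41, 38, 34]

avg_score = sum(scores) / len(scores)  # 1222 / 20 = 61.1
```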

Prices are per 1 million tokens as returned by the OpenRouter API (pricing.prompt × 10⁶ for input, pricing.completion × 10⁶ for output). Free models ($0) are included in the average. Models without a live price listing are excluded from the price averages.
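The conversion described above (pricing.prompt × 10⁶ for input, pricing.completion × 10⁶ for output, unpriced models excluded) can be sketched as follows. The endpoint path and field names reflect the OpenRouter public models API as we understand it; treat those details as assumptions to verify against the current API docs:

```python
import json
import urllib.request


def per_million(pricing: dict):
    """Convert OpenRouter per-token prices to $/1M tokens.

    Returns (input_per_1m, output_per_1m), or None when the model has
    no live price listing (excluded from averages, shown as "—").
    Free models ($0) convert to (0.0, 0.0) and remain included.
    """
    try:
        return (float(pricing["prompt"]) * 1e6,
                float(pricing["completion"]) * 1e6)
    except (KeyError, TypeError, ValueError):
        return None


def fetch_prices(url: str = "https://openrouter.ai/api/v1/models") -> dict:
    """Fetch live prices keyed by model slug (e.g. 'openai/o3')."""
    with urllib.request.urlopen(url) as resp:  # assumed public endpoint
        models = json.load(resp)["data"]
    prices = {}
    for m in models:
        p = per_million(m.get("pricing", {}))
        if p is not None:
            prices[m["id"]] = p
    return prices
```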