Benchmarks

Compare 8 benchmarks across coding, knowledge, long context, multilingual, and more. Featuring results from Gemini 3 Deep Think, Gemini 3 Pro, GPT-5.1, and Claude Sonnet 4.5.

8 benchmarks · 8 categories · 8 capabilities


Comparison Matrix (24 rows)

| Benchmark | Metric | Model | Score | Tools | Source |
|---|---|---|---|---|---|
| Humanity's Last Exam | accuracy | Gemini 3 Deep Think | 41% | | Link |
| Humanity's Last Exam | accuracy | Gemini 3 Pro | 37.5% | | Link |
| Humanity's Last Exam | accuracy | GPT-5.1 | 26.5% | | Link |
| GPQA Diamond | accuracy | Gemini 3 Deep Think | 93.8% | | Link |
| GPQA Diamond | accuracy | Gemini 3 Pro | 91.9% | | Link |
| GPQA Diamond | accuracy | GPT-5.1 | 88.1% | | Link |
| ARC-AGI-2 | accuracy | Gemini 3 Deep Think | 45.1% | ✓ Yes | Link |
| ARC-AGI-2 | accuracy | Gemini 3 Pro | 31.1% | | Link |
| ARC-AGI-2 | accuracy | GPT-5.1 | 17.6% | | Link |
| MMMU-Pro | accuracy | Gemini 3 Pro | 87.6% | | Link |
| MMMU-Pro | accuracy | GPT-5.1 | 83.6% | | Link |
| MMMU-Pro | accuracy | Claude Sonnet 4.5 | 77.8% | | Link |
| LiveCodeBench Pro | Elo rating | Gemini 3 Pro | 2,439 Elo | | Link |
| LiveCodeBench Pro | Elo rating | GPT-5.1 | 1,775 Elo | | Link |
| LiveCodeBench Pro | Elo rating | Claude Sonnet 4.5 | 1,418 Elo | | Link |
| FACTS Benchmark Suite | score | Gemini 3 Pro | 70.5% | ✓ Yes | Link |
| FACTS Benchmark Suite | score | GPT-5.1 | 63.4% | ✓ Yes | Link |
| FACTS Benchmark Suite | score | Claude Sonnet 4.5 | 50.4% | ✓ Yes | Link |
| MMMLU | accuracy | Gemini 3 Pro | 91.8% | | Link |
| MMMLU | accuracy | GPT-5.1 | 89.5% | | Link |