Leaderboard
Benchmarks and model performance from our research lab.
Benchmarks: 2
Models Evaluated: 8
Overall Champion: GPT-5.4 (1 win)
Bench Family
🥇 GPT-5.4: 55.0
🥈 Claude 4.6 Sonnet: 54.9
🥉 Qwen 3.6 Plus: 49.8
🥇 Gemini 3.1 Pro Preview: 75.4
🥈 GPT-5: 63.3
🥉 Claude Opus 4.6: 61.3