AI League — Game Day 5: GPT-5.5 Posts a New Season-High, Claude's Speed Rebounds

AI League — Game Day 5: GPT-5.5 Posts a New Season-High, Claude's Speed Rebounds

GPT-5.5 hits 62.1 t/s — new season-high for the top-2 bracket. Claude bounces to 60.1 t/s. Gemini 3.5 Flash surges to 187 t/s. Grok and DeepSeek hold. Full June 2 stats. #AILeague

AIL·Stats Board
2026/6/2 · 8:14
購読 1 件 · コンテンツ 3 件
Hook: GPT-5.5 hits 62.1 t/s — new season-high. Claude bounces back to 60.1 t/s. Gemini 3.5 Flash surges to 187 t/s. Grok and DeepSeek hold position. Full June 2 stats. #AILeague

Final standings — June 2, 2026

Fresh 72-hour median readings from Artificial Analysis:1
RankTeamAI IndexSpeed (t/s)Blended $/1Mvs. Day 4
🥇 1Anthropic — Claude Opus 4.8 (max)6160.1$4.10↑ +8.1% speed
🥈 2OpenAI — GPT-5.5 (xhigh)6062.1$4.35↑ +2.3% ★ new high
3OpenAI — GPT-5.5 (high)5928.9$4.35
4Anthropic — Claude Opus 4.7 (max)5723.7$4.10
4Google — Gemini 3.1 Pro Preview57143.0$1.74
4OpenAI — GPT-5.5 (medium)5761$4.35
4Alibaba — Qwen3.7 Max57189$1.43
8Google — Gemini 3.5 Flash (high)55187$1.31↑ +6.4%
9Kimi K2.65454$0.70
9Xiaomi — MiMo-V2.5-Pro5452$0.18
11xAI — Grok 4.3 (high)53144.6$0.64↑ +1.2%
12DeepSeek V4 Pro (Max)5247.8$0.18
13Meta — Muse Spark52still unpriced
14Meta — Llama 4 Scout14109$0.22
★ New season-high for any model in the top-2 intelligence bracket.
AI League leaderboard overview — intelligence, speed, and price rankings for 100+ models
Artificial Analysis LLM Leaderboard — June 2 live snapshot 1

Game of the night: OpenAI posts a new record

GPT-5.5 (xhigh) logged 62.1 t/s on the OpenAI endpoint. That beats Day 4's 60.7 t/s by 2.3% and sets a new season-high for the top-2 intelligence bracket.2 Over five days, the OpenAI squad has pushed output speed from 52.1 t/s all the way to 62.1 — a 19% jump with the intelligence score locked at 60 the whole time.
Claude Opus 4.8 answered with a strong comeback. After slipping to 55.6 t/s on Day 4, it bounced to 60.1 t/s — an 8.1% recovery — reclaiming the speed lead within the top-2 bracket.3 Intelligence holds at 61.
The chart below shows the five-day speed trend for the top-2 bracket. GPT-5.5 has been climbing. Claude has oscillated but recovered hard today.
チャートを読み込んでいます…
The AI Index rankings haven't moved since Day 1. Five straight sessions, same order. The intelligence race is settled at the top; the competition now runs entirely on speed and pricing tracks.

Speed panel — the quick and the slow

ESPN-style speed leaderboard graphic showing AI model output speeds from Qwen3.7 Max at 189 t/s down to DeepSeek at 47.8 t/s
Speed rankings for the top bracket models — Game Day 5 (AI-generated graphic)
Top speed this session belongs to Qwen3.7 Max at 189 t/s — but that's at AI Index 57, not 61. Gemini 3.5 Flash comes in at 187 t/s, recovering sharply from 175.7 on Day 4 and closing back on its Day 3 reading of 183.2 t/s. Google's fast-tier shows ±6-8% daily oscillation, which now looks like normal variance rather than a signal.
Grok 4.3 (high) nudged up to 144.6 t/s on the xAI native endpoint; Azure lags at 112.5 t/s for the same model.4 Loud ownership, solid speed, middling intelligence at 53 — that profile hasn't changed.
DeepSeek V4 Pro clocked 47.8 t/s on the official endpoint. Third-party inference providers tell a different story: Lightning AI runs the same model at 149.3 t/s, Fireworks at 118.6 t/s.5 The low-budget dark horse is sitting on borrowed pace at the primary endpoint.

Pricing war update

No price moves today. The floor stays at $0.18/M blended, held jointly by DeepSeek V4 Pro and Xiaomi's MiMo-V2.5-Pro (AI Index 54 — two points above DeepSeek at identical price).1
The top-2 pricing gap is unchanged. Claude Opus 4.8 costs $4.10/M blended vs. GPT-5.5 (xhigh) at $4.35/M. The OpenAI squad now also outpaces Claude on raw throughput — 62.1 vs. 60.1 t/s — so the premium no longer buys speed. Developers who need throughput have a genuine choice to make.
Muse Spark (Meta) remains unpriced at AI Index 52, same level as DeepSeek V4 Pro. When Meta puts a number on it, the $0.18 floor gets a direct test.

Challenger watch

Qwen3.7 Max continues to be the standout at the 57-index tier: 189 t/s and $1.43/M puts it at 3× the speed of Claude Opus 4.7 (23.7 t/s) at the same intelligence score.1 For mid-tier throughput-heavy workloads, the Alibaba squad's case gets stronger every day it holds that position.
GPT-5.3 Codex (xhigh) entered the tracked top-15 at AI Index 54, 90 t/s, $1.87/M — a code-specialized variant above Kimi and MiMo on intelligence. Worth tracking for developer-focused pipelines.
Kimi K2.6 held AI Index 54 with a slight speed uptick to 54 t/s (from ~50 t/s last session). Quiet and consistent.

Postgame summary

Five days in, the pattern is clear: the intelligence order is frozen at the top, while speed is the active competition. GPT-5.5 (xhigh) set a new season-high at 62.1 t/s, and Claude Opus 4.8 answered with a strong 60.1 t/s rebound. Gemini 3.5 Flash reclaimed the 185+ t/s range, confirming that Day 4's 175.7 was a down day. DeepSeek and Grok hold steady. The wildcard is still Muse Spark — unpriced, sitting at AI Index 52, with Meta yet to show what they'll charge for it. That pricing reveal is the most consequential pending announcement in the league right now.

このコンテンツについて、さらに観点や背景を補足しましょう。

  • ログインするとコメントできます。