SWE-Bench-Pro agent results

CodeAgentBench

A leaderboard for coding agents on SWE-Bench-Pro: 151 tasks, 3 attempts per task, for 453 planned scoreable attempts in total. Rankings include completed 453/453 runs and are sorted by Final Score by default; partial-progress rows are shown at the bottom when available.

Best Final Score: 37.59 (OpenCode - GLM 5.2)

Completed Models: 32 (no partial progress rows)

Scoreable Attempts: 453 (151 tasks x 3 tries)

Exported: 2026-06-18 (SWEPro 151 zh pass@3)

Benchmark Leaderboard

Final Score is the primary ranking signal. It weights Pass^3 most heavily to reward stable agent performance, while Pass@3 keeps each model's best-of-three reach visible.

#	Agent / Model	Agent Version	Final Score	Pass^3	Pass@3	Attempt Score	Solved Tasks	Solved Attempts	Coverage	Tokens K (In / Out / Total)	Log Archive	Full Tree	Exported	Model Dir
1	GLM 5.2 OpenCode - Zhipu GLM - zai-coding-plan/glm-5.2	opencode-cli 1.17.8	37.59	36/151 - 23.8%	57/151 - 37.7%	140/453 - 30.9%	57/151	140/453	453/453 100%	--	0.0 MB	0.0 MB		`opencode-glm52`
2	GPT 5.5 (xhigh) Codex - OpenAI - gpt-5.5#effort=xhigh	codex-cli 0.135.0	36.88	40/151 - 26.5%	51/151 - 33.8%	136/453 - 30.0%	51/151	136/453	453/453 100%	In 1,452,160.3K Out 13,057.2K Total 1,465,217.5K	228.15 MB	193.18 MB	2026-06-03	`codex-gpt55`
3	GPT 5.4 (xhigh) Codex - OpenAI - gpt-5.4#effort=xhigh	codex-cli 0.135.0	36.79	35/151 - 23.2%	55/151 - 36.4%	136/453 - 30.0%	55/151	136/453	453/453 100%	In 945,325.4K Out 13,082.2K Total 958,407.6K	212.55 MB	173.27 MB	2026-06-03	`codex-gpt54`
4	GPT 5.3 codex (xhigh) Codex - OpenAI - gpt-5.3-codex#effort=xhigh	codex-cli 0.135.0	36.09	31/151 - 20.5%	57/151 - 37.7%	131/453 - 28.9%	57/151	131/453	453/453 100%	In 1,456,165.3K Out 12,486.2K Total 1,468,651.4K	202.4 MB	160.93 MB	2026-06-03	`codex-gpt53-xhigh`
5	Kimi K2.6(Kimi for Coding) Kimi - Moonshot - kimi-code/kimi-for-coding	kimi-cli 1.40.0	35.45	32/151 - 21.2%	54/151 - 35.8%	126/453 - 27.8%	54/151	126/453	453/453 100%	--	30.49 MB	37.78 MB	2026-06-12	`kimi-for-coding`
6	MiniMax M3 OpenCode - MiniMax - minimax-cn-coding-plan/MiniMax-M3	opencode-cli 1.14.32	33.91	25/151 - 16.6%	56/151 - 37.1%	120/453 - 26.5%	56/151	120/453	453/453 100%	In 2,765,053.6K Out 12,471.7K Total 2,777,525.3K	277.35 MB	244.03 MB	2026-06-13	`opencode-minimax-m3`
7	DeepSeek v4 pro (max) OpenCode - DeepSeek - deepseek/deepseek-v4-pro#variant=max	opencode-cli 1.14.32	32.95	28/151 - 18.5%	48/151 - 31.8%	117/453 - 25.8%	48/151	117/453	453/453 100%	In 838,613.9K Out 10,784.1K Total 849,398.1K	183.66 MB	156.39 MB	2026-05-22	`opencode-deepseek-v4-flash-max`
8	GLM 5.1 OpenCode - Zhipu GLM - zai-coding-plan/glm-5.1	opencode-cli 1.14.32	31.74	27/151 - 17.9%	45/151 - 29.8%	110/453 - 24.3%	45/151	110/453	453/453 100%	In 790,149.5K Out 7,670.5K Total 797,820K	165.27 MB	142.29 MB	2026-06-03	`opencode-glm51`
9	Qwen 3.7 Max (1m) Qoder - Qwen - Qwen3.7-Max#context-window=1000000	qodercli-1.0.19	31.62	25/151 - 16.6%	47/151 - 31.1%	109/453 - 24.1%	47/151	109/453	453/453 100%	In 790M Out 7.6M Total 797M	368.0 MB	368.0 MB	2026-06-14	`qodercli-qwen37max-direct-full453-10c-r1-20260613`
10	Qwen 3.5 plus Qwen - Qwen - qwen3.5-plus	qwen-cli 0.14.5	31.39	22/151 - 14.6%	51/151 - 33.8%	105/453 - 23.2%	51/151	105/453	453/453 100%	--	28.18 MB	36.45 MB	2026-06-03	`qwen-3.5plus`
11	Qwen 3.7 Plus (1m) Qoder - Qwen - Qwen3.7-Plus#context-window=1000000	qodercli-1.0.20	31.21	28/151 - 18.5%	43/151 - 28.5%	104/452 - 23.0%	43/151	104/452	452/453 100%	In 0 Out 0 Total 0	457 MB	457 MB	2026-06-16	`qodercli-qwen37plus-direct-full453-10c-r1-20260615`
12	MiMo v2.5 pro Claude Code - Xiaomi - xiaomi/mimo-v2.5-pro	claude-code 2.1.158	30.69	25/151 - 16.6%	44/151 - 29.1%	103/453 - 22.7%	44/151	103/453	453/453 100%	--	30.16 MB	37.39 MB	2026-06-03	`claude-mimo25pro`
13	MiMo v2.5 pro (high) OpenCode - Xiaomi - xiaomi-token-plan-cn/mimo-v2.5-pro#variant=high	opencode-cli 1.14.32	30.3	22/151 - 14.6%	46/151 - 30.5%	102/453 - 22.5%	46/151	102/453	453/453 100%	In 846,117.3K Out 7,876.6K Total 853,993.9K	170.12 MB	151.78 MB	2026-06-03	`opencode-mimo25pro-high`
14	MiniMax M2.5 highspeed OpenCode - MiniMax - minimax-cn-coding-plan/MiniMax-M2.5-highspeed	opencode-cli 1.14.32	30.3	22/151 - 14.6%	46/151 - 30.5%	102/453 - 22.5%	46/151	102/453	453/453 100%	In 1,073,152.7K Out 8,449.4K Total 1,081,602.1K	173.6 MB	167.83 MB	2026-06-03	`opencode-minimax25-highspeed`
15	DeepSeek v4 flash (max) OpenCode - DeepSeek - deepseek/deepseek-v4-flash#variant=max	opencode-cli 1.14.32	28.99	20/151 - 13.2%	44/151 - 29.1%	95/453 - 21.0%	44/151	95/453	453/453 100%	In 921,687.4K Out 10,751.3K Total 932,438.7K	195.44 MB	166.59 MB	2026-06-03	`opencode-deepseek-v4-pro-max`
16	GLM 5 turbo OpenCode - Zhipu GLM - zai-coding-plan/glm-5-turbo	opencode-cli 1.14.32	28.95	19/151 - 12.6%	44/151 - 29.1%	99/453 - 21.9%	44/151	99/453	453/453 100%	In 627,787.4K Out 5,887K Total 633,674.4K	19.75 MB	123.7 MB		`opencode-glm5turbo`
17	Qwen 3.6 plus Qwen - Qwen - qwen3.6-plus	qwen-cli 0.14.5	28.43	18/151 - 11.9%	44/151 - 29.1%	95/453 - 21.0%	44/151	95/453	453/453 100%	--	30.57 MB	36.26 MB	2026-06-03	`qwen-3.6plus`
18	Step 3.7 flash OpenCode - StepFun - stepfun/step-3.7-flash	opencode-cli 1.14.32	28.01	18/151 - 11.9%	43/151 - 28.5%	91/453 - 20.1%	43/151	91/453	453/453 100%	In 1,495,692.5K Out 16,407.8K Total 1,512,100.3K	190.6 MB	178.53 MB	2026-06-09	`opencode-stepfun37-flash`
19	KAT Coder Pro v2 OpenCode - StreamLake - streamlake/kat-coder-pro-v2	opencode-cli 1.17.6	27.97	21/151 - 13.9%	39/151 - 25.8%	90/453 - 19.9%	39/151	90/453	453/453 100%	In 0 Out 0 Total 0	377 MB	377 MB	2026-06-16	`opencode-kat-coder-pro-v2-ps4-prepared-20260613`
20	MiMo v2.5 Claude Code - Xiaomi - xiaomi/mimo-v2.5	claude-code 2.1.158	27.04	17/151 - 11.3%	41/151 - 27.2%	86/453 - 19.0%	41/151	86/453	453/453 100%	--	30.29 MB	38.14 MB	2026-06-03	`claude-mimo25`
21	Qwen 3.5 plus (180k) Qoder - Qwen - qwen3.5-plus-cp#context-window=180000	qodercli 1.0.14	25.91	14/151 - 9.3%	41/151 - 27.2%	83/453 - 18.3%	41/151	83/453	453/453 100%	--	347.08 MB	292.84 MB	2026-06-12	`qoder-qwen35-plus-direct-20260612`
22	Qwen 3.6 plus (180k) Qoder - Qwen - qwen3.6-plus#context-window=180000	qodercli 1.0.10	25.81	16/151 - 10.6%	38/151 - 25.2%	80/453 - 17.7%	38/151	80/453	453/453 100%	--	393.24 MB	324.98 MB	2026-06-10	`qoder-qwen36-plus-forward-20260610`
23	MiniMax M2.7 highspeed OpenCode - MiniMax - minimax-cn-coding-plan/MiniMax-M2.7-highspeed	opencode-cli 1.14.32	25.24	15/151 - 9.9%	38/151 - 25.2%	76/453 - 16.8%	38/151	76/453	453/453 100%	In 622,967.8K Out 5,196.1K Total 628,163.9K	15.21 MB	108.88 MB		`opencode-minimax27`
24	MiMo v2.5 OpenCode - Xiaomi - xiaomi-token-plan-cn/mimo-v2.5	opencode-cli 1.14.32	25.24	18/151 - 11.9%	33/151 - 21.9%	78/453 - 17.2%	33/151	78/453	453/453 100%	In 898,147.7K Out 8,980.7K Total 907,128.4K	173.96 MB	153.6 MB	2026-06-03	`opencode-mimo25-tokenplan-high-20260527`
25	LongCat 2.0 Preview OpenCode - LongCat - LongCat/LongCat-2.0-Preview	opencode-cli 1.14.32	24.78	17/151 - 11.3%	33/151 - 21.9%	75/453 - 16.6%	33/151	75/453	453/453 100%	In 1,035,159.8K Out 6,078.2K Total 1,041,238K	189.85 MB	168.72 MB	2026-06-03	`opencode-longcat`
26	Step 3.5 flash 2603 OpenCode - StepFun - stepfun/step-3.5-flash-2603	opencode-cli 1.14.32	24.21	13/151 - 8.6%	37/151 - 24.5%	73/453 - 16.1%	37/151	73/453	453/453 100%	In 1,113,764.1K Out 12,772.1K Total 1,126,536.1K	192.8 MB	109.34 MB	2026-06-07	`opencode-stepfun35-2603`
27	GLM 4.7 OpenCode - Zhipu GLM - zai-coding-plan/glm-4.7	opencode-cli 1.14.32	24.03	13/151 - 8.6%	36/151 - 23.8%	73/453 - 16.1%	36/151	73/453	453/453 100%	In 919,040.4K Out 6,212.7K Total 925,253.1K	308.12 MB	183.96 MB	2026-06-03	`opencode-glm47`
28	KAT Coder Pro v2 OpenCode - StreamLake - streamlake/kat-coder-pro-v2	opencode-cli 1.14.32	23.47	17/151 - 11.3%	29/151 - 19.2%	68/453 - 15.0%	29/151	68/453	453/453 100%	In 796,137.5K Out 6,397.8K Total 802,535.3K	163.02 MB	147.82 MB	2026-06-12	`opencode-kat-coder-pro-v2-kat2`
29	SenseNova 6.7 flash lite OpenCode - SenseNova - sensenova/sensenova-6.7-flash-lite	opencode-cli 1.14.32	23.14	13/151 - 8.6%	33/151 - 21.9%	68/453 - 15.0%	33/151	68/453	453/453 100%	In 917,349.4K Out 8,979K Total 926,328.4K	185.26 MB	174.21 MB	2026-06-03	`opencode-sensenova67-flash-lite`
30	doubao seed 2.0 code OpenCode - Volcengine - volcengine-plan/doubao-seed-2.0-code	opencode-cli 1.14.32	19.42	9/151 - 6.0%	27/151 - 17.9%	52/453 - 11.5%	27/151	52/453	453/453 100%	In 237,901.7K Out 2,334.3K Total 240,236K	10.73 MB	74.82 MB		`opencode-doubao-2-code`
31	Step 3.5 flash OpenCode - StepFun - stepfun/step-3.5-flash	opencode-cli 1.14.32	17.04	8/151 - 5.3%	21/151 - 13.9%	42/453 - 9.3%	21/151	42/453	453/453 100%	In 1,587,251.1K Out 12,183.8K Total 1,599,434.9K	213.12 MB	190.71 MB	2026-06-03	`opencode-stepfun35`
32	DeepSeek v4 flash (max) deepseek-tui - DeepSeek - deepseek-v4-flash#effort=max	deepseek-tui v0.8.39	16.22	11/151 - 7.3%	15/151 - 9.9%	38/453 - 8.4%	15/151	38/453	453/453 100%	--	28.82 MB	36.6 MB	2026-05-22	`deepseek-tui-v4-flash-max`

Notes: Final Score = 100 × S^0.55 × R^0.25 × A^0.20, where S = (Pass^3)^(1/3), R = 1 - (1 - Pass@3)^(1/3), and A = Attempt Score. Pass^3 is the share of the 151 tasks solved in all 3 attempts. Pass@3 is the share of tasks solved at least once across 3 attempts. Attempt Score is solved scoreable attempts divided by 453.

Tokens are shown in thousands (K) and summed across all exported attempts. Input includes cache read/write tokens when exported. OpenCode rows use per-step command.log usage when available; other rows fall back to per-task run_summary.json fields. "--" means non-zero usage was not exported.

Agent Version combines each run's exported runtime label with the output of the current local CLI version command when historical reports do not include an exact semver.

Completed rows are rankable only when the exported summary reports 453 scoreable attempts; partial rows are shown as progress references at the bottom.
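The Final Score formula in the notes can be reproduced from the three raw counts in each table row. The following is a minimal sketch, not the leaderboard's own code; the function name `final_score` and its parameter names are made up for illustration, and the defaults assume 151 tasks with 3 attempts each.

```python
def final_score(all3_tasks, any_tasks, solved_attempts, n_tasks=151, tries=3):
    """Final Score = 100 * S^0.55 * R^0.25 * A^0.20 (formula from the notes).

    all3_tasks      -- tasks solved in all 3 attempts (Pass^3 numerator)
    any_tasks       -- tasks solved at least once (Pass@3 numerator)
    solved_attempts -- solved scoreable attempts (Attempt Score numerator)
    """
    pass_hat_3 = all3_tasks / n_tasks
    pass_at_3 = any_tasks / n_tasks
    a = solved_attempts / (n_tasks * tries)   # Attempt Score, out of 453
    s = pass_hat_3 ** (1 / 3)                 # S = (Pass^3)^(1/3)
    r = 1 - (1 - pass_at_3) ** (1 / 3)        # R = 1 - (1 - Pass@3)^(1/3)
    return 100 * s ** 0.55 * r ** 0.25 * a ** 0.20


# Row 1 of the table: 36/151, 57/151, 140/453
print(round(final_score(36, 57, 140), 2))  # → 37.59
```

Because the three factors are multiplied, a run with zero tasks solved in all three attempts scores 0 under this formula regardless of its Pass@3.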

Visual Leaderboard

Switch metrics to compare the same agent/model pairs by reach, consistency, and per-attempt solve rate.

[Chart: the 32 agent/model pairs in leaderboard order, plotted by the selected metric.]

Reach vs Consistency

Pass@3 shows whether an agent can solve a task at least once; Pass^3 shows whether it solves the same task all three times. The gap is useful when comparing stochastic or retry-sensitive agents.
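To make the two metrics concrete, the sketch below derives both from per-task attempt outcomes. It is illustrative only: the function name `reach_and_consistency`, the input layout (task id mapped to three pass/fail booleans), and the toy data are assumptions, not the leaderboard's export format.

```python
def reach_and_consistency(outcomes):
    """Compute Pass@3, Pass^3, and their gap from per-task attempt results.

    outcomes -- dict mapping a task id to a list of 3 booleans,
                one per attempt (hypothetical layout).
    """
    n = len(outcomes)
    solved_any = sum(any(runs) for runs in outcomes.values())  # at least once
    solved_all = sum(all(runs) for runs in outcomes.values())  # all three times
    return {
        "pass_at_3": solved_any / n,
        "pass_hat_3": solved_all / n,
        "gap": (solved_any - solved_all) / n,
    }


# Toy data: four hypothetical tasks, three attempts each.
toy = {
    "task-a": [True, True, True],     # counts toward both metrics
    "task-b": [True, False, False],   # counts toward Pass@3 only
    "task-c": [False, False, False],  # counts toward neither
    "task-d": [False, True, True],    # counts toward Pass@3 only
}
print(reach_and_consistency(toy))
# → {'pass_at_3': 0.75, 'pass_hat_3': 0.25, 'gap': 0.5}
```

A large gap means many tasks are solved only on some attempts, which is the retry-sensitive behaviour the comparison is meant to surface.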

Pass@3

[Chart: Pass@3 for the 32 agent/model pairs, in leaderboard order.]

Pass^3