A benchmark for OpenClaw agents

ClawProBench

A benchmark designed to measure how different models perform under OpenClaw when they need to reason, plan, use tools, and stay reliable across repeated runs.

Benchmark Leaderboard

Final Score is the default rank signal. The table below shows the open dataset by default and adds closed dataset rows where a model has completed the 68-task closed run.

| # | Model | Dataset | Vendor · Provider | Final Score | Pass^3 | Pass@3 | Avg Score | Capability | Efficiency | Planning | Safety | Tool Use | Constraints | Error Recovery | Synthesis | Avg Runtime | Total Tokens | Token Breakdown | Token Cost | Price (input / output per 1M tokens) | OpenClaw | Released | Updated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | intern-s2-preview | Open | ShangHai AILab · intern | 73.71 | 61.9% | 81.4 | 76.7 | 78.0 | 89.8 | 79.2 | 70.5 | 82.2 | 76.9 | 73.7 | 75.2 | 95.37s | 70824855 | input:11105542 output:886191 cacheread:58833122 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-05-15 | 2026-05-16 |
| #29 | intern-s2-preview | Closed | ShangHai AILab · intern | 47.01 | 25.1% | 62.5 | 61.8 | 61.9 | 99.6 | 64.6 | 49.9 | 62.2 | 65.1 | 62.7 | 65.4 | 70.48s | 25516250 | input:6473542 output:222520 cacheread:18820188 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-05-15 | 2026-05-15 |
| 2 | Sensenova 6.7 Flash Lite | Open | sensetime · sensenova | 71.59 | 63.4% | 76.5 | 73.7 | 74.3 | 93.8 | 69.7 | 70.5 | 78.8 | 70.1 | 78.5 | 74.3 | 124.84s | 41797781 | input:30999558 output:763023 cacheread:10035200 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-05-08 | 2026-05-07 |
| #33 | Sensenova 6.7 Flash Lite | Closed | sensetime · sensenova | 45.11 | 22.2% | 58.5 | 62.1 | 62.1 | 100 | 62.5 | 53.8 | 62.5 | 67.9 | 62.5 | 62.9 | 121.3s | 16527596 | input:14660476 output:220528 cacheread:1646592 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-05-08 | 2026-05-13 |
| 3 | ERNIE 5.1 | Open | baidu · baiduqianfan | 68.58 | 57.7% | 72.9 | 70.7 | 71.6 | 92.0 | 57.5 | 67.6 | 79.4 | 77.9 | 73.3 | 70.3 | 125.3s | 26530872 | input:6925181 output:1622971 cacheread:17982720 | $19.0706 | $0.8721 / $3.1977 (¥6 / ¥22) | v3.2.6 | 2026-05-09 | 2026-05-10 |
| #32 | ERNIE 5.1 | Closed | baidu · baiduqianfan | 46.24 | 25.3% | 60.4 | 61.7 | 61.7 | 100 | 64.1 | 49.8 | 65.8 | 61.0 | 62.5 | 65.1 | 142.79s | 7292445 | input:4000397 output:376817 cacheread:2915231 | $5.9649 | $0.8721 / $3.1977 (¥6 / ¥22) | v3.2.6 | 2026-05-09 | 2026-05-14 |
| 4 | gpt-5.5-xhigh | Open | openai · openai | 67.9 | 62.7% | 68.7 | 69.3 | 69.3 | 100 | 66.9 | 66.8 | 78.0 | 70.0 | 69.4 | 62.7 | 143.76s | 139773819 | input:72643986 output:1352681 cacheread:65777152 | $568.2432 | $5 / $30 (¥34.413 / ¥206.478) | v3.2.6 | 2026-04-24 | 2026-05-04 |
| #7 | gpt-5.5-xhigh | Closed | openai · openai | 55.77 | 50.2% | 71.8 | 68.2 | 68.2 | 100 | 73.6 | 50.2 | 77.7 | 53.2 | 75.1 | 74.1 | 170.36s | 58646317 | input:30958715 output:597426 cacheread:27090176 | $240.4418 | $5 / $30 (¥34.413 / ¥206.478) | v3.2.6 | 2026-04-24 | 2026-05-13 |
| 5 | gpt-5.4-xhigh | Open | openai · openai | 67.25 | 60.0% | 70.0 | 68.0 | 68.0 | 100 | 66.8 | 67.1 | 75.6 | 66.8 | 69.2 | 60.7 | 165.36s | 137256515 | input:74566518 output:1723981 cacheread:60966016 | $288.4835 | $2.5 / $15 (¥17.2065 / ¥103.239) | v3.2.6 | 2026-03-05 | 2026-05-05 |
| #10 | gpt-5.4-xhigh | Closed | openai · openai | 53.04 | 37.3% | 69.5 | 67.0 | 67.0 | 100 | 66.9 | 58.8 | 74.3 | 62.5 | 72.1 | 64.9 | 165.7s | 51143959 | input:27146296 output:598495 cacheread:23399168 | $106.0921 | $2.5 / $15 (¥17.2065 / ¥103.239) | v3.2.6 | 2026-03-05 | 2026-05-13 |
| 6 | gpt-5.3-codex-xhigh | Open | openai · openai | 65.45 | 55.9% | 67.3 | 66.8 | 66.8 | 100 | 65.5 | 66.8 | 72.9 | 65.8 | 68.7 | 59.5 | 168.4s | 129894237 | input:67094837 output:1348392 cacheread:61451008 | $190.0631 | $1.75 / $14 (¥12.0446 / ¥96.3564) | v3.2.6 | 2026-02-24 | 2026-05-06 |
| #4 | gpt-5.3-codex-xhigh | Closed | openai · openai | 58.45 | 41.9% | 77.2 | 71.6 | 71.6 | 100 | 70.9 | 68.1 | 75.0 | 68.8 | 75.1 | 70.5 | 176.29s | 59966437 | input:30926521 output:575404 cacheread:28464512 | $87.0835 | $1.75 / $14 (¥12.0446 / ¥96.3564) | v3.2.6 | 2026-02-24 | 2026-05-13 |
| 7 | deepseek-v4-pro | Open | deepseek · deepseek | 64.38 | 47.2% | 65.1 | 69.6 | 69.9 | 96.6 | 74.9 | 68.2 | 70.1 | 66.6 | 72.4 | 63.5 | 115.94s | 34630713 | input:8007935 output:1429306 cacheread:25193472 | $50.634 | $2.0341 / $6.1023 (¥14 / ¥42) | v3.2.6 | 2026-04-24 | 2026-04-25 |
| #6 | deepseek-v4-pro | Closed | deepseek · deepseek | 56.35 | 34.1% | 79.0 | 65.7 | 65.7 | 100 | 74.1 | 48.6 | 71.6 | 55.2 | 72.4 | 67.2 | 153.64s | 13073754 | input:2428295 output:395219 cacheread:10250240 | $17.7761 | $2.0341 / $6.1023 (¥14 / ¥42) | v3.2.6 | 2026-04-24 | 2026-05-14 |
| 8 | qwen3.5-plus | Open | qwen · bailiancodingplan | 64.19 | 49.6% | 61.2 | 70.1 | 70.9 | 91.1 | 76.5 | 67.2 | 70.7 | 69.6 | 75.0 | 59.4 | 123.72s | 74256276 | input:10347477 output:534393 cacheread:63374406 | $11.7627 | $0.26 / $1.56 | v3.2.6 | 2026-02-15 | 2026-04-06 |
| #13 | qwen3.5-plus | Closed | qwen · bailiancodingplan | 52.39 | 31.2% | 71.7 | 64.6 | 64.6 | 100 | 69.8 | 48.7 | 68.0 | 60.7 | 71.1 | 66.3 | 139.34s | 20529559 | input:6489516 output:216852 cacheread:13823191 | $3.8226 | $0.26 / $1.56 (¥0 / ¥0) | v3.2.6 | 2026-02-15 | 2026-05-16 |
| 9 | qwen3.5-397b-a17b | Open | qwen · bailiantokenplan | 64.18 | 49.5% | 60.8 | 70.4 | 71.0 | 93.9 | 77.8 | 65.5 | 71.6 | 72.4 | 74.5 | 57.7 | 98.69s | 57194123 | input:10647098 output:535556 cacheread:46011469 | $16.0699 | $0.4359 / $2.6153 (¥3 / ¥18) | v3.2.6 | 2026-02-15 | 2026-04-30 |
| 10 | mimo-v2.5-pro | Open | xiaomi · xiaomi-token-plan | 63.3 | 46.5% | 62.5 | 68.5 | 68.8 | 97.4 | 74.1 | 63.9 | 70.6 | 67.4 | 70.8 | 61.9 | 63.3s | 32650649 | input:7473261 output:691052 cacheread:24486336 | $44.3222 | $2.0341 / $6.1023 (¥14 / ¥42) | v3.2.6 | 2026-04-22 | 2026-04-24 |
| #16 | mimo-v2.5-pro | Closed | xiaomi · xiaomi-token-plan | 51.6 | 35.6% | 68.1 | 64.9 | 65.0 | 100.0 | 73.9 | 46.1 | 72.0 | 60.3 | 71.4 | 60.5 | 89s | 15403931 | input:3357860 output:231095 cacheread:11814976 | $20.2569 | $2.0341 / $6.1023 (¥14 / ¥42) | v3.2.6 | 2026-04-22 | 2026-05-12 |
| 11 | GLM-5.1 | Open | glm · glm | 62.93 | 44.9% | 61.6 | 69.0 | 69.1 | 98.8 | 75.2 | 68.4 | 71.0 | 65.8 | 74.1 | 56.7 | 89.33s | 47621722 | input:2299534 output:356684 cacheread:44965504 | $25.9237 | $1 / $3.2 (¥6.88 / ¥22.02) | v3.2.6 | 2026-03-27 | 2026-03-30 |
| #11 | GLM-5.1 | Closed | glm · glm | 52.89 | 39.0% | 69.3 | 66.1 | 66.1 | 100 | 71.7 | 48.9 | 70.9 | 63.2 | 70.2 | 67.9 | 138.64s | 11989353 | input:424654 output:99483 cacheread:11465216 | $6.4756 | $1 / $3.2 (¥6.88 / ¥22.02) | v3.2.6 | 2026-03-27 | 2026-05-16 |
| 12 | doubao-seed-2.0-code | Open | seed · volcengine-plan | 62.36 | 43.2% | 62.9 | 67.7 | 68.6 | 90.3 | 70.3 | 67.5 | 69.1 | 62.9 | 74.6 | 60.7 | 149.96s | 33552869 | input:3918340 output:191987 cacheread:29442542 | $27.3375 | $1.3948 / $6.9741 (¥9.6 / ¥48) | v3.2.6 | 2026-02-14 | 2026-04-04 |
| #2 | doubao-seed-2.0-code | Closed | seed · volcengine-plan | 62.44 | 52.8% | 81.8 | 73.2 | 73.2 | 100 | 73.7 | 79.3 | 69.4 | 67.9 | 75.0 | 74.9 | 158.18s | 36067246 | input:6524976 output:206269 cacheread:29336001 | $30.9985 | $1.3948 / $6.9741 (¥9.6 / ¥48) | v3.2.6 | 2026-02-14 | 2026-05-11 |
| 13 | Ring 2.6 1T | Open | inclusionai · openrouter | 62.05 | 35.9% | 74.1 | 65.2 | 65.9 | 92.1 | 59.1 | 67.9 | 77.2 | 62.0 | 65.3 | 57.6 | 127.94s | 38611988 | input:17862963 output:1506192 cacheread:19242833 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 |  | 2026-05-12 |
| #20 | Ring 2.6 1T | Closed | inclusionai · openrouter | 49.34 | 26.5% | 67.2 | 62.8 | 62.8 | 100.0 | 61.0 | 54.5 | 72.8 | 54.1 | 64.6 | 67.1 | 90.13s | 14296281 | input:8071709 output:389732 cacheread:5834840 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 |  | 2026-05-14 |
| 14 | GLM-5-Turbo | Open | glm · glm | 61.92 | 42.9% | 61.6 | 67.4 | 67.6 | 98.1 | 72.3 | 67.6 | 69.9 | 67.2 | 66.6 | 58.4 | 51.87s | 23122455 | input:2087604 output:303610 cacheread:20731241 | $16.1583 | $1.2 / $4 | v3.2.6 | 2026-03-16 | 2026-04-28 |
| #14 | GLM-5-Turbo | Closed | glm · glm | 51.88 | 39.1% | 66.2 | 66.9 | 66.9 | 100 | 72.2 | 56.6 | 67.2 | 64.1 | 68.4 | 71.3 | 140.06s | 12055326 | input:420194 output:103036 cacheread:11532096 | $7.8356 | $1.2 / $4 (¥0 / ¥0) | v3.2.6 | 2026-03-16 | 2026-05-16 |
| 15 | deepseek-v4-flash | Open | deepseek · deepseek | 61.47 | 40.9% | 61.4 | 67.6 | 67.9 | 95.4 | 72.7 | 65.5 | 68.2 | 66.3 | 69.8 | 61.0 | 76.39s | 35990776 | input:8281350 output:1484274 cacheread:26225152 | $3.5399 | $0.1453 / $0.2906 (¥1 / ¥2) | v3.2.6 | 2026-04-24 | 2026-04-25 |
| #31 | deepseek-v4-flash | Closed | deepseek · deepseek | 46.68 | 16.2% | 67.8 | 59.9 | 59.9 | 100 | 57.7 | 47.4 | 72.1 | 51.9 | 66.5 | 60.1 | 198.43s | 13256873 | input:2429587 output:416918 cacheread:10410368 | $1.2305 | $0.1453 / $0.2906 (¥1 / ¥2) | v3.2.6 | 2026-04-24 | 2026-05-16 |
| 16 | doubao-seed-2.0-pro | Open | seed · volcengine-plan | 61.07 | 41.6% | 57.5 | 68.3 | 68.6 | 97.4 | 70.1 | 68.3 | 73.1 | 65.1 | 73.8 | 57.4 | 90.89s | 20474370 | input:3405226 output:188658 cacheread:16880486 | $17.8378 | $1.3948 / $6.9741 (¥9.6 / ¥48) | v3.2.6 | 2026-02-14 | 2026-04-04 |
| #18 | doubao-seed-2.0-pro | Closed | seed · volcengine-plan | 51.3 | 39.8% | 63.5 | 68.5 | 68.5 | 100 | 74.6 | 67.5 | 61.7 | 65.0 | 75.2 | 67.1 | 159.7s | 31157190 | input:6557722 output:273428 cacheread:24326040 | $28.0186 | $1.3948 / $6.9741 (¥9.6 / ¥48) | v3.2.6 | 2026-02-14 | 2026-05-14 |
| 17 | Claude Sonnet 4.6 | Open | anthropic · openrouter | 60.5 | 45.5% | 53.9 | 66.6 | 67.5 | 90.9 | 71.6 | 65.0 | 64.0 | 66.8 | 71.4 | 60.2 | 178.58s | 29132003 | input:28539688 output:592315 | $94.5038 | $3 / $15 | v3.2.6 | 2026-02-18 | 2026-04-06 |
| 18 | doubao-seed-2.0-lite | Open | seed · volcengine-plan | 60.4 | 44.7% | 55.1 | 66.1 | 66.6 | 93.0 | 66.7 | 65.1 | 70.3 | 64.8 | 73.1 | 55.1 | 275.5s | 90593669 | input:11840992 output:583364 cacheread:78169313 | $14.2325 | $0.2615 / $1.5692 (¥1.8 / ¥10.8) | v3.2.6 | 2026-02-14 | 2026-04-12 |
| 19 | mimo-v2.5 | Open | xiaomi · xiaomi-token-plan | 60.39 | 43.1% | 55.8 | 66.6 | 66.8 | 96.6 | 70.9 | 65.9 | 64.8 | 64.8 | 72.3 | 59.7 | 46.68s | 35597981 | input:7607726 output:655087 cacheread:27335168 | $19.9746 | $0.8136 / $4.0682 (¥5.6 / ¥28) | v3.2.6 | 2026-04-22 | 2026-04-23 |
| #24 | mimo-v2.5 | Closed | xiaomi · xiaomi-token-plan | 48.02 | 25.0% | 64.9 | 62.2 | 62.2 | 100 | 69.6 | 54.4 | 68.9 | 55.7 | 60.1 | 59.5 | 194.11s | 15107697 | input:3328716 output:195045 cacheread:11583936 | $8.2141 | $0.8136 / $4.0682 (¥5.6 / ¥28) | v3.2.6 | 2026-04-22 | 2026-05-16 |
| 20 | qwen3.6-plus | Open | qwen · bailiantokenplan | 60.2 | 43.0% | 54.5 | 66.9 | 67.3 | 95.6 | 70.9 | 64.0 | 67.2 | 67.8 | 71.5 | 58.4 | 90.34s | 48872189 | input:10301722 output:614619 cacheread:37955848 | $38.2483 | $1.16 / $6.97 (¥8 / ¥48) | v3.2.6 | 2026-04-02 | 2026-04-22 |
| #5 | qwen3.6-plus | Closed | qwen · bailiantokenplan | 57.08 | 39.7% | 77.1 | 68.3 | 68.3 | 100 | 74.1 | 60.5 | 72.6 | 62.0 | 63.9 | 73.4 | 131.84s | 13239510 | input:12982159 output:257351 | $16.853 | $1.16 / $6.97 (¥8 / ¥48) | v3.2.6 | 2026-04-02 | 2026-05-13 |
| 21 | DeepSeek-V3.2 | Open | deepseek-ai · siliconflow | 60.13 | 39.0% | 57.0 | 67.6 | 68.4 | 90.6 | 68.7 | 65.8 | 68.9 | 65.3 | 73.3 | 62.8 | 160.91s | 65468885 | input:10004682 output:637259 cacheread:54826944 | $9.9709 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-03 |
| 22 | DeepSeek-V3.2 | Open | deepseek-ai · volcengine-plan | 60.12 | 36.9% | 60.8 | 66.7 | 68.0 | 86.9 | 68.8 | 67.1 | 64.2 | 65.4 | 72.5 | 62.5 | 201.7s | 97734852 | input:10027746 output:545655 cacheread:87161451 | $14.1456 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| #9 | DeepSeek-V3.2 | Closed | deepseek-ai · volcengine-plan | 54.0 | 33.3% | 73.5 | 66.3 | 66.3 | 100 | 71.3 | 60.2 | 65.4 | 71.0 | 62.6 | 66.3 | 173.07s | 33217073 | input:6440532 output:146561 cacheread:26629980 | $5.1921 | $0.26 / $0.38 (¥0 / ¥0) | v3.2.6 | 2025-12-01 | 2026-05-16 |
| 23 | huanyuan-3.0-preview | Open | tencent · tencent-token-plan | 59.39 | 41.6% | 56.6 | 64.2 | 64.6 | 96.0 | 73.3 | 61.1 | 60.1 | 63.3 | 72.6 | 53.3 | 56.08s | 32183778 | input:31764878 output:418900 | $9.7178 | $0.2906 / $1.1624 (¥2 / ¥8) | v3.2.6 | 2026-04-23 | 2026-04-23 |
| #21 | huanyuan-3.0-preview | Closed | tencent · tencent-token-plan | 49.2 | 35.2% | 62.7 | 64.2 | 64.2 | 100 | 70.6 | 49.8 | 68.2 | 57.3 | 61.7 | 74.0 | 189.79s | 12450815 | input:12325159 output:125656 | $3.7278 | $0.2906 / $1.1624 (¥2 / ¥8) | v3.2.6 | 2026-04-23 | 2026-05-16 |
| 24 | kimi-k2.6 | Open | moonshot · volcengine-plan | 59.31 | 39.1% | 54.2 | 67.0 | 67.2 | 96.7 | 70.4 | 65.2 | 70.6 | 66.1 | 68.6 | 58.7 | 102.04s | 56361641 | input:11160022 output:935949 cacheread:44265670 | $35.1856 | $0.9444 / $4 (¥6.5 / ¥27) | v3.2.6 | 2026-04-21 | 2026-04-28 |
| #8 | kimi-k2.6 | Closed | moonshot · volcengine-plan | 54.14 | 38.2% | 71.7 | 67.2 | 67.2 | 100 | 76.2 | 48.9 | 73.2 | 55.3 | 76.0 | 68.6 | 139.78s | 19787995 | input:6555533 output:286299 cacheread:12946163 | $13.4494 | $0.9444 / $4 (¥6.5 / ¥27) | v3.2.6 | 2026-04-21 | 2026-05-11 |
| 25 | doubao-seed-code | Open | seed · volcengine-plan | 59.22 | 40.4% | 55.7 | 65.0 | 65.8 | 93.4 | 63.6 | 67.2 | 68.9 | 59.8 | 72.7 | 56.7 | 105.51s | 79743715 | input:10387510 output:569004 cacheread:68787201 | $19.5397 | $0.4068 / $2.3247 (¥2.8 / ¥16) | v3.2.6 | 2026-02-14 | 2026-04-09 |
| 26 | qwen3.6-plus | Open | qwen · bailianapi | 59.05 | 39.0% | 55.9 | 65.3 | 66.0 | 91.2 | 71.2 | 63.3 | 64.1 | 66.7 | 68.4 | 56.2 | 120.83s | 62909389 | input:10724289 output:627594 cacheread:51557506 | $46.7179 | $1.16 / $6.97 (¥8 / ¥48) | v3.2.6 | 2026-04-02 | 2026-04-04 |
| 27 | LongCat-2.0-Preview | Open | meituan · longcat | 58.8 | 39.0% | 53.0 | 66.3 | 66.6 | 97.1 | 76.4 | 66.6 | 66.1 | 54.9 | 72.9 | 57.5 | 94.1s | 30287164 | input:10892258 output:471130 cacheread:18923776 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-04-24 | 2026-04-25 |
| #19 | LongCat-2.0-Preview | Closed | meituan · longcat | 49.93 | 30.2% | 65.7 | 64.8 | 64.8 | 100 | 73.8 | 50.9 | 73.2 | 62.4 | 53.1 | 69.6 | 141.44s | 12555390 | input:307236 output:143706 cacheread:12104448 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-04-24 | 2026-05-14 |
| 28 | qwen3.6-27b | Open | qwen · bailiantokenplan | 58.74 | 37.9% | 55.2 | 65.5 | 65.9 | 96.7 | 68.9 | 67.1 | 61.5 | 67.3 | 71.4 | 57.3 | 89.85s | 50185369 | input:10744142 output:691108 cacheread:38750119 | $14.9364 | $0.4359 / $2.6153 (¥3 / ¥18) | v3.2.6 | 2026-04-22 | 2026-04-26 |
| 29 | kimi-k2.5 | Open | moonshot · moonshot | 58.49 | 33.9% | 59.0 | 65.4 | 65.9 | 93.8 | 68.3 | 64.8 | 66.9 | 61.2 | 72.1 | 57.4 | 88.73s | 41679573 | input:3470897 output:369572 cacheread:37839104 | $9.2045 | $0.3827 / $1.72 | v3.2.6 | 2026-01-27 | 2026-04-02 |
| 30 | DeepSeekV3.2 | Open | deepseek-ai · baiduqianfan | 57.94 | 30.7% | 60.3 | 65.5 | 66.8 | 87.4 | 68.0 | 67.4 | 61.5 | 62.6 | 71.4 | 62.7 | 160.53s | 72630744 | input:71928542 output:702202 | $18.9683 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| 31 | mimo-v2-pro | Open | xiaomi · openrouter | 57.92 | 37.6% | 53.2 | 64.7 | 65.1 | 95.9 | 70.3 | 64.1 | 60.1 | 66.2 | 66.5 | 60.3 | 107.16s | 46485163 | input:12390088 output:665699 cacheread:33429376 | $31.1019 | $1 / $3 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| #23 | mimo-v2-pro | Closed | xiaomi · xiaomi-token-plan | 48.18 | 26.6% | 64.4 | 62.5 | 62.5 | 100 | 66.1 | 53.8 | 71.4 | 55.5 | 70.5 | 53.5 | 83.57s | 15012098 | input:3388947 output:201711 cacheread:11421440 | $9.7048 | $1 / $3 (¥0 / ¥0) | v3.2.6 | 2026-03-19 | 2026-05-17 |
| 32 | mimo-v2-omni | Open | xiaomi · openrouter | 57.65 | 35.2% | 56.6 | 63.5 | 64.0 | 93.1 | 65.4 | 65.6 | 63.5 | 59.8 | 68.9 | 57.2 | 74.53s | 40532601 | input:12022794 output:801205 cacheread:27708602 | $11.9532 | $0.4 / $2 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| #30 | mimo-v2-omni | Closed | xiaomi · xiaomi-token-plan | 46.98 | 24.5% | 62.0 | 62.6 | 62.6 | 100 | 65.1 | 54.9 | 71.1 | 55.9 | 65.9 | 59.3 | 81.93s | 15024652 | input:5571769 output:243027 cacheread:9209856 | $4.5567 | $0.4 / $2 (¥0 / ¥0) | v3.2.6 | 2026-03-19 | 2026-05-17 |
| 33 | LongCat-Flash-Thinking-2601 | Open | meituan · longcat | 57.48 | 34.8% | 54.6 | 64.5 | 65.0 | 94.9 | 69.4 | 63.0 | 60.2 | 65.4 | 70.3 | 58.3 | 205.9s | 40976389 | input:40405158 output:571231 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-01-16 | 2026-04-04 |
| 34 | Ling-2.6-1T | Open | tbox · antling | 57.4 | 36.6% | 58.8 | 60.7 | 61.7 | 91.4 | 70.6 | 57.1 | 62.4 | 57.0 | 65.7 | 47.5 | 69.93s | 59026717 | input:20384528 output:494895 cacheread:38147294 | $13.0747 | $0.3 / $2.5 (¥2.0648 / ¥17.2065) | v3.2.6 | 2026-04-24 | 2026-04-25 |
| 35 | qwen3.6-max-preview | Open | qwen · bailiantokenplan | 57.4 | 36.0% | 51.5 | 65.2 | 65.8 | 94.9 | 75.2 | 64.5 | 57.7 | 61.8 | 71.7 | 59.6 | 132.61s | 60826974 | input:11239120 output:629190 cacheread:48958664 | $86.0724 | $2.1794 / $13.0765 (¥15 / ¥90) | v3.2.6 | 2026-04-20 | 2026-04-24 |
| #17 | qwen3.6-max-preview | Closed | qwen · bailiantokenplan | 51.48 | 40.0% | 66.4 | 65.1 | 65.1 | 100 | 76.8 | 50.8 | 72.5 | 65.8 | 55.8 | 62.8 | 151.95s | 20456157 | input:6493368 output:202535 cacheread:13760254 | $31.7946 | $2.1794 / $13.0765 (¥15 / ¥90) | v3.2.6 | 2026-04-20 | 2026-05-13 |
| 36 | kimi-k2.6-code-preview | Open | moonshot · moonshot | 57.14 | 32.1% | 59.9 | 62.5 | 63.1 | 94.2 | 67.1 | 61.4 | 62.2 | 61.2 | 68.1 | 53.2 | 71.28s | 34448208 | input:3235977 output:877255 cacheread:30334976 | $20.9923 | $0.95 / $4 (¥6.5 / ¥27) | v3.2.6 | 2026-04-21 | 2026-04-16 |
| 37 | GLM-5 | Open | glm · glm | 57.05 | 28.5% | 61.8 | 64.0 | 64.5 | 95.9 | 69.7 | 65.5 | 62.5 | 59.3 | 71.9 | 53.6 | 122.49s | 34977005 | input:3229307 output:564274 cacheread:31183424 | $14.849 | $0.72 / $2.3 | v3.2.6 | 2026-02-11 | 2026-03-31 |
| 38 | qwen3.6-35b-a3b | Open | qwen · bailiantokenplan | 56.94 | 35.5% | 52.2 | 63.9 | 64.3 | 96.2 | 66.4 | 64.3 | 60.1 | 62.2 | 73.9 | 57.0 | 54.28s | 53780281 | input:11059276 output:743635 cacheread:41977370 | $9.5475 | $0.2615 / $1.5692 (¥1.8 / ¥10.8) | v3.2.6 | 2026-04-16 | 2026-04-26 |
| 39 | qwen3.6-flash | Open | qwen · bailiantokenplan | 56.55 | 35.7% | 51.0 | 63.4 | 63.9 | 94.2 | 69.7 | 60.5 | 61.4 | 61.6 | 73.0 | 52.6 | 60.52s | 61455165 | input:10973905 output:834629 cacheread:49646631 | $28.4575 | $0.6974 / $4.1845 (¥4.8 / ¥28.8) | v3.2.6 | 2026-04-17 | 2026-04-26 |
| 40 | GLM-4.6 | Open | glm · glm | 56.29 | 29.5% | 57.0 | 63.6 | 64.3 | 93.4 | 64.8 | 65.2 | 64.8 | 59.6 | 69.7 | 57.0 | 104.98s | 42991923 | input:2767151 output:872645 cacheread:39352127 | $10.4109 | $0.39 / $1.9 | v3.2.6 | 2025-09-30 | 2026-04-05 |
| 41 | qwen3-max-2026-01-23 | Open | qwen · bailiancodingplan | 55.76 | 36.6% | 52.4 | 59.8 | 60.2 | 95.5 | 72.8 | 64.1 | 62.9 | 57.6 | 58.7 | 37.2 | 220.61s | 67262104 | input:11956034 output:284269 cacheread:55021801 | $41.2943 | $1.017 / $4.068 (¥7 / ¥28) | v3.2.6 | 2026-01-23 | 2026-04-07 |
| 42 | kat-coder-pro-v2 | Open | kwaipilot · openrouter | 54.74 | 31.7% | 52.3 | 60.3 | 61.3 | 88.7 | 63.5 | 63.0 | 62.2 | 60.7 | 63.3 | 47.7 | 107.08s | 98806785 | input:16915312 output:544641 cacheread:81346832 | $17.9302 | $0.3 / $1.2 | v3.2.6 | 2026-03-27 | 2026-04-01 |
| #3 | kat-coder-pro-v2 | Closed | kwaipilot · StremLake | 59.68 | 50.3% | 78.4 | 70.7 | 70.7 | 100 | 76.4 | 52.7 | 76.8 | 73.5 | 70.5 | 70.4 | 172.4s | 16430245 | input:4379125 output:111700 cacheread:11939420 | $3.2387 | $0.3 / $1.2 (¥0 / ¥0) | v3.2.6 | 2026-03-27 | 2026-05-16 |
| 43 | GLM-4.7 | Open | glm · glm | 54.58 | 26.4% | 54.8 | 62.7 | 63.3 | 94.3 | 65.1 | 63.9 | 60.5 | 63.8 | 69.2 | 53.6 | 124.51s | 40323216 | input:3276264 output:829547 cacheread:36217405 | $9.7918 | $0.39 / $1.75 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| #25 | GLM-4.7 | Closed | glm · glm | 47.96 | 24.1% | 64.3 | 63.3 | 63.3 | 100 | 70.2 | 56.7 | 62.7 | 54.1 | 69.9 | 63.8 | 165.03s | 13103309 | input:561065 output:271780 cacheread:12270464 | $3.0872 | $0.39 / $1.75 (¥0 / ¥0) | v3.2.6 | 2025-12-23 | 2026-05-16 |
| 44 | gemini-3.1-pro-preview | Open | google · openrouter | 53.95 | 30.1% | 54.6 | 58.1 | 58.6 | 95.1 | 57.4 | 61.0 | 60.0 | 54.2 | 66.2 | 49.4 | 107.66s | 49678586 | input:14860816 output:717196 cacheread:34100574 | $72.4286 | $2 / $12 | v3.2.6 | 2026-02-20 | 2026-04-02 |
| 45 | hunyuan-2.0-thinking | Open | tencent · tencent-token-plan | 52.69 | 28.3% | 47.2 | 60.1 | 60.9 | 91.8 | 57.8 | 65.4 | 57.7 | 59.8 | 69.5 | 52.1 | 162.28s | 54383209 | input:53285801 output:1097408 | $44.4101 | $0.77 / $3.08 (¥5.3 / ¥21.2) | v3.2.6 | 2025-12-05 | 2026-04-04 |
| 46 | MiniMax-M2.5 | Open | minimax · minimax | 51.79 | 23.3% | 52.7 | 58.8 | 59.4 | 92.6 | 57.6 | 63.1 | 61.1 | 60.5 | 60.8 | 49.0 | 96.4s | 54177427 | input:8780828 output:669815 cacheread:44726784 | $4.3381 | $0.118 / $0.99 | v3.2.6 | 2026-02-12 | 2026-04-02 |
| #15 | MiniMax-M2.5 | Closed | minimax · minimax | 51.76 | 33.8% | 67.6 | 66.8 | 66.8 | 100 | 67.3 | 63.6 | 68.9 | 64.3 | 67.1 | 68.7 | 111.3s | 13023376 | input:2505661 output:167091 cacheread:10350624 | $1.0718 | $0.118 / $0.99 (¥0 / ¥0) | v3.2.6 | 2026-02-12 | 2026-05-14 |
| 47 | gemma-4-31b-it | Open | google · openrouter | 51.59 | 35.0% | 41.1 | 56.1 | 56.4 | 95.2 | 50.8 | 64.5 | 59.7 | 54.5 | 60.9 | 46.8 | 207.49s | 33753472 | input:25401261 output:211539 cacheread:8140672 | $4.2106 | $0.14 / $0.4 | v3.2.6 | 2026-04-02 | 2026-04-11 |
| 48 | Ling-2.5-1T | Open | tbox · antling | 51.19 | 27.8% | 50.2 | 54.6 | 55.5 | 89.3 | 56.0 | 55.6 | 60.5 | 57.8 | 52.5 | 42.9 | 508.52s | 63436733 | input:8824540 output:336271 cacheread:54275922 | $21.676 | $0.581 / $2.325 (¥4 / ¥16) | v3.2.6 | 2026-02-16 | 2026-04-07 |
| 49 | DeepSeek-R1 | Open | deepseek · siliconflow | 50.23 | 20.0% | 52.5 | 57.8 | 57.8 | 99.6 | 52.0 | 61.9 | 55.2 | 57.3 | 67.8 | 55.2 | 702.25s | 25698873 | input:21718776 output:3980097 | $21.8755 | $0.5812 / $2.3247 (¥4 / ¥16) | v3.2.6 | 2025-05-28 | 2026-05-01 |
| 50 | MiniMax-M2.7 | Open | minimax · minimax | 49.53 | 20.2% | 50.0 | 56.8 | 57.3 | 94.8 | 59.0 | 60.0 | 52.3 | 55.4 | 63.6 | 51.5 | 134.51s | 70440558 | input:8033465 output:678335 cacheread:59133252 cachewrite:2595506 | $12.4834 | $0.3 / $1.2 | v3.2.6 | 2026-03-18 | 2026-04-02 |
| #26 | MiniMax-M2.7 | Closed | minimax · minimax | 47.89 | 26.8% | 64.8 | 61.0 | 61.0 | 100 | 65.6 | 48.9 | 69.7 | 48.3 | 66.8 | 62.0 | 186.75s | 12810130 | input:382441 output:192932 cacheread:10515969 cachewrite:1718788 | $2.1815 | $0.3 / $1.2 (¥0 / ¥0) | v3.2.6 | 2026-03-18 | 2026-05-16 |
| 51 | kimi-for-coding-k2.6 | Open | moonshot · kimicodingplan | 49.04 | 22.8% | 44.9 | 55.7 | 64.3 | 46.5 | 58.7 | 51.4 | 58.2 | 53.0 | 62.4 | 48.9 | 268.85s | 194685491 | input:153520279 output:465180 cacheread:40700032 | $67.3403 | $0.3827 / $1.72 | v3.2.6 | 2025-10-24 | 2026-05-01 |
| 52 | gemini-3-flash-preview | Open | google · openrouter | 48.99 | 22.4% | 48.9 | 53.8 | 54.3 | 93.0 | 54.2 | 63.0 | 37.8 | 57.1 | 61.9 | 53.7 | 110.14s | 109050773 | input:26076719 output:244800 cacheread:82729254 | $34.4551 | $0.5 / $3 | v3.2.6 | 2025-12-17 | 2026-04-05 |
| 53 | MiniMax-M2.1 | Open | minimax · minimax | 48.08 | 16.4% | 50.5 | 56.8 | 57.3 | 93.1 | 58.8 | 59.0 | 56.0 | 56.4 | 60.6 | 49.3 | 92.82s | 49828422 | input:9501688 output:677502 cacheread:39649232 | $8.5617 | $0.27 / $0.95 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| #12 | MiniMax-M2.1 | Closed | minimax · minimax | 52.61 | 31.1% | 70.6 | 66.7 | 66.7 | 100 | 70.7 | 61.0 | 69.8 | 63.1 | 65.1 | 68.1 | 112.2s | 12776800 | input:2494084 output:166342 cacheread:10096756 cachewrite:19618 | $2.1971 | $0.27 / $0.95 (¥0 / ¥0) | v3.2.6 | 2025-12-23 | 2026-05-14 |
| 54 | Kimi-K2-Thinking | Open | moonshot · siliconflow | 47.83 | 18.0% | 48.5 | 55.2 | 56.7 | 88.2 | 55.9 | 54.4 | 53.0 | 54.2 | 62.3 | 51.6 | 417.5s | 36332107 | input:35311598 output:1020509 | $23.7382 | $0.6 / $2.5 | v3.2.6 | 2025-11-06 | 2026-04-09 |
| 55 | hunyuan-2.0-instruct | Open | tencent · tencent-token-plan | 46.94 | 15.9% | 45.1 | 57.1 | 58.1 | 90.8 | 57.8 | 62.1 | 56.3 | 55.6 | 61.9 | 48.9 | 122.05s | 63879796 | input:63382367 output:497429 | $41.9994 | $0.65 / $1.61 (¥4.5 / ¥11.1) | v3.2.6 | 2025-12-05 | 2026-04-05 |
| 56 | qwen3-coder-next | Open | qwen · bailiancodingplan | 46.84 | 19.4% | 41.8 | 54.6 | 55.7 | 87.7 | 56.4 | 59.3 | 56.5 | 52.3 | 57.7 | 43.9 | 71.83s | 104722641 | input:12293170 output:381362 cacheread:92048109 | $21.7349 | $0.3632 / $1.4529 (¥2.5 / ¥10) | v3.2.6 | 2026-02-20 | 2026-04-08 |
| 57 | mistral-small-2603 | Open | mistralai · openrouter | 45.26 | 14.2% | 48.9 | 52.4 | 53.0 | 90.7 | 50.2 | 57.0 | 48.0 | 51.7 | 58.6 | 50.8 | 109.15s | 43699451 | input:18648766 output:1078701 cacheread:23971984 | $5.2424 | $0.15 / $0.6 | v3.2.6 | 2026-03-16 | 2026-04-06 |
| 58 | grok-4.20 | Open | x-ai · openrouter | 43.04 | 14.7% | 42.1 | 48.8 | 49.6 | 92.3 | 60.2 | 44.7 | 43.4 | 48.7 | 50.4 | 43.5 | 80.19s | 51133010 | input:7668306 output:197632 cacheread:43267072 | $59.7895 | $2 / $6 | v3.2.6 | 2026-03-12 | 2026-04-04 |
| 59 | kimi-for-coding-k2.5 | Open | moonshot · kimicodingplan | 42.72 | 12.9% | 39.0 | 52.1 | 63.4 | 32.3 | 57.8 | 55.1 | 47.9 | 50.0 | 55.8 | 45.3 | 249.95s | 360378148 | input:13150894 output:2602239 cacheread:344625015 | $75.4527 | $0.3827 / $1.72 | v3.2.6 | 2025-10-24 | 2026-04-11 |
| 60 | step-3.5-flash-2603 | Open | stepfun · stepfun | 42.59 | 15.3% | 32.9 | 52.3 | 53.6 | 86.9 | 50.1 | 56.9 | 55.4 | 50.9 | 58.4 | 41.9 | 107.1s | 86247300 | input:85197273 output:1050027 | $8.8452 | $0.1 / $0.31 (¥0.7 / ¥2.1) | v3.2.6 | 2026-04-02 | 2026-04-02 |
| #22 | step-3.5-flash-2603 | Closed | stepfun · stepfun | 49.19 | 17.8% | 71.5 | 62.2 | 62.2 | 100 | 70.7 | 49.7 | 65.4 | 56.0 | 62.6 | 65.0 | 72.65s | 13317837 | input:2946705 output:261948 cacheread:10109184 | $0.8813 | $0.1 / $0.31 (¥0.7 / ¥2.1) | v3.2.6 | 2026-04-02 | 2026-05-13 |
| 61 | step-3.5-flash | Open | stepfun · stepfun | 41.75 | 14.7% | 31.1 | 51.8 | 53.0 | 86.1 | 50.1 | 58.8 | 54.3 | 49.8 | 57.3 | 40.0 | 103.62s | 87855843 | input:86724853 output:1130990 | $9.0118 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-04 |
| #27 | step-3.5-flash | Closed | stepfun · stepfun | 47.49 | 20.1% | 65.0 | 63.1 | 63.1 | 100 | 68.5 | 51.4 | 65.9 | 61.9 | 68.8 | 59.4 | 97.7s | 13508242 | input:3178065 output:267201 cacheread:10062976 | $0.9011 | $0.1 / $0.3 (¥0 / ¥0) | v3.2.6 | 2026-02-02 | 2026-05-12 |
| 62 | Spark X2 | Open | xunfei · astroncodingplan | 41.44 | 12.7% | 39.3 | 48.4 | 49.0 | 90.9 | 46.6 | 55.6 | 50.1 | 45.3 | 53.6 | 39.4 | 242.33s | 32609723 | input:31554432 output:1055291 | $14.3483 | $0.44 / $0.44 (¥3 / ¥3) | v3.2.6 | 2026-02-11 | 2026-04-07 |
| #34 | Spark X2 | Closed | xunfei · astroncodingplan | 44.83 | 31.8% | 55.2 | 60.6 | 60.6 | 100 | 53.0 | 52.4 | 71.5 | 56.9 | 73.0 | 56.0 | 157.6s | 13212397 | input:12780019 output:432378 | $5.8135 | $0.44 / $0.44 (¥3 / ¥3) | v3.2.6 | 2026-02-11 | 2026-05-16 |
| 63 | step-3.5-flash | Open | stepfun · openrouter | 38.74 | 9.5% | 35.3 | 47.9 | 48.7 | 91.2 | 42.9 | 53.7 | 54.7 | 39.4 | 55.6 | 40.2 | 82.69s | 63192773 | input:13974967 output:1641102 cacheread:47576704 | $4.2687 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-01 |
| 64 | hunyuan-t1 | Open | tencent · tencent-token-plan | 34.74 | 9.2% | 26.6 | 41.7 | 42.1 | 94.0 | 41.0 | 43.2 | 35.4 | 36.2 | 47.6 | 48.9 | 100.87s | 36613811 | input:35477732 output:1136079 | $5.8152 | $0.1453 / $0.5812 (¥1 / ¥4) | v3.2.6 | 2025-03-21 | 2026-04-24 |
| 65 | ERNIE-4.5-Turbo | Open | baidu · baiduqianfan | 33.68 | 7.1% | 26.0 | 42.9 | 43.0 | 97.5 | 29.1 | 52.1 | 44.0 | 40.8 | 54.6 | 41.0 | 115.4s | 32527395 | input:9183126 output:467510 cacheread:22876759 | $2.6896 | $0.12 / $0.46 (¥0.8 / ¥3.2) | v3.2.6 | 2026-04-02 | 2026-04-05 |
| #28 | ERNIE-4.5-Turbo | Closed | baidu · baiduqianfancodingplan | 47.25 | 29.5% | 61.6 | 61.6 | 61.6 | 100 | 65.3 | 58.9 | 60.3 | 68.8 | 51.1 | 64.0 | 91.36s | 8680055 | input:519490 output:102583 cacheread:8057982 | $0.593 | $0.12 / $0.46 (¥0.8 / ¥3.2) | v3.2.6 | 2026-04-02 | 2026-05-17 |
| 66 | Ling-2.6-Flash | Open | tbox · antling | 27.04 | 2.4% | 28.9 | 35.8 | 36.0 | 99.0 | 37.4 | 43.0 | 29.2 | 31.0 | 44.7 | 30.9 | 54.98s | 26416550 | input:12470152 output:114023 cacheread:13832375 | $1.9728 | $0.1 / $0.3 (¥0.6 / ¥1.8) | v3.2.6 | 2026-04-23 | 2026-04-23 |
| #1 | Qwen3.7-Max | Closed | qwen · bailiantokenplan | 62.75 | 47.2% | 86.1 | 69.2 | 69.2 | 100.0 | 79.9 | 48.9 | 70.6 | 62.3 | 77.9 | 71.2 | 87.4s | 23896173 | input:12596043 output:228187 cacheread:11071943 | $0.0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-05-23 | 2026-05-23 |
Notes:

USD/CNY quick reference uses approximately $1 ≈ ¥6.8826.

cache_read_tokens and cache_write_tokens are billed at 0.5x the input-token price.

cost_usd = (input_tokens + 0.5 * (cache_read_tokens + cache_write_tokens)) * price_usd_input / 1e6 + output_tokens * price_usd_output / 1e6

mimo-v2.5-pro and deepseek-v4-pro currently use 1M-context list pricing, so their displayed Cost is slightly overstated.

Closed dataset detail pages show model-level summaries only; they intentionally omit per-task scores and task-level rows.

pass^3 = weighted_pass_at_k_all, while strict_pass_rate = unweighted_pass_at_k_all.

Open Final Score = 100 × AvgScore^0.40 × ((Pass^3)^(1/3))^0.45 × (1 - (1 - Pass@3)^(1/3))^0.15

Closed Final Score = 100 × AvgScore^0.40 × ((Pass^3)^(1/3))^0.25 × (1 - (1 - Pass@3)^(1/3))^0.35
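
The cost and Final Score formulas in the notes can be sketched in Python as follows. This is a minimal illustration, not the benchmark's own code: the function names `cost_usd` and `final_score` and their parameter names are hypothetical, and it assumes the percentage-style columns (Avg Score, Pass^3, Pass@3) are divided by 100 before being plugged in. Under that assumption the sketch reproduces the published intern-s2-preview and ERNIE 5.1 rows to within rounding.

```python
def cost_usd(input_tokens, output_tokens, price_in, price_out,
             cache_read_tokens=0, cache_write_tokens=0):
    """Token cost in USD; prices are per 1M tokens (hypothetical helper).

    Cache reads and writes are billed at 0.5x the input-token price,
    as stated in the notes.
    """
    billable_input = input_tokens + 0.5 * (cache_read_tokens + cache_write_tokens)
    return billable_input * price_in / 1e6 + output_tokens * price_out / 1e6


# Exponents for (AvgScore, Pass^3 term, Pass@3 term), taken from the notes.
WEIGHTS = {
    "open": (0.40, 0.45, 0.15),
    "closed": (0.40, 0.25, 0.35),
}


def final_score(avg_score, pass_pow3, pass_at3, dataset="open"):
    """Final Score from fractions in [0, 1] (assumption: table values / 100)."""
    w_avg, w_pow, w_at = WEIGHTS[dataset]
    pow_term = pass_pow3 ** (1 / 3)           # (Pass^3)^(1/3)
    at_term = 1 - (1 - pass_at3) ** (1 / 3)   # 1 - (1 - Pass@3)^(1/3)
    return 100 * avg_score ** w_avg * pow_term ** w_pow * at_term ** w_at


if __name__ == "__main__":
    # ERNIE 5.1, open dataset: the table lists $19.0706.
    print(round(cost_usd(6925181, 1622971, 0.8721, 3.1977,
                         cache_read_tokens=17982720), 4))
    # intern-s2-preview: the table lists 73.71 (open) and 47.01 (closed).
    print(round(final_score(0.767, 0.619, 0.814, "open"), 2))
    print(round(final_score(0.618, 0.251, 0.625, "closed"), 2))
```

Because the table rounds its inputs, the recomputed scores can differ from the listed ones in the second decimal place. The only difference between the two datasets is the exponents: the open score weights the Pass^3 term more heavily (0.45 vs 0.25), while the closed score shifts that weight to the Pass@3 term (0.35 vs 0.15).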

Visual Leaderboard

View the full leaderboard in chart form. When you switch metrics, the bar ranking below re-sorts by the selected field.

[Bar chart: one bar per model, ranked by the selected metric, for the open and closed datasets.]

Score vs Runtime

Compare model scores against average runtime. The x-axis is Avg Runtime, and the y-axis can be switched across score fields.

[Scatter plot: one point per model. x-axis: Avg. Seconds per Task; y-axis: Final Score (selectable).]

Score vs Cost

Compare model scores against benchmark cost. The x-axis is Cost, and the y-axis can be switched across score fields.

[Scatter plot: one point per model. x-axis: Cost (USD); y-axis: Final Score (selectable).]