A benchmark for OpenClaw agents

OpenClawProBench

OpenClawProBench is a benchmark designed to measure how different models perform under OpenClaw when they need to reason, plan, use tools, and stay reliable across repeated runs.

Benchmark Leaderboard

Pass^3 remains the default rank signal. The table below shows score quality, efficiency, all six evaluation dimensions, average runtime, total token usage, token breakdown, OpenClaw version, and update time.
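The page does not define Pass^3 or Pass@3. The sketch below assumes the common convention for repeated-run metrics: with three trials per task, a task counts toward Pass^3 only if all three trials pass, and toward Pass@3 if at least one passes. The function names and the sample results are hypothetical and only illustrate why Pass^3 is the stricter reliability signal.

```python
from typing import Dict, List


def pass_hat_k(results: Dict[str, List[bool]]) -> float:
    """Share of tasks whose trials ALL passed (assumed meaning of Pass^k)."""
    return sum(all(trials) for trials in results.values()) / len(results)


def pass_at_k(results: Dict[str, List[bool]]) -> float:
    """Share of tasks with AT LEAST ONE passing trial (assumed meaning of Pass@k)."""
    return sum(any(trials) for trials in results.values()) / len(results)


# Hypothetical outcomes: four tasks, three trials each.
sample = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
    "task_c": [False, False, False],
    "task_d": [True, True, True],
}

print(pass_hat_k(sample))  # 0.5  (task_a and task_d pass every trial)
print(pass_at_k(sample))   # 0.75 (only task_c never passes)
```

Under this assumption Pass^3 can never exceed Pass@3 when both are measured as pass rates, so a large gap between the two points to a model that solves tasks inconsistently.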

| # | Model (vendor · provider) | Pass^3 | Pass@3 | Avg Score | Capability | Efficiency | Planning | Safety | Tool Use | Constraints | Error Recovery | Synthesis | Avg Runtime | Total Tokens | Input | Output | Cache Read | Cache Write | Cost | Price (input / output) | OpenClaw | Released | Updated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | qwen3.5-plus (qwen · bailiancodingplan) | 49.6% | 61.2 | 70.1 | 70.9 | 91.1 | 76.5 | 67.2 | 70.7 | 69.6 | 75.0 | 59.4 | 123.72s | 74256276 | 10347477 | 534393 | 63374406 | - | $11.7627 | $0.26 / $1.56 | v3.2.6 | 2026-02-15 | 2026-04-06 |
| 2 | GLM-5.1 (glm · glm) | 44.9% | 61.6 | 69.0 | 69.1 | 98.8 | 75.2 | 68.4 | 71.0 | 65.8 | 74.1 | 56.7 | 89.33s | 47621722 | 2299534 | 356684 | 44965504 | - | $25.9237 | $1 / $3.2 (¥6.88 / ¥22.02) | v3.2.6 | 2026-03-27 | 2026-03-30 |
| 3 | doubao-seed-2.0-pro (seed · volcengine-plan) | 41.6% | 57.5 | 68.3 | 68.6 | 97.4 | 70.1 | 68.3 | 73.1 | 65.1 | 73.8 | 57.4 | 90.89s | 20474370 | 3405226 | 188658 | 16880486 | - | $17.8378 | $1.3948 / $6.9741 (¥9.6 / ¥48.0) | v3.2.6 | 2026-02-14 | 2026-04-04 |
| 4 | doubao-seed-2.0-code (seed · volcengine-plan) | 43.2% | 62.9 | 67.7 | 68.6 | 90.3 | 70.3 | 67.5 | 69.1 | 62.9 | 74.6 | 60.7 | 149.96s | 33552869 | 3918340 | 191987 | 29442542 | - | $27.3375 | $1.3948 / $6.9741 (¥9.6 / ¥48.0) | v3.2.6 | 2026-02-14 | 2026-04-04 |
| 5 | DeepSeek-V3.2 (deepseek-ai · siliconflow) | 39.0% | 57.0 | 67.6 | 68.4 | 90.6 | 68.7 | 65.8 | 68.9 | 65.3 | 73.3 | 62.8 | 160.91s | 65468885 | 10004682 | 637259 | 54826944 | - | $9.9709 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-03 |
| 6 | GLM-5-Turbo (glm · glm) | 46.9% | 55.0 | 67.1 | 67.4 | 96.1 | 72.2 | 66.6 | 67.4 | 65.3 | 70.5 | 58.6 | 80.31s | 27059691 | 2273484 | 326612 | 24459595 | - | $18.7104 | $1.2 / $4 | v3.2.6 | 2026-03-16 | 2026-04-04 |
| 7 | DeepSeek-V3.2 (deepseek-ai · volcengine-plan) | 36.9% | 60.8 | 66.7 | 68.0 | 86.9 | 68.8 | 67.1 | 64.2 | 65.4 | 72.5 | 62.5 | 201.7s | 97734852 | 10027746 | 545655 | 87161451 | - | $14.1456 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| 8 | Claude Sonnet 4.6 (anthropic · openrouter) | 45.5% | 53.9 | 66.6 | 67.5 | 90.9 | 71.6 | 65.0 | 64.0 | 66.8 | 71.4 | 60.2 | 178.58s | 29132003 | 28539688 | 592315 | - | - | $94.5038 | $3 / $15 | v3.2.6 | 2026-02-18 | 2026-04-06 |
| 9 | DeepSeekV3.2 (deepseek-ai · baiduqianfan) | 30.7% | 60.3 | 65.5 | 66.8 | 87.4 | 68.0 | 67.4 | 61.5 | 62.6 | 71.4 | 62.7 | 160.53s | 72630744 | 71928542 | 702202 | - | - | $18.9683 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| 10 | kimi-k2.5 (moonshot · moonshot) | 33.9% | 59.0 | 65.4 | 65.9 | 93.8 | 68.3 | 64.8 | 66.9 | 61.2 | 72.1 | 57.4 | 88.73s | 41679573 | 3470897 | 369572 | 37839104 | - | $9.2045 | $0.3827 / $1.72 | v3.2.6 | 2026-01-27 | 2026-04-02 |
| 11 | qwen3.6-plus (qwen · bailianapi) | 39.0% | 55.9 | 65.3 | 66.0 | 91.2 | 71.2 | 63.3 | 64.1 | 66.7 | 68.4 | 56.2 | 120.83s | 62909389 | 10724289 | 627594 | 51557506 | - | $46.7179 | $1.16 / $6.97 (¥8 / ¥48) | v3.2.6 | 2026-04-02 | 2026-04-04 |
| 12 | mimo-v2-pro (xiaomi · openrouter) | 37.6% | 53.2 | 64.7 | 65.1 | 95.9 | 70.3 | 64.1 | 60.1 | 66.2 | 66.5 | 60.3 | 107.16s | 46485163 | 12390088 | 665699 | 33429376 | - | $31.1019 | $1 / $3 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| 13 | LongCat-Flash-Thinking-2601 (meituan · longcat) | 34.8% | 54.6 | 64.5 | 65.0 | 94.9 | 69.4 | 63.0 | 60.2 | 65.4 | 70.3 | 58.3 | 205.9s | 40976389 | 40405158 | 571231 | - | - | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-01-16 | 2026-04-04 |
| 14 | GLM-5 (glm · glm) | 28.5% | 61.8 | 64.0 | 64.5 | 95.9 | 69.7 | 65.5 | 62.5 | 59.3 | 71.9 | 53.6 | 122.49s | 34977005 | 3229307 | 564274 | 31183424 | - | $14.849 | $0.72 / $2.3 | v3.2.6 | 2026-02-11 | 2026-03-31 |
| 15 | GLM-4.6 (glm · glm) | 29.5% | 57.0 | 63.6 | 64.3 | 93.4 | 64.8 | 65.2 | 64.8 | 59.6 | 69.7 | 57.0 | 104.98s | 42991923 | 2767151 | 872645 | 39352127 | - | $10.4109 | $0.39 / $1.9 | v3.2.6 | 2025-09-30 | 2026-04-05 |
| 16 | mimo-v2-omni (xiaomi · openrouter) | 35.2% | 56.6 | 63.5 | 64.0 | 93.1 | 65.4 | 65.6 | 63.5 | 59.8 | 68.9 | 57.2 | 74.53s | 40532601 | 12022794 | 801205 | 27708602 | - | $11.9532 | $0.4 / $2 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| 17 | gpt-5.4 (openai · openrouter) | 32.8% | 55.1 | 63.4 | 64.9 | 84.6 | 66.6 | 63.2 | 64.3 | 59.1 | 66.3 | 59.8 | 125.1s | 48417591 | 36403992 | 507295 | 11506304 | - | $113.0023 | $2.5 / $15 | v3.2.6 | 2026-03-05 | 2026-04-02 |
| 18 | GLM-4.7 (glm · glm) | 26.4% | 54.8 | 62.7 | 63.3 | 94.3 | 65.1 | 63.9 | 60.5 | 63.8 | 69.2 | 53.6 | 124.51s | 40323216 | 3276264 | 829547 | 36217405 | - | $9.7918 | $0.39 / $1.75 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| 19 | kat-coder-pro-v2 (kwaipilot · openrouter) | 31.7% | 52.3 | 60.3 | 61.3 | 88.7 | 63.5 | 63.0 | 62.2 | 60.7 | 63.3 | 47.7 | 107.08s | 98806785 | 16915312 | 544641 | 81346832 | - | $17.9302 | $0.3 / $1.2 | v3.2.6 | 2026-03-27 | 2026-04-01 |
| 20 | hunyuan-2.0-thinking (tencent · tencent-token-plan) | 28.3% | 47.2 | 60.1 | 60.9 | 91.8 | 57.8 | 65.4 | 57.7 | 59.8 | 69.5 | 52.1 | 162.28s | 54383209 | 53285801 | 1097408 | - | - | $44.4101 | $0.77 / $3.08 (¥5.3 / ¥21.2) | v3.2.6 | 2025-12-05 | 2026-04-04 |
| 21 | qwen3-max-2026-01-23 (qwen · bailiancodingplan) | 36.6% | 57.8 | 59.8 | 60.2 | 95.5 | 72.8 | 64.1 | 62.9 | 57.6 | 58.7 | 37.2 | 220.61s | 67262104 | 11956034 | 284269 | 55021801 | - | $41.2943 | $1.017 / $4.068 (¥7 / ¥28) | v3.2.6 | 2026-01-23 | 2026-04-07 |
| 22 | MiniMax-M2.5 (minimax · minimax) | 23.3% | 52.7 | 58.8 | 59.4 | 92.6 | 57.6 | 63.1 | 61.1 | 60.5 | 60.8 | 49.0 | 96.4s | 54177427 | 8780828 | 669815 | 44726784 | - | $4.3381 | $0.118 / $0.99 | v3.2.6 | 2026-02-12 | 2026-04-02 |
| 23 | gemini-3.1-pro-preview (google · openrouter) | 30.1% | 54.6 | 58.1 | 58.6 | 95.1 | 57.4 | 61.0 | 60.0 | 54.2 | 66.2 | 49.4 | 107.66s | 49678586 | 14860816 | 717196 | 34100574 | - | $72.4286 | $2 / $12 | v3.2.6 | 2026-02-20 | 2026-04-02 |
| 24 | hunyuan-2.0-instruct (tencent · tencent-token-plan) | 15.9% | 45.1 | 57.1 | 58.1 | 90.8 | 57.8 | 62.1 | 56.3 | 55.6 | 61.9 | 48.9 | 122.05s | 63879796 | 63382367 | 497429 | - | - | $41.9994 | $0.65 / $1.61 (¥4.5 / ¥11.1) | v3.2.6 | 2025-12-05 | 2026-04-05 |
| 25 | MiniMax-M2.7 (minimax · minimax) | 20.2% | 50.0 | 56.8 | 57.3 | 94.8 | 59.0 | 60.0 | 52.3 | 55.4 | 63.6 | 51.5 | 134.51s | 70440558 | 8033465 | 678335 | 59133252 | 2595506 | $12.4834 | $0.3 / $1.2 | v3.2.6 | 2026-03-18 | 2026-04-02 |
| 26 | MiniMax-M2.1 (minimax · minimax) | 16.4% | 50.5 | 56.8 | 57.3 | 93.1 | 58.8 | 59.0 | 56.0 | 56.4 | 60.6 | 49.3 | 92.82s | 49828422 | 9501688 | 677502 | 39649232 | - | $8.5617 | $0.27 / $0.95 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| 27 | Ling-2.5-1T (tbox · antling) | 27.8% | 50.2 | 54.6 | 55.5 | 89.3 | 56.0 | 55.6 | 60.5 | 57.8 | 52.5 | 42.9 | 508.52s | 63436733 | 8824540 | 336271 | 54275922 | - | $21.676 | $0.581 / $2.325 (¥4 / ¥16) | v3.2.6 | 2026-02-16 | 2026-04-07 |
| 28 | qwen3-coder-next (qwen · bailiancodingplan) | 19.4% | 41.8 | 54.6 | 55.7 | 87.7 | 56.4 | 59.3 | 56.5 | 52.3 | 57.7 | 43.9 | 71.83s | 104722641 | 12293170 | 381362 | 92048109 | - | $21.7349 | $0.3632 / $1.4529 (¥2.5 / ¥10) | v3.2.6 | 2026-02-20 | 2026-04-08 |
| 29 | gemini-3-flash-preview (google · openrouter) | 22.4% | 48.9 | 53.8 | 54.3 | 93.0 | 54.2 | 63.0 | 37.8 | 57.1 | 61.9 | 53.7 | 110.14s | 109050773 | 26076719 | 244800 | 82729254 | - | $34.4551 | $0.5 / $3 | v3.2.6 | 2025-12-17 | 2026-04-05 |
| 30 | mistral-small-2603 (mistralai · openrouter) | 14.2% | 48.9 | 52.4 | 53.0 | 90.7 | 50.2 | 57.0 | 48.0 | 51.7 | 58.6 | 50.8 | 109.15s | 43699451 | 18648766 | 1078701 | 23971984 | - | $5.2424 | $0.15 / $0.6 | v3.2.6 | 2026-03-16 | 2026-04-06 |
| 31 | step-3.5-flash-2603 (stepfun · stepfun) | 15.3% | 32.9 | 52.3 | 53.6 | 86.9 | 50.1 | 56.9 | 55.4 | 50.9 | 58.4 | 41.9 | 107.1s | 86247300 | 85197273 | 1050027 | - | - | $8.8452 | $0.1 / $0.31 (¥0.7 / ¥2.1) | v3.2.6 | 2026-04-02 | 2026-04-02 |
| 32 | step-3.5-flash (stepfun · stepfun) | 14.7% | 31.1 | 51.8 | 53.0 | 86.1 | 50.1 | 58.8 | 54.3 | 49.8 | 57.3 | 40.0 | 103.62s | 87855843 | 86724853 | 1130990 | - | - | $9.0118 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-04 |
| 33 | grok-4.20 (x-ai · openrouter) | 14.7% | 42.1 | 48.8 | 49.6 | 92.3 | 60.2 | 44.7 | 43.4 | 48.7 | 50.4 | 43.5 | 80.19s | 51133010 | 7668306 | 197632 | 43267072 | - | $59.7895 | $2 / $6 | v3.2.6 | 2026-03-12 | 2026-04-04 |
| 34 | Spark X2 (xunfei · astroncodingplan) | 14.7% | 41.2 | 48.4 | 49.0 | 89.9 | 46.6 | 55.6 | 50.1 | 45.3 | 53.6 | 39.4 | 242.33s | 32609723 | 31554432 | 1055291 | - | - | $14.3483 | $0.44 / $0.44 (¥3 / ¥3) | v3.2.6 | 2026-02-11 | 2026-04-07 |
| 35 | step-3.5-flash (stepfun · openrouter) | 9.5% | 35.3 | 47.9 | 48.7 | 91.2 | 42.9 | 53.7 | 54.7 | 39.4 | 55.6 | 40.2 | 82.69s | 63192773 | 13974967 | 1641102 | 47576704 | - | $4.2687 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-01 |
| 36 | ERNIE-4.5-Turbo (baidu · baiduqianfan) | 7.1% | 26.0 | 42.9 | 43.0 | 96.6 | 29.1 | 52.1 | 44.0 | 40.8 | 54.6 | 41.0 | 115.4s | 32527395 | 9183126 | 467510 | 22876759 | - | $2.6896 | $0.12 / $0.46 (¥0.8 / ¥3.2) | v3.2.6 | 2026-04-02 | 2026-04-05 |
| 11 | qwen3.6-plus (qwen · bailiancodingplan) | 29.4% | 50.3 | 61.7 | 62.4 | 92.0 | 64.4 | 63.8 | 60.2 | 60.7 | 68.9 | 51.9 | 104.92s | 66378687 | 11027658 | 621286 | 54729743 | - | $48.9604 | $1.1624 / $6.9741 (¥8 / ¥48) | v3.2.6 | 2026-04-02 | 2026-04-09 |
| 6 | doubao-seed-code (seed · volcengine-plan) | 40.4% | 62.9 | 65.0 | 65.8 | 93.4 | 63.6 | 67.2 | 68.9 | 59.8 | 72.7 | 56.7 | 105.51s | 79743715 | 10387510 | 569004 | 68787201 | - | $19.5397 | $0.4068 / $2.3247 (¥2.8 / ¥16.0) | v3.2.6 | 2026-02-14 | 2026-04-09 |

A "-" marks a token category that is not reported for that entry.
Notes: The USD/CNY quick reference uses approximately $1 ≈ ¥6.8826. cache_read_tokens and cache_write_tokens are billed at 0.5x the input-token price. Prices are quoted per 1e6 tokens, so cost is computed as:

cost_usd = (input_tokens + 0.5 * (cache_read_tokens + cache_write_tokens)) * price_usd_input / 1e6 + output_tokens * price_usd_output / 1e6
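The cost formula in the notes can be checked against the leaderboard with a few lines of Python. This is a minimal sketch: the `Usage` class and the function names are hypothetical helpers, not part of the benchmark. The token counts and prices come from the qwen3.5-plus and MiniMax-M2.7 entries above.

```python
from dataclasses import dataclass

# From the notes: cache tokens are billed at 0.5x the input-token price.
CACHE_DISCOUNT = 0.5


@dataclass
class Usage:
    """Token counts for one leaderboard entry (hypothetical container)."""
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def cost_usd(usage: Usage, price_usd_input: float, price_usd_output: float) -> float:
    """Apply the notes' formula; prices are per 1e6 tokens."""
    cached = usage.cache_read_tokens + usage.cache_write_tokens
    billable_input = usage.input_tokens + CACHE_DISCOUNT * cached
    return (billable_input * price_usd_input / 1e6
            + usage.output_tokens * price_usd_output / 1e6)


def total_tokens(usage: Usage) -> int:
    """Sum of all four token categories."""
    return (usage.input_tokens + usage.output_tokens
            + usage.cache_read_tokens + usage.cache_write_tokens)


# qwen3.5-plus (bailiancodingplan): $0.26 / $1.56
qwen = Usage(input_tokens=10347477, output_tokens=534393, cache_read_tokens=63374406)
print(round(cost_usd(qwen, 0.26, 1.56), 4))  # 11.7627
print(total_tokens(qwen))                    # 74256276

# MiniMax-M2.7: the only entry that reports cache-write tokens; $0.3 / $1.2
minimax = Usage(8033465, 678335, 59133252, 2595506)
print(round(cost_usd(minimax, 0.3, 1.2), 4))  # 12.4834
```

Both results match the listed costs ($11.7627 and $12.4834). The listed total for each entry also equals the sum of its four token categories.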
