A benchmark for OpenClaw agents
OpenClawProBench
A benchmark designed to measure how different models perform under OpenClaw when they need to reason, plan, use tools, and stay reliable across repeated runs.
Benchmark Leaderboard
Pass^3 remains the default rank signal. The table below shows score quality, efficiency, all six evaluation dimensions, average runtime, total token usage, token breakdown, OpenClaw version, and update time.
| # | Model | Pass^3 | Pass@3 | Avg Score | Capability | Efficiency | Planning | Safety | Tool Use | Constraints | Error Recovery | Synthesis | Avg Runtime | Total Tokens | Token Breakdown | Cost | Price (input / output per 1M tokens) | OpenClaw Version | Released | Updated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | qwen3.5-plus (qwen · bailiancodingplan) | 49.6% | 61.2 | 70.1 | 70.9 | 91.1 | 76.5 | 67.2 | 70.7 | 69.6 | 75.0 | 59.4 | 123.72s | 74256276 | input:10347477, output:534393, cacheread:63374406 | $11.7627 | $0.26 / $1.56 | v3.2.6 | 2026-02-15 | 2026-04-06 |
| 2 | GLM-5.1 (glm · glm) | 44.9% | 61.6 | 69.0 | 69.1 | 98.8 | 75.2 | 68.4 | 71.0 | 65.8 | 74.1 | 56.7 | 89.33s | 47621722 | input:2299534, output:356684, cacheread:44965504 | $25.9237 | $1 / $3.2 (¥6.88 / ¥22.02) | v3.2.6 | 2026-03-27 | 2026-03-30 |
| 3 | doubao-seed-2.0-pro (seed · volcengine-plan) | 41.6% | 57.5 | 68.3 | 68.6 | 97.4 | 70.1 | 68.3 | 73.1 | 65.1 | 73.8 | 57.4 | 90.89s | 20474370 | input:3405226, output:188658, cacheread:16880486 | $17.8378 | $1.3948 / $6.9741 (¥9.6 / ¥48.0) | v3.2.6 | 2026-02-14 | 2026-04-04 |
| 4 | doubao-seed-2.0-code (seed · volcengine-plan) | 43.2% | 62.9 | 67.7 | 68.6 | 90.3 | 70.3 | 67.5 | 69.1 | 62.9 | 74.6 | 60.7 | 149.96s | 33552869 | input:3918340, output:191987, cacheread:29442542 | $27.3375 | $1.3948 / $6.9741 (¥9.6 / ¥48.0) | v3.2.6 | 2026-02-14 | 2026-04-04 |
| 5 | DeepSeek-V3.2 (deepseek-ai · siliconflow) | 39.0% | 57.0 | 67.6 | 68.4 | 90.6 | 68.7 | 65.8 | 68.9 | 65.3 | 73.3 | 62.8 | 160.91s | 65468885 | input:10004682, output:637259, cacheread:54826944 | $9.9709 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-03 |
| 6 | GLM-5-Turbo (glm · glm) | 46.9% | 55.0 | 67.1 | 67.4 | 96.1 | 72.2 | 66.6 | 67.4 | 65.3 | 70.5 | 58.6 | 80.31s | 27059691 | input:2273484, output:326612, cacheread:24459595 | $18.7104 | $1.2 / $4 | v3.2.6 | 2026-03-16 | 2026-04-04 |
| 7 | DeepSeek-V3.2 (deepseek-ai · volcengine-plan) | 36.9% | 60.8 | 66.7 | 68.0 | 86.9 | 68.8 | 67.1 | 64.2 | 65.4 | 72.5 | 62.5 | 201.7s | 97734852 | input:10027746, output:545655, cacheread:87161451 | $14.1456 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| 8 | Claude Sonnet 4.6 (anthropic · openrouter) | 45.5% | 53.9 | 66.6 | 67.5 | 90.9 | 71.6 | 65.0 | 64.0 | 66.8 | 71.4 | 60.2 | 178.58s | 29132003 | input:28539688, output:592315 | $94.5038 | $3 / $15 | v3.2.6 | 2026-02-18 | 2026-04-06 |
| 9 | DeepSeekV3.2 (deepseek-ai · baiduqianfan) | 30.7% | 60.3 | 65.5 | 66.8 | 87.4 | 68.0 | 67.4 | 61.5 | 62.6 | 71.4 | 62.7 | 160.53s | 72630744 | input:71928542, output:702202 | $18.9683 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| 10 | kimi-k2.5 (moonshot · moonshot) | 33.9% | 59.0 | 65.4 | 65.9 | 93.8 | 68.3 | 64.8 | 66.9 | 61.2 | 72.1 | 57.4 | 88.73s | 41679573 | input:3470897, output:369572, cacheread:37839104 | $9.2045 | $0.3827 / $1.72 | v3.2.6 | 2026-01-27 | 2026-04-02 |
| 11 | qwen3.6-plus (qwen · bailianapi) | 39.0% | 55.9 | 65.3 | 66.0 | 91.2 | 71.2 | 63.3 | 64.1 | 66.7 | 68.4 | 56.2 | 120.83s | 62909389 | input:10724289, output:627594, cacheread:51557506 | $46.7179 | $1.16 / $6.97 (¥8 / ¥48) | v3.2.6 | 2026-04-02 | 2026-04-04 |
| 12 | mimo-v2-pro (xiaomi · openrouter) | 37.6% | 53.2 | 64.7 | 65.1 | 95.9 | 70.3 | 64.1 | 60.1 | 66.2 | 66.5 | 60.3 | 107.16s | 46485163 | input:12390088, output:665699, cacheread:33429376 | $31.1019 | $1 / $3 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| 13 | LongCat-Flash-Thinking-2601 (meituan · longcat) | 34.8% | 54.6 | 64.5 | 65.0 | 94.9 | 69.4 | 63.0 | 60.2 | 65.4 | 70.3 | 58.3 | 205.9s | 40976389 | input:40405158, output:571231 | $0 | $0 / $0 (¥0 / ¥0) | v3.2.6 | 2026-01-16 | 2026-04-04 |
| 14 | GLM-5 (glm · glm) | 28.5% | 61.8 | 64.0 | 64.5 | 95.9 | 69.7 | 65.5 | 62.5 | 59.3 | 71.9 | 53.6 | 122.49s | 34977005 | input:3229307, output:564274, cacheread:31183424 | $14.849 | $0.72 / $2.3 | v3.2.6 | 2026-02-11 | 2026-03-31 |
| 15 | GLM-4.6 (glm · glm) | 29.5% | 57.0 | 63.6 | 64.3 | 93.4 | 64.8 | 65.2 | 64.8 | 59.6 | 69.7 | 57.0 | 104.98s | 42991923 | input:2767151, output:872645, cacheread:39352127 | $10.4109 | $0.39 / $1.9 | v3.2.6 | 2025-09-30 | 2026-04-05 |
| 16 | mimo-v2-omni (xiaomi · openrouter) | 35.2% | 56.6 | 63.5 | 64.0 | 93.1 | 65.4 | 65.6 | 63.5 | 59.8 | 68.9 | 57.2 | 74.53s | 40532601 | input:12022794, output:801205, cacheread:27708602 | $11.9532 | $0.4 / $2 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| 17 | gpt-5.4 (openai · openrouter) | 32.8% | 55.1 | 63.4 | 64.9 | 84.6 | 66.6 | 63.2 | 64.3 | 59.1 | 66.3 | 59.8 | 125.1s | 48417591 | input:36403992, output:507295, cacheread:11506304 | $113.0023 | $2.5 / $15 | v3.2.6 | 2026-03-05 | 2026-04-02 |
| 18 | GLM-4.7 (glm · glm) | 26.4% | 54.8 | 62.7 | 63.3 | 94.3 | 65.1 | 63.9 | 60.5 | 63.8 | 69.2 | 53.6 | 124.51s | 40323216 | input:3276264, output:829547, cacheread:36217405 | $9.7918 | $0.39 / $1.75 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| 19 | kat-coder-pro-v2 (kwaipilot · openrouter) | 31.7% | 52.3 | 60.3 | 61.3 | 88.7 | 63.5 | 63.0 | 62.2 | 60.7 | 63.3 | 47.7 | 107.08s | 98806785 | input:16915312, output:544641, cacheread:81346832 | $17.9302 | $0.3 / $1.2 | v3.2.6 | 2026-03-27 | 2026-04-01 |
| 20 | hunyuan-2.0-thinking (tencent · tencent-token-plan) | 28.3% | 47.2 | 60.1 | 60.9 | 91.8 | 57.8 | 65.4 | 57.7 | 59.8 | 69.5 | 52.1 | 162.28s | 54383209 | input:53285801, output:1097408 | $44.4101 | $0.77 / $3.08 (¥5.3 / ¥21.2) | v3.2.6 | 2025-12-05 | 2026-04-04 |
| 21 | qwen3-max-2026-01-23 (qwen · bailiancodingplan) | 36.6% | 57.8 | 59.8 | 60.2 | 95.5 | 72.8 | 64.1 | 62.9 | 57.6 | 58.7 | 37.2 | 220.61s | 67262104 | input:11956034, output:284269, cacheread:55021801 | $41.2943 | $1.017 / $4.068 (¥7 / ¥28) | v3.2.6 | 2026-01-23 | 2026-04-07 |
| 22 | MiniMax-M2.5 (minimax · minimax) | 23.3% | 52.7 | 58.8 | 59.4 | 92.6 | 57.6 | 63.1 | 61.1 | 60.5 | 60.8 | 49.0 | 96.4s | 54177427 | input:8780828, output:669815, cacheread:44726784 | $4.3381 | $0.118 / $0.99 | v3.2.6 | 2026-02-12 | 2026-04-02 |
| 23 | gemini-3.1-pro-preview (google · openrouter) | 30.1% | 54.6 | 58.1 | 58.6 | 95.1 | 57.4 | 61.0 | 60.0 | 54.2 | 66.2 | 49.4 | 107.66s | 49678586 | input:14860816, output:717196, cacheread:34100574 | $72.4286 | $2 / $12 | v3.2.6 | 2026-02-20 | 2026-04-02 |
| 24 | hunyuan-2.0-instruct (tencent · tencent-token-plan) | 15.9% | 45.1 | 57.1 | 58.1 | 90.8 | 57.8 | 62.1 | 56.3 | 55.6 | 61.9 | 48.9 | 122.05s | 63879796 | input:63382367, output:497429 | $41.9994 | $0.65 / $1.61 (¥4.5 / ¥11.1) | v3.2.6 | 2025-12-05 | 2026-04-05 |
| 25 | MiniMax-M2.7 (minimax · minimax) | 20.2% | 50.0 | 56.8 | 57.3 | 94.8 | 59.0 | 60.0 | 52.3 | 55.4 | 63.6 | 51.5 | 134.51s | 70440558 | input:8033465, output:678335, cacheread:59133252, cachewrite:2595506 | $12.4834 | $0.3 / $1.2 | v3.2.6 | 2026-03-18 | 2026-04-02 |
| 26 | MiniMax-M2.1 (minimax · minimax) | 16.4% | 50.5 | 56.8 | 57.3 | 93.1 | 58.8 | 59.0 | 56.0 | 56.4 | 60.6 | 49.3 | 92.82s | 49828422 | input:9501688, output:677502, cacheread:39649232 | $8.5617 | $0.27 / $0.95 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| 27 | Ling-2.5-1T (tbox · antling) | 27.8% | 50.2 | 54.6 | 55.5 | 89.3 | 56.0 | 55.6 | 60.5 | 57.8 | 52.5 | 42.9 | 508.52s | 63436733 | input:8824540, output:336271, cacheread:54275922 | $21.676 | $0.581 / $2.325 (¥4 / ¥16) | v3.2.6 | 2026-02-16 | 2026-04-07 |
| 28 | qwen3-coder-next (qwen · bailiancodingplan) | 19.4% | 41.8 | 54.6 | 55.7 | 87.7 | 56.4 | 59.3 | 56.5 | 52.3 | 57.7 | 43.9 | 71.83s | 104722641 | input:12293170, output:381362, cacheread:92048109 | $21.7349 | $0.3632 / $1.4529 (¥2.5 / ¥10) | v3.2.6 | 2026-02-20 | 2026-04-08 |
| 29 | gemini-3-flash-preview (google · openrouter) | 22.4% | 48.9 | 53.8 | 54.3 | 93.0 | 54.2 | 63.0 | 37.8 | 57.1 | 61.9 | 53.7 | 110.14s | 109050773 | input:26076719, output:244800, cacheread:82729254 | $34.4551 | $0.5 / $3 | v3.2.6 | 2025-12-17 | 2026-04-05 |
| 30 | mistral-small-2603 (mistralai · openrouter) | 14.2% | 48.9 | 52.4 | 53.0 | 90.7 | 50.2 | 57.0 | 48.0 | 51.7 | 58.6 | 50.8 | 109.15s | 43699451 | input:18648766, output:1078701, cacheread:23971984 | $5.2424 | $0.15 / $0.6 | v3.2.6 | 2026-03-16 | 2026-04-06 |
| 31 | step-3.5-flash-2603 (stepfun · stepfun) | 15.3% | 32.9 | 52.3 | 53.6 | 86.9 | 50.1 | 56.9 | 55.4 | 50.9 | 58.4 | 41.9 | 107.1s | 86247300 | input:85197273, output:1050027 | $8.8452 | $0.1 / $0.31 (¥0.7 / ¥2.1) | v3.2.6 | 2026-04-02 | 2026-04-02 |
| 32 | step-3.5-flash (stepfun · stepfun) | 14.7% | 31.1 | 51.8 | 53.0 | 86.1 | 50.1 | 58.8 | 54.3 | 49.8 | 57.3 | 40.0 | 103.62s | 87855843 | input:86724853, output:1130990 | $9.0118 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-04 |
| 33 | grok-4.20 (x-ai · openrouter) | 14.7% | 42.1 | 48.8 | 49.6 | 92.3 | 60.2 | 44.7 | 43.4 | 48.7 | 50.4 | 43.5 | 80.19s | 51133010 | input:7668306, output:197632, cacheread:43267072 | $59.7895 | $2 / $6 | v3.2.6 | 2026-03-12 | 2026-04-04 |
| 34 | Spark X2 (xunfei · astroncodingplan) | 14.7% | 41.2 | 48.4 | 49.0 | 89.9 | 46.6 | 55.6 | 50.1 | 45.3 | 53.6 | 39.4 | 242.33s | 32609723 | input:31554432, output:1055291 | $14.3483 | $0.44 / $0.44 (¥3 / ¥3) | v3.2.6 | 2026-02-11 | 2026-04-07 |
| 35 | step-3.5-flash (stepfun · openrouter) | 9.5% | 35.3 | 47.9 | 48.7 | 91.2 | 42.9 | 53.7 | 54.7 | 39.4 | 55.6 | 40.2 | 82.69s | 63192773 | input:13974967, output:1641102, cacheread:47576704 | $4.2687 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-01 |
| 36 | ERNIE-4.5-Turbo (baidu · baiduqianfan) | 7.1% | 26.0 | 42.9 | 43.0 | 96.6 | 29.1 | 52.1 | 44.0 | 40.8 | 54.6 | 41.0 | 115.4s | 32527395 | input:9183126, output:467510, cacheread:22876759 | $2.6896 | $0.12 / $0.46 (¥0.8 / ¥3.2) | v3.2.6 | 2026-04-02 | 2026-04-05 |
| 11 | qwen3.6-plus (qwen · bailiancodingplan) | 29.4% | 50.3 | 61.7 | 62.4 | 92.0 | 64.4 | 63.8 | 60.2 | 60.7 | 68.9 | 51.9 | 104.92s | 66378687 | input:11027658, output:621286, cacheread:54729743 | $48.9604 | $1.1624 / $6.9741 (¥8 / ¥48) | v3.2.6 | 2026-04-02 | 2026-04-09 |
| 6 | doubao-seed-code (seed · volcengine-plan) | 40.4% | 62.9 | 65.0 | 65.8 | 93.4 | 63.6 | 67.2 | 68.9 | 59.8 | 72.7 | 56.7 | 105.51s | 79743715 | input:10387510, output:569004, cacheread:68787201 | $19.5397 | $0.4068 / $2.3247 (¥2.8 / ¥16.0) | v3.2.6 | 2026-02-14 | 2026-04-09 |
$1 ≈ ¥6.8826.
cache_read_tokens and cache_write_tokens are billed at 0.5x the input-token price.
cost_usd = (input_tokens + 0.5 * (cache_read_tokens + cache_write_tokens)) * price_usd_input / 1e6 + output_tokens * price_usd_output / 1e6
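The cost formula above can be checked with a short sketch. The function name `cost_usd` and its keyword arguments mirror the formula's variable names; `cny_to_usd` and `USD_TO_CNY` are hypothetical helpers introduced here for illustration, using the exchange rate stated in the notes. The token counts and prices in the example are copied from the first leaderboard row.

```python
USD_TO_CNY = 6.8826  # exchange rate stated in the leaderboard notes


def cost_usd(input_tokens, output_tokens, price_usd_input, price_usd_output,
             cache_read_tokens=0, cache_write_tokens=0):
    """Benchmark cost in USD; prices are per 1M tokens."""
    # Cache reads and writes are billed at 0.5x the input-token price.
    billable_input = input_tokens + 0.5 * (cache_read_tokens + cache_write_tokens)
    return (billable_input * price_usd_input / 1e6
            + output_tokens * price_usd_output / 1e6)


def cny_to_usd(price_cny):
    """Convert a CNY price to USD (hypothetical helper, not from the source)."""
    return price_cny / USD_TO_CNY


# First leaderboard row: input 10347477, output 534393, cacheread 63374406,
# priced at $0.26 / $1.56 per 1M input / output tokens.
row1 = cost_usd(10347477, 534393, 0.26, 1.56, cache_read_tokens=63374406)
print(round(row1, 4))  # → 11.7627, matching the Cost column for that row
```

Rows with no cache counters can simply omit the cache arguments. For example, the row priced at $3 / $15 with 28539688 input and 592315 output tokens evaluates to about $94.5038, which also matches its Cost cell.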
Visual Leaderboard
View the full leaderboard in chart form. When you switch metrics, the bar ranking below re-sorts by the selected field.
- qwen3.5-plus (qwen · bailiancodingplan)
- GLM-5.1 (glm · glm)
- doubao-seed-2.0-pro (seed · volcengine-plan)
- doubao-seed-2.0-code (seed · volcengine-plan)
- DeepSeek-V3.2 (deepseek-ai · siliconflow)
- GLM-5-Turbo (glm · glm)
- DeepSeek-V3.2 (deepseek-ai · volcengine-plan)
- Claude Sonnet 4.6 (anthropic · openrouter)
- DeepSeekV3.2 (deepseek-ai · baiduqianfan)
- kimi-k2.5 (moonshot · moonshot)
- qwen3.6-plus (qwen · bailianapi)
- mimo-v2-pro (xiaomi · openrouter)
- LongCat-Flash-Thinking-2601 (meituan · longcat)
- GLM-5 (glm · glm)
- GLM-4.6 (glm · glm)
- mimo-v2-omni (xiaomi · openrouter)
- gpt-5.4 (openai · openrouter)
- GLM-4.7 (glm · glm)
- kat-coder-pro-v2 (kwaipilot · openrouter)
- hunyuan-2.0-thinking (tencent · tencent-token-plan)
- qwen3-max-2026-01-23 (qwen · bailiancodingplan)
- MiniMax-M2.5 (minimax · minimax)
- gemini-3.1-pro-preview (google · openrouter)
- hunyuan-2.0-instruct (tencent · tencent-token-plan)
- MiniMax-M2.7 (minimax · minimax)
- MiniMax-M2.1 (minimax · minimax)
- Ling-2.5-1T (tbox · antling)
- qwen3-coder-next (qwen · bailiancodingplan)
- gemini-3-flash-preview (google · openrouter)
- mistral-small-2603 (mistralai · openrouter)
- step-3.5-flash-2603 (stepfun · stepfun)
- step-3.5-flash (stepfun · stepfun)
- grok-4.20 (x-ai · openrouter)
- Spark X2 (xunfei · astroncodingplan)
- step-3.5-flash (stepfun · openrouter)
- ERNIE-4.5-Turbo (baidu · baiduqianfan)
- qwen3.6-plus (qwen · bailiancodingplan)
- doubao-seed-code (seed · volcengine-plan)
Score vs Runtime
Compare model scores against average runtime. The x-axis is Avg Runtime, and the y-axis can be switched across score fields.
Score vs Cost
Compare model scores against benchmark cost. The x-axis is Cost, and the y-axis can be switched across score fields.