A benchmark for OpenClaw agents
ClawProBench
A benchmark designed to measure how different models perform under OpenClaw when they need to reason, plan, use tools, and stay reliable across repeated runs.
Benchmark Leaderboard
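Final Score is the default ranking signal. The table below shows the open dataset by default and adds closed dataset rows where a model has completed the 68-task closed run. Closed dataset rows carry a #-prefixed rank (for example #29), which is the model's rank within the closed dataset.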
| # | Model | Final Score | Pass^3 | Pass@3 | Avg Score | Capability | Efficiency | Planning | Safety | Tool Use | Constraints | Error Recovery | Synthesis | Avg Runtime | Total Tokens | Token Breakdown | Cost | Price per 1M Tokens (input / output) | OpenClaw | Released | Updated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | intern-s2-preview (ShangHai AILab · intern) | 73.71 | 61.9% | 81.4 | 76.7 | 78.0 | 89.8 | 79.2 | 70.5 | 82.2 | 76.9 | 73.7 | 75.2 | 95.37s | 70824855 | input:11105542 output:886191 cacheread:58833122 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-05-15 | 2026-05-16 |
| #29 | intern-s2-preview (ShangHai AILab · intern) | 47.01 | 25.1% | 62.5 | 61.8 | 61.9 | 99.6 | 64.6 | 49.9 | 62.2 | 65.1 | 62.7 | 65.4 | 70.48s | 25516250 | input:6473542 output:222520 cacheread:18820188 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-05-15 | 2026-05-15 |
| 2 | Sensenova 6.7 Flash Lite (sensetime · sensenova) | 71.59 | 63.4% | 76.5 | 73.7 | 74.3 | 93.8 | 69.7 | 70.5 | 78.8 | 70.1 | 78.5 | 74.3 | 124.84s | 41797781 | input:30999558 output:763023 cacheread:10035200 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-05-08 | 2026-05-07 |
| #33 | Sensenova 6.7 Flash Lite (sensetime · sensenova) | 45.11 | 22.2% | 58.5 | 62.1 | 62.1 | 100 | 62.5 | 53.8 | 62.5 | 67.9 | 62.5 | 62.9 | 121.3s | 16527596 | input:14660476 output:220528 cacheread:1646592 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-05-08 | 2026-05-13 |
| 3 | ERNIE 5.1 (baidu · baiduqianfan) | 68.58 | 57.7% | 72.9 | 70.7 | 71.6 | 92.0 | 57.5 | 67.6 | 79.4 | 77.9 | 73.3 | 70.3 | 125.3s | 26530872 | input:6925181 output:1622971 cacheread:17982720 | $19.0706 | $0.8721 / $3.1977; ¥6 / ¥22 | v3.2.6 | 2026-05-09 | 2026-05-10 |
| #32 | ERNIE 5.1 (baidu · baiduqianfan) | 46.24 | 25.3% | 60.4 | 61.7 | 61.7 | 100 | 64.1 | 49.8 | 65.8 | 61.0 | 62.5 | 65.1 | 142.79s | 7292445 | input:4000397 output:376817 cacheread:2915231 | $5.9649 | $0.8721 / $3.1977; ¥6 / ¥22 | v3.2.6 | 2026-05-09 | 2026-05-14 |
| 4 | gpt-5.5-xhigh (openai · openai) | 67.9 | 62.7% | 68.7 | 69.3 | 69.3 | 100 | 66.9 | 66.8 | 78.0 | 70.0 | 69.4 | 62.7 | 143.76s | 139773819 | input:72643986 output:1352681 cacheread:65777152 | $568.2432 | $5 / $30; ¥34.413 / ¥206.478 | v3.2.6 | 2026-04-24 | 2026-05-04 |
| #7 | gpt-5.5-xhigh (openai · openai) | 55.77 | 50.2% | 71.8 | 68.2 | 68.2 | 100 | 73.6 | 50.2 | 77.7 | 53.2 | 75.1 | 74.1 | 170.36s | 58646317 | input:30958715 output:597426 cacheread:27090176 | $240.4418 | $5 / $30; ¥34.413 / ¥206.478 | v3.2.6 | 2026-04-24 | 2026-05-13 |
| 5 | gpt-5.4-xhigh (openai · openai) | 67.25 | 60.0% | 70.0 | 68.0 | 68.0 | 100 | 66.8 | 67.1 | 75.6 | 66.8 | 69.2 | 60.7 | 165.36s | 137256515 | input:74566518 output:1723981 cacheread:60966016 | $288.4835 | $2.5 / $15; ¥17.2065 / ¥103.239 | v3.2.6 | 2026-03-05 | 2026-05-05 |
| #10 | gpt-5.4-xhigh (openai · openai) | 53.04 | 37.3% | 69.5 | 67.0 | 67.0 | 100 | 66.9 | 58.8 | 74.3 | 62.5 | 72.1 | 64.9 | 165.7s | 51143959 | input:27146296 output:598495 cacheread:23399168 | $106.0921 | $2.5 / $15; ¥17.2065 / ¥103.239 | v3.2.6 | 2026-03-05 | 2026-05-13 |
| 6 | gpt-5.3-codex-xhigh (openai · openai) | 65.45 | 55.9% | 67.3 | 66.8 | 66.8 | 100 | 65.5 | 66.8 | 72.9 | 65.8 | 68.7 | 59.5 | 168.4s | 129894237 | input:67094837 output:1348392 cacheread:61451008 | $190.0631 | $1.75 / $14; ¥12.0446 / ¥96.3564 | v3.2.6 | 2026-02-24 | 2026-05-06 |
| #4 | gpt-5.3-codex-xhigh (openai · openai) | 58.45 | 41.9% | 77.2 | 71.6 | 71.6 | 100 | 70.9 | 68.1 | 75.0 | 68.8 | 75.1 | 70.5 | 176.29s | 59966437 | input:30926521 output:575404 cacheread:28464512 | $87.0835 | $1.75 / $14; ¥12.0446 / ¥96.3564 | v3.2.6 | 2026-02-24 | 2026-05-13 |
| 7 | deepseek-v4-pro (deepseek · deepseek) | 64.38 | 47.2% | 65.1 | 69.6 | 69.9 | 96.6 | 74.9 | 68.2 | 70.1 | 66.6 | 72.4 | 63.5 | 115.94s | 34630713 | input:8007935 output:1429306 cacheread:25193472 | $50.634 | $2.0341 / $6.1023; ¥14 / ¥42 | v3.2.6 | 2026-04-24 | 2026-04-25 |
| #6 | deepseek-v4-pro (deepseek · deepseek) | 56.35 | 34.1% | 79.0 | 65.7 | 65.7 | 100 | 74.1 | 48.6 | 71.6 | 55.2 | 72.4 | 67.2 | 153.64s | 13073754 | input:2428295 output:395219 cacheread:10250240 | $17.7761 | $2.0341 / $6.1023; ¥14 / ¥42 | v3.2.6 | 2026-04-24 | 2026-05-14 |
| 8 | qwen3.5-plus (qwen · bailiancodingplan) | 64.19 | 49.6% | 61.2 | 70.1 | 70.9 | 91.1 | 76.5 | 67.2 | 70.7 | 69.6 | 75.0 | 59.4 | 123.72s | 74256276 | input:10347477 output:534393 cacheread:63374406 | $11.7627 | $0.26 / $1.56 | v3.2.6 | 2026-02-15 | 2026-04-06 |
| #13 | qwen3.5-plus (qwen · bailiancodingplan) | 52.39 | 31.2% | 71.7 | 64.6 | 64.6 | 100 | 69.8 | 48.7 | 68.0 | 60.7 | 71.1 | 66.3 | 139.34s | 20529559 | input:6489516 output:216852 cacheread:13823191 | $3.8226 | $0.26 / $1.56; ¥0 / ¥0 | v3.2.6 | 2026-02-15 | 2026-05-16 |
| 9 | qwen3.5-397b-a17b (qwen · bailiantokenplan) | 64.18 | 49.5% | 60.8 | 70.4 | 71.0 | 93.9 | 77.8 | 65.5 | 71.6 | 72.4 | 74.5 | 57.7 | 98.69s | 57194123 | input:10647098 output:535556 cacheread:46011469 | $16.0699 | $0.4359 / $2.6153; ¥3 / ¥18 | v3.2.6 | 2026-02-15 | 2026-04-30 |
| 10 | mimo-v2.5-pro (xiaomi · xiaomi-token-plan) | 63.3 | 46.5% | 62.5 | 68.5 | 68.8 | 97.4 | 74.1 | 63.9 | 70.6 | 67.4 | 70.8 | 61.9 | 63.3s | 32650649 | input:7473261 output:691052 cacheread:24486336 | $44.3222 | $2.0341 / $6.1023; ¥14 / ¥42 | v3.2.6 | 2026-04-22 | 2026-04-24 |
| #16 | mimo-v2.5-pro (xiaomi · xiaomi-token-plan) | 51.6 | 35.6% | 68.1 | 64.9 | 65.0 | 100.0 | 73.9 | 46.1 | 72.0 | 60.3 | 71.4 | 60.5 | 89s | 15403931 | input:3357860 output:231095 cacheread:11814976 | $20.2569 | $2.0341 / $6.1023; ¥14 / ¥42 | v3.2.6 | 2026-04-22 | 2026-05-12 |
| 11 | GLM-5.1 (glm · glm) | 62.93 | 44.9% | 61.6 | 69.0 | 69.1 | 98.8 | 75.2 | 68.4 | 71.0 | 65.8 | 74.1 | 56.7 | 89.33s | 47621722 | input:2299534 output:356684 cacheread:44965504 | $25.9237 | $1 / $3.2; ¥6.88 / ¥22.02 | v3.2.6 | 2026-03-27 | 2026-03-30 |
| #11 | GLM-5.1 (glm · glm) | 52.89 | 39.0% | 69.3 | 66.1 | 66.1 | 100 | 71.7 | 48.9 | 70.9 | 63.2 | 70.2 | 67.9 | 138.64s | 11989353 | input:424654 output:99483 cacheread:11465216 | $6.4756 | $1 / $3.2; ¥6.88 / ¥22.02 | v3.2.6 | 2026-03-27 | 2026-05-16 |
| 12 | doubao-seed-2.0-code (seed · volcengine-plan) | 62.36 | 43.2% | 62.9 | 67.7 | 68.6 | 90.3 | 70.3 | 67.5 | 69.1 | 62.9 | 74.6 | 60.7 | 149.96s | 33552869 | input:3918340 output:191987 cacheread:29442542 | $27.3375 | $1.3948 / $6.9741; ¥9.6 / ¥48 | v3.2.6 | 2026-02-14 | 2026-04-04 |
| #2 | doubao-seed-2.0-code (seed · volcengine-plan) | 62.44 | 52.8% | 81.8 | 73.2 | 73.2 | 100 | 73.7 | 79.3 | 69.4 | 67.9 | 75.0 | 74.9 | 158.18s | 36067246 | input:6524976 output:206269 cacheread:29336001 | $30.9985 | $1.3948 / $6.9741; ¥9.6 / ¥48 | v3.2.6 | 2026-02-14 | 2026-05-11 |
| 13 | Ring 2.6 1T (inclusionai · openrouter) | 62.05 | 35.9% | 74.1 | 65.2 | 65.9 | 92.1 | 59.1 | 67.9 | 77.2 | 62.0 | 65.3 | 57.6 | 127.94s | 38611988 | input:17862963 output:1506192 cacheread:19242833 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-05-12 | |
| #20 | Ring 2.6 1T (inclusionai · openrouter) | 49.34 | 26.5% | 67.2 | 62.8 | 62.8 | 100.0 | 61.0 | 54.5 | 72.8 | 54.1 | 64.6 | 67.1 | 90.13s | 14296281 | input:8071709 output:389732 cacheread:5834840 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-05-14 | |
| 14 | GLM-5-Turbo (glm · glm) | 61.92 | 42.9% | 61.6 | 67.4 | 67.6 | 98.1 | 72.3 | 67.6 | 69.9 | 67.2 | 66.6 | 58.4 | 51.87s | 23122455 | input:2087604 output:303610 cacheread:20731241 | $16.1583 | $1.2 / $4 | v3.2.6 | 2026-03-16 | 2026-04-28 |
| #14 | GLM-5-Turbo (glm · glm) | 51.88 | 39.1% | 66.2 | 66.9 | 66.9 | 100 | 72.2 | 56.6 | 67.2 | 64.1 | 68.4 | 71.3 | 140.06s | 12055326 | input:420194 output:103036 cacheread:11532096 | $7.8356 | $1.2 / $4; ¥0 / ¥0 | v3.2.6 | 2026-03-16 | 2026-05-16 |
| 15 | deepseek-v4-flash (deepseek · deepseek) | 61.47 | 40.9% | 61.4 | 67.6 | 67.9 | 95.4 | 72.7 | 65.5 | 68.2 | 66.3 | 69.8 | 61.0 | 76.39s | 35990776 | input:8281350 output:1484274 cacheread:26225152 | $3.5399 | $0.1453 / $0.2906; ¥1 / ¥2 | v3.2.6 | 2026-04-24 | 2026-04-25 |
| #31 | deepseek-v4-flash (deepseek · deepseek) | 46.68 | 16.2% | 67.8 | 59.9 | 59.9 | 100 | 57.7 | 47.4 | 72.1 | 51.9 | 66.5 | 60.1 | 198.43s | 13256873 | input:2429587 output:416918 cacheread:10410368 | $1.2305 | $0.1453 / $0.2906; ¥1 / ¥2 | v3.2.6 | 2026-04-24 | 2026-05-16 |
| 16 | doubao-seed-2.0-pro (seed · volcengine-plan) | 61.07 | 41.6% | 57.5 | 68.3 | 68.6 | 97.4 | 70.1 | 68.3 | 73.1 | 65.1 | 73.8 | 57.4 | 90.89s | 20474370 | input:3405226 output:188658 cacheread:16880486 | $17.8378 | $1.3948 / $6.9741; ¥9.6 / ¥48 | v3.2.6 | 2026-02-14 | 2026-04-04 |
| #18 | doubao-seed-2.0-pro (seed · volcengine-plan) | 51.3 | 39.8% | 63.5 | 68.5 | 68.5 | 100 | 74.6 | 67.5 | 61.7 | 65.0 | 75.2 | 67.1 | 159.7s | 31157190 | input:6557722 output:273428 cacheread:24326040 | $28.0186 | $1.3948 / $6.9741; ¥9.6 / ¥48 | v3.2.6 | 2026-02-14 | 2026-05-14 |
| 17 | Claude Sonnet 4.6 (anthropic · openrouter) | 60.5 | 45.5% | 53.9 | 66.6 | 67.5 | 90.9 | 71.6 | 65.0 | 64.0 | 66.8 | 71.4 | 60.2 | 178.58s | 29132003 | input:28539688 output:592315 | $94.5038 | $3 / $15 | v3.2.6 | 2026-02-18 | 2026-04-06 |
| 18 | doubao-seed-2.0-lite (seed · volcengine-plan) | 60.4 | 44.7% | 55.1 | 66.1 | 66.6 | 93.0 | 66.7 | 65.1 | 70.3 | 64.8 | 73.1 | 55.1 | 275.5s | 90593669 | input:11840992 output:583364 cacheread:78169313 | $14.2325 | $0.2615 / $1.5692; ¥1.8 / ¥10.8 | v3.2.6 | 2026-02-14 | 2026-04-12 |
| 19 | mimo-v2.5 (xiaomi · xiaomi-token-plan) | 60.39 | 43.1% | 55.8 | 66.6 | 66.8 | 96.6 | 70.9 | 65.9 | 64.8 | 64.8 | 72.3 | 59.7 | 46.68s | 35597981 | input:7607726 output:655087 cacheread:27335168 | $19.9746 | $0.8136 / $4.0682; ¥5.6 / ¥28 | v3.2.6 | 2026-04-22 | 2026-04-23 |
| #24 | mimo-v2.5 (xiaomi · xiaomi-token-plan) | 48.02 | 25.0% | 64.9 | 62.2 | 62.2 | 100 | 69.6 | 54.4 | 68.9 | 55.7 | 60.1 | 59.5 | 194.11s | 15107697 | input:3328716 output:195045 cacheread:11583936 | $8.2141 | $0.8136 / $4.0682; ¥5.6 / ¥28 | v3.2.6 | 2026-04-22 | 2026-05-16 |
| 20 | qwen3.6-plus (qwen · bailiantokenplan) | 60.2 | 43.0% | 54.5 | 66.9 | 67.3 | 95.6 | 70.9 | 64.0 | 67.2 | 67.8 | 71.5 | 58.4 | 90.34s | 48872189 | input:10301722 output:614619 cacheread:37955848 | $38.2483 | $1.16 / $6.97; ¥8 / ¥48 | v3.2.6 | 2026-04-02 | 2026-04-22 |
| #5 | qwen3.6-plus (qwen · bailiantokenplan) | 57.08 | 39.7% | 77.1 | 68.3 | 68.3 | 100 | 74.1 | 60.5 | 72.6 | 62.0 | 63.9 | 73.4 | 131.84s | 13239510 | input:12982159 output:257351 | $16.853 | $1.16 / $6.97; ¥8 / ¥48 | v3.2.6 | 2026-04-02 | 2026-05-13 |
| 21 | DeepSeek-V3.2 (deepseek-ai · siliconflow) | 60.13 | 39.0% | 57.0 | 67.6 | 68.4 | 90.6 | 68.7 | 65.8 | 68.9 | 65.3 | 73.3 | 62.8 | 160.91s | 65468885 | input:10004682 output:637259 cacheread:54826944 | $9.9709 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-03 |
| 22 | DeepSeek-V3.2 (deepseek-ai · volcengine-plan) | 60.12 | 36.9% | 60.8 | 66.7 | 68.0 | 86.9 | 68.8 | 67.1 | 64.2 | 65.4 | 72.5 | 62.5 | 201.7s | 97734852 | input:10027746 output:545655 cacheread:87161451 | $14.1456 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| #9 | DeepSeek-V3.2 (deepseek-ai · volcengine-plan) | 54.0 | 33.3% | 73.5 | 66.3 | 66.3 | 100 | 71.3 | 60.2 | 65.4 | 71.0 | 62.6 | 66.3 | 173.07s | 33217073 | input:6440532 output:146561 cacheread:26629980 | $5.1921 | $0.26 / $0.38; ¥0 / ¥0 | v3.2.6 | 2025-12-01 | 2026-05-16 |
| 23 | huanyuan-3.0-preview (tencent · tencent-token-plan) | 59.39 | 41.6% | 56.6 | 64.2 | 64.6 | 96.0 | 73.3 | 61.1 | 60.1 | 63.3 | 72.6 | 53.3 | 56.08s | 32183778 | input:31764878 output:418900 | $9.7178 | $0.2906 / $1.1624; ¥2 / ¥8 | v3.2.6 | 2026-04-23 | 2026-04-23 |
| #21 | huanyuan-3.0-preview (tencent · tencent-token-plan) | 49.2 | 35.2% | 62.7 | 64.2 | 64.2 | 100 | 70.6 | 49.8 | 68.2 | 57.3 | 61.7 | 74.0 | 189.79s | 12450815 | input:12325159 output:125656 | $3.7278 | $0.2906 / $1.1624; ¥2 / ¥8 | v3.2.6 | 2026-04-23 | 2026-05-16 |
| 24 | kimi-k2.6 (moonshot · volcengine-plan) | 59.31 | 39.1% | 54.2 | 67.0 | 67.2 | 96.7 | 70.4 | 65.2 | 70.6 | 66.1 | 68.6 | 58.7 | 102.04s | 56361641 | input:11160022 output:935949 cacheread:44265670 | $35.1856 | $0.9444 / $4; ¥6.5 / ¥27 | v3.2.6 | 2026-04-21 | 2026-04-28 |
| #8 | kimi-k2.6 (moonshot · volcengine-plan) | 54.14 | 38.2% | 71.7 | 67.2 | 67.2 | 100 | 76.2 | 48.9 | 73.2 | 55.3 | 76.0 | 68.6 | 139.78s | 19787995 | input:6555533 output:286299 cacheread:12946163 | $13.4494 | $0.9444 / $4; ¥6.5 / ¥27 | v3.2.6 | 2026-04-21 | 2026-05-11 |
| 25 | doubao-seed-code (seed · volcengine-plan) | 59.22 | 40.4% | 55.7 | 65.0 | 65.8 | 93.4 | 63.6 | 67.2 | 68.9 | 59.8 | 72.7 | 56.7 | 105.51s | 79743715 | input:10387510 output:569004 cacheread:68787201 | $19.5397 | $0.4068 / $2.3247; ¥2.8 / ¥16 | v3.2.6 | 2026-02-14 | 2026-04-09 |
| 26 | qwen3.6-plus (qwen · bailianapi) | 59.05 | 39.0% | 55.9 | 65.3 | 66.0 | 91.2 | 71.2 | 63.3 | 64.1 | 66.7 | 68.4 | 56.2 | 120.83s | 62909389 | input:10724289 output:627594 cacheread:51557506 | $46.7179 | $1.16 / $6.97; ¥8 / ¥48 | v3.2.6 | 2026-04-02 | 2026-04-04 |
| 27 | LongCat-2.0-Preview (meituan · longcat) | 58.8 | 39.0% | 53.0 | 66.3 | 66.6 | 97.1 | 76.4 | 66.6 | 66.1 | 54.9 | 72.9 | 57.5 | 94.1s | 30287164 | input:10892258 output:471130 cacheread:18923776 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-04-24 | 2026-04-25 |
| #19 | LongCat-2.0-Preview (meituan · longcat) | 49.93 | 30.2% | 65.7 | 64.8 | 64.8 | 100 | 73.8 | 50.9 | 73.2 | 62.4 | 53.1 | 69.6 | 141.44s | 12555390 | input:307236 output:143706 cacheread:12104448 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-04-24 | 2026-05-14 |
| 28 | qwen3.6-27b (qwen · bailiantokenplan) | 58.74 | 37.9% | 55.2 | 65.5 | 65.9 | 96.7 | 68.9 | 67.1 | 61.5 | 67.3 | 71.4 | 57.3 | 89.85s | 50185369 | input:10744142 output:691108 cacheread:38750119 | $14.9364 | $0.4359 / $2.6153; ¥3 / ¥18 | v3.2.6 | 2026-04-22 | 2026-04-26 |
| 29 | kimi-k2.5 (moonshot · moonshot) | 58.49 | 33.9% | 59.0 | 65.4 | 65.9 | 93.8 | 68.3 | 64.8 | 66.9 | 61.2 | 72.1 | 57.4 | 88.73s | 41679573 | input:3470897 output:369572 cacheread:37839104 | $9.2045 | $0.3827 / $1.72 | v3.2.6 | 2026-01-27 | 2026-04-02 |
| 30 | DeepSeekV3.2 (deepseek-ai · baiduqianfan) | 57.94 | 30.7% | 60.3 | 65.5 | 66.8 | 87.4 | 68.0 | 67.4 | 61.5 | 62.6 | 71.4 | 62.7 | 160.53s | 72630744 | input:71928542 output:702202 | $18.9683 | $0.26 / $0.38 | v3.2.6 | 2025-12-01 | 2026-04-06 |
| 31 | mimo-v2-pro (xiaomi · openrouter) | 57.92 | 37.6% | 53.2 | 64.7 | 65.1 | 95.9 | 70.3 | 64.1 | 60.1 | 66.2 | 66.5 | 60.3 | 107.16s | 46485163 | input:12390088 output:665699 cacheread:33429376 | $31.1019 | $1 / $3 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| #23 | mimo-v2-pro (xiaomi · xiaomi-token-plan) | 48.18 | 26.6% | 64.4 | 62.5 | 62.5 | 100 | 66.1 | 53.8 | 71.4 | 55.5 | 70.5 | 53.5 | 83.57s | 15012098 | input:3388947 output:201711 cacheread:11421440 | $9.7048 | $1 / $3; ¥0 / ¥0 | v3.2.6 | 2026-03-19 | 2026-05-17 |
| 32 | mimo-v2-omni (xiaomi · openrouter) | 57.65 | 35.2% | 56.6 | 63.5 | 64.0 | 93.1 | 65.4 | 65.6 | 63.5 | 59.8 | 68.9 | 57.2 | 74.53s | 40532601 | input:12022794 output:801205 cacheread:27708602 | $11.9532 | $0.4 / $2 | v3.2.6 | 2026-03-19 | 2026-04-01 |
| #30 | mimo-v2-omni (xiaomi · xiaomi-token-plan) | 46.98 | 24.5% | 62.0 | 62.6 | 62.6 | 100 | 65.1 | 54.9 | 71.1 | 55.9 | 65.9 | 59.3 | 81.93s | 15024652 | input:5571769 output:243027 cacheread:9209856 | $4.5567 | $0.4 / $2; ¥0 / ¥0 | v3.2.6 | 2026-03-19 | 2026-05-17 |
| 33 | LongCat-Flash-Thinking-2601 (meituan · longcat) | 57.48 | 34.8% | 54.6 | 64.5 | 65.0 | 94.9 | 69.4 | 63.0 | 60.2 | 65.4 | 70.3 | 58.3 | 205.9s | 40976389 | input:40405158 output:571231 | $0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-01-16 | 2026-04-04 |
| 34 | Ling-2.6-1T (tbox · antling) | 57.4 | 36.6% | 58.8 | 60.7 | 61.7 | 91.4 | 70.6 | 57.1 | 62.4 | 57.0 | 65.7 | 47.5 | 69.93s | 59026717 | input:20384528 output:494895 cacheread:38147294 | $13.0747 | $0.3 / $2.5; ¥2.0648 / ¥17.2065 | v3.2.6 | 2026-04-24 | 2026-04-25 |
| 35 | qwen3.6-max-preview (qwen · bailiantokenplan) | 57.4 | 36.0% | 51.5 | 65.2 | 65.8 | 94.9 | 75.2 | 64.5 | 57.7 | 61.8 | 71.7 | 59.6 | 132.61s | 60826974 | input:11239120 output:629190 cacheread:48958664 | $86.0724 | $2.1794 / $13.0765; ¥15 / ¥90 | v3.2.6 | 2026-04-20 | 2026-04-24 |
| #17 | qwen3.6-max-preview (qwen · bailiantokenplan) | 51.48 | 40.0% | 66.4 | 65.1 | 65.1 | 100 | 76.8 | 50.8 | 72.5 | 65.8 | 55.8 | 62.8 | 151.95s | 20456157 | input:6493368 output:202535 cacheread:13760254 | $31.7946 | $2.1794 / $13.0765; ¥15 / ¥90 | v3.2.6 | 2026-04-20 | 2026-05-13 |
| 36 | kimi-k2.6-code-preview (moonshot · moonshot) | 57.14 | 32.1% | 59.9 | 62.5 | 63.1 | 94.2 | 67.1 | 61.4 | 62.2 | 61.2 | 68.1 | 53.2 | 71.28s | 34448208 | input:3235977 output:877255 cacheread:30334976 | $20.9923 | $0.95 / $4; ¥6.5 / ¥27 | v3.2.6 | 2026-04-21 | 2026-04-16 |
| 37 | GLM-5 (glm · glm) | 57.05 | 28.5% | 61.8 | 64.0 | 64.5 | 95.9 | 69.7 | 65.5 | 62.5 | 59.3 | 71.9 | 53.6 | 122.49s | 34977005 | input:3229307 output:564274 cacheread:31183424 | $14.849 | $0.72 / $2.3 | v3.2.6 | 2026-02-11 | 2026-03-31 |
| 38 | qwen3.6-35b-a3b (qwen · bailiantokenplan) | 56.94 | 35.5% | 52.2 | 63.9 | 64.3 | 96.2 | 66.4 | 64.3 | 60.1 | 62.2 | 73.9 | 57.0 | 54.28s | 53780281 | input:11059276 output:743635 cacheread:41977370 | $9.5475 | $0.2615 / $1.5692; ¥1.8 / ¥10.8 | v3.2.6 | 2026-04-16 | 2026-04-26 |
| 39 | qwen3.6-flash (qwen · bailiantokenplan) | 56.55 | 35.7% | 51.0 | 63.4 | 63.9 | 94.2 | 69.7 | 60.5 | 61.4 | 61.6 | 73.0 | 52.6 | 60.52s | 61455165 | input:10973905 output:834629 cacheread:49646631 | $28.4575 | $0.6974 / $4.1845; ¥4.8 / ¥28.8 | v3.2.6 | 2026-04-17 | 2026-04-26 |
| 40 | GLM-4.6 (glm · glm) | 56.29 | 29.5% | 57.0 | 63.6 | 64.3 | 93.4 | 64.8 | 65.2 | 64.8 | 59.6 | 69.7 | 57.0 | 104.98s | 42991923 | input:2767151 output:872645 cacheread:39352127 | $10.4109 | $0.39 / $1.9 | v3.2.6 | 2025-09-30 | 2026-04-05 |
| 41 | qwen3-max-2026-01-23 (qwen · bailiancodingplan) | 55.76 | 36.6% | 52.4 | 59.8 | 60.2 | 95.5 | 72.8 | 64.1 | 62.9 | 57.6 | 58.7 | 37.2 | 220.61s | 67262104 | input:11956034 output:284269 cacheread:55021801 | $41.2943 | $1.017 / $4.068; ¥7 / ¥28 | v3.2.6 | 2026-01-23 | 2026-04-07 |
| 42 | kat-coder-pro-v2 (kwaipilot · openrouter) | 54.74 | 31.7% | 52.3 | 60.3 | 61.3 | 88.7 | 63.5 | 63.0 | 62.2 | 60.7 | 63.3 | 47.7 | 107.08s | 98806785 | input:16915312 output:544641 cacheread:81346832 | $17.9302 | $0.3 / $1.2 | v3.2.6 | 2026-03-27 | 2026-04-01 |
| #3 | kat-coder-pro-v2 (kwaipilot · StremLake) | 59.68 | 50.3% | 78.4 | 70.7 | 70.7 | 100 | 76.4 | 52.7 | 76.8 | 73.5 | 70.5 | 70.4 | 172.4s | 16430245 | input:4379125 output:111700 cacheread:11939420 | $3.2387 | $0.3 / $1.2; ¥0 / ¥0 | v3.2.6 | 2026-03-27 | 2026-05-16 |
| 43 | GLM-4.7 (glm · glm) | 54.58 | 26.4% | 54.8 | 62.7 | 63.3 | 94.3 | 65.1 | 63.9 | 60.5 | 63.8 | 69.2 | 53.6 | 124.51s | 40323216 | input:3276264 output:829547 cacheread:36217405 | $9.7918 | $0.39 / $1.75 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| #25 | GLM-4.7 (glm · glm) | 47.96 | 24.1% | 64.3 | 63.3 | 63.3 | 100 | 70.2 | 56.7 | 62.7 | 54.1 | 69.9 | 63.8 | 165.03s | 13103309 | input:561065 output:271780 cacheread:12270464 | $3.0872 | $0.39 / $1.75; ¥0 / ¥0 | v3.2.6 | 2025-12-23 | 2026-05-16 |
| 44 | gemini-3.1-pro-preview (google · openrouter) | 53.95 | 30.1% | 54.6 | 58.1 | 58.6 | 95.1 | 57.4 | 61.0 | 60.0 | 54.2 | 66.2 | 49.4 | 107.66s | 49678586 | input:14860816 output:717196 cacheread:34100574 | $72.4286 | $2 / $12 | v3.2.6 | 2026-02-20 | 2026-04-02 |
| 45 | hunyuan-2.0-thinking (tencent · tencent-token-plan) | 52.69 | 28.3% | 47.2 | 60.1 | 60.9 | 91.8 | 57.8 | 65.4 | 57.7 | 59.8 | 69.5 | 52.1 | 162.28s | 54383209 | input:53285801 output:1097408 | $44.4101 | $0.77 / $3.08; ¥5.3 / ¥21.2 | v3.2.6 | 2025-12-05 | 2026-04-04 |
| 46 | MiniMax-M2.5 (minimax · minimax) | 51.79 | 23.3% | 52.7 | 58.8 | 59.4 | 92.6 | 57.6 | 63.1 | 61.1 | 60.5 | 60.8 | 49.0 | 96.4s | 54177427 | input:8780828 output:669815 cacheread:44726784 | $4.3381 | $0.118 / $0.99 | v3.2.6 | 2026-02-12 | 2026-04-02 |
| #15 | MiniMax-M2.5 (minimax · minimax) | 51.76 | 33.8% | 67.6 | 66.8 | 66.8 | 100 | 67.3 | 63.6 | 68.9 | 64.3 | 67.1 | 68.7 | 111.3s | 13023376 | input:2505661 output:167091 cacheread:10350624 | $1.0718 | $0.118 / $0.99; ¥0 / ¥0 | v3.2.6 | 2026-02-12 | 2026-05-14 |
| 47 | gemma-4-31b-it (google · openrouter) | 51.59 | 35.0% | 41.1 | 56.1 | 56.4 | 95.2 | 50.8 | 64.5 | 59.7 | 54.5 | 60.9 | 46.8 | 207.49s | 33753472 | input:25401261 output:211539 cacheread:8140672 | $4.2106 | $0.14 / $0.4 | v3.2.6 | 2026-04-02 | 2026-04-11 |
| 48 | Ling-2.5-1T (tbox · antling) | 51.19 | 27.8% | 50.2 | 54.6 | 55.5 | 89.3 | 56.0 | 55.6 | 60.5 | 57.8 | 52.5 | 42.9 | 508.52s | 63436733 | input:8824540 output:336271 cacheread:54275922 | $21.676 | $0.581 / $2.325; ¥4 / ¥16 | v3.2.6 | 2026-02-16 | 2026-04-07 |
| 49 | DeepSeek-R1 (deepseek · siliconflow) | 50.23 | 20.0% | 52.5 | 57.8 | 57.8 | 99.6 | 52.0 | 61.9 | 55.2 | 57.3 | 67.8 | 55.2 | 702.25s | 25698873 | input:21718776 output:3980097 | $21.8755 | $0.5812 / $2.3247; ¥4 / ¥16 | v3.2.6 | 2025-05-28 | 2026-05-01 |
| 50 | MiniMax-M2.7 (minimax · minimax) | 49.53 | 20.2% | 50.0 | 56.8 | 57.3 | 94.8 | 59.0 | 60.0 | 52.3 | 55.4 | 63.6 | 51.5 | 134.51s | 70440558 | input:8033465 output:678335 cacheread:59133252 cachewrite:2595506 | $12.4834 | $0.3 / $1.2 | v3.2.6 | 2026-03-18 | 2026-04-02 |
| #26 | MiniMax-M2.7 (minimax · minimax) | 47.89 | 26.8% | 64.8 | 61.0 | 61.0 | 100 | 65.6 | 48.9 | 69.7 | 48.3 | 66.8 | 62.0 | 186.75s | 12810130 | input:382441 output:192932 cacheread:10515969 cachewrite:1718788 | $2.1815 | $0.3 / $1.2; ¥0 / ¥0 | v3.2.6 | 2026-03-18 | 2026-05-16 |
| 51 | kimi-for-coding-k2.6 (moonshot · kimicodingplan) | 49.04 | 22.8% | 44.9 | 55.7 | 64.3 | 46.5 | 58.7 | 51.4 | 58.2 | 53.0 | 62.4 | 48.9 | 268.85s | 194685491 | input:153520279 output:465180 cacheread:40700032 | $67.3403 | $0.3827 / $1.72 | v3.2.6 | 2025-10-24 | 2026-05-01 |
| 52 | gemini-3-flash-preview (google · openrouter) | 48.99 | 22.4% | 48.9 | 53.8 | 54.3 | 93.0 | 54.2 | 63.0 | 37.8 | 57.1 | 61.9 | 53.7 | 110.14s | 109050773 | input:26076719 output:244800 cacheread:82729254 | $34.4551 | $0.5 / $3 | v3.2.6 | 2025-12-17 | 2026-04-05 |
| 53 | MiniMax-M2.1 (minimax · minimax) | 48.08 | 16.4% | 50.5 | 56.8 | 57.3 | 93.1 | 58.8 | 59.0 | 56.0 | 56.4 | 60.6 | 49.3 | 92.82s | 49828422 | input:9501688 output:677502 cacheread:39649232 | $8.5617 | $0.27 / $0.95 | v3.2.6 | 2025-12-23 | 2026-04-01 |
| #12 | MiniMax-M2.1 (minimax · minimax) | 52.61 | 31.1% | 70.6 | 66.7 | 66.7 | 100 | 70.7 | 61.0 | 69.8 | 63.1 | 65.1 | 68.1 | 112.2s | 12776800 | input:2494084 output:166342 cacheread:10096756 cachewrite:19618 | $2.1971 | $0.27 / $0.95; ¥0 / ¥0 | v3.2.6 | 2025-12-23 | 2026-05-14 |
| 54 | Kimi-K2-Thinking (moonshot · siliconflow) | 47.83 | 18.0% | 48.5 | 55.2 | 56.7 | 88.2 | 55.9 | 54.4 | 53.0 | 54.2 | 62.3 | 51.6 | 417.5s | 36332107 | input:35311598 output:1020509 | $23.7382 | $0.6 / $2.5 | v3.2.6 | 2025-11-06 | 2026-04-09 |
| 55 | hunyuan-2.0-instruct (tencent · tencent-token-plan) | 46.94 | 15.9% | 45.1 | 57.1 | 58.1 | 90.8 | 57.8 | 62.1 | 56.3 | 55.6 | 61.9 | 48.9 | 122.05s | 63879796 | input:63382367 output:497429 | $41.9994 | $0.65 / $1.61; ¥4.5 / ¥11.1 | v3.2.6 | 2025-12-05 | 2026-04-05 |
| 56 | qwen3-coder-next (qwen · bailiancodingplan) | 46.84 | 19.4% | 41.8 | 54.6 | 55.7 | 87.7 | 56.4 | 59.3 | 56.5 | 52.3 | 57.7 | 43.9 | 71.83s | 104722641 | input:12293170 output:381362 cacheread:92048109 | $21.7349 | $0.3632 / $1.4529; ¥2.5 / ¥10 | v3.2.6 | 2026-02-20 | 2026-04-08 |
| 57 | mistral-small-2603 (mistralai · openrouter) | 45.26 | 14.2% | 48.9 | 52.4 | 53.0 | 90.7 | 50.2 | 57.0 | 48.0 | 51.7 | 58.6 | 50.8 | 109.15s | 43699451 | input:18648766 output:1078701 cacheread:23971984 | $5.2424 | $0.15 / $0.6 | v3.2.6 | 2026-03-16 | 2026-04-06 |
| 58 | grok-4.20 (x-ai · openrouter) | 43.04 | 14.7% | 42.1 | 48.8 | 49.6 | 92.3 | 60.2 | 44.7 | 43.4 | 48.7 | 50.4 | 43.5 | 80.19s | 51133010 | input:7668306 output:197632 cacheread:43267072 | $59.7895 | $2 / $6 | v3.2.6 | 2026-03-12 | 2026-04-04 |
| 59 | kimi-for-coding-k2.5 (moonshot · kimicodingplan) | 42.72 | 12.9% | 39.0 | 52.1 | 63.4 | 32.3 | 57.8 | 55.1 | 47.9 | 50.0 | 55.8 | 45.3 | 249.95s | 360378148 | input:13150894 output:2602239 cacheread:344625015 | $75.4527 | $0.3827 / $1.72 | v3.2.6 | 2025-10-24 | 2026-04-11 |
| 60 | step-3.5-flash-2603 (stepfun · stepfun) | 42.59 | 15.3% | 32.9 | 52.3 | 53.6 | 86.9 | 50.1 | 56.9 | 55.4 | 50.9 | 58.4 | 41.9 | 107.1s | 86247300 | input:85197273 output:1050027 | $8.8452 | $0.1 / $0.31; ¥0.7 / ¥2.1 | v3.2.6 | 2026-04-02 | 2026-04-02 |
| #22 | step-3.5-flash-2603 (stepfun · stepfun) | 49.19 | 17.8% | 71.5 | 62.2 | 62.2 | 100 | 70.7 | 49.7 | 65.4 | 56.0 | 62.6 | 65.0 | 72.65s | 13317837 | input:2946705 output:261948 cacheread:10109184 | $0.8813 | $0.1 / $0.31; ¥0.7 / ¥2.1 | v3.2.6 | 2026-04-02 | 2026-05-13 |
| 61 | step-3.5-flash (stepfun · stepfun) | 41.75 | 14.7% | 31.1 | 51.8 | 53.0 | 86.1 | 50.1 | 58.8 | 54.3 | 49.8 | 57.3 | 40.0 | 103.62s | 87855843 | input:86724853 output:1130990 | $9.0118 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-04 |
| #27 | step-3.5-flash (stepfun · stepfun) | 47.49 | 20.1% | 65.0 | 63.1 | 63.1 | 100 | 68.5 | 51.4 | 65.9 | 61.9 | 68.8 | 59.4 | 97.7s | 13508242 | input:3178065 output:267201 cacheread:10062976 | $0.9011 | $0.1 / $0.3; ¥0 / ¥0 | v3.2.6 | 2026-02-02 | 2026-05-12 |
| 62 | Spark X2 (xunfei · astroncodingplan) | 41.44 | 12.7% | 39.3 | 48.4 | 49.0 | 90.9 | 46.6 | 55.6 | 50.1 | 45.3 | 53.6 | 39.4 | 242.33s | 32609723 | input:31554432 output:1055291 | $14.3483 | $0.44 / $0.44; ¥3 / ¥3 | v3.2.6 | 2026-02-11 | 2026-04-07 |
| #34 | Spark X2 (xunfei · astroncodingplan) | 44.83 | 31.8% | 55.2 | 60.6 | 60.6 | 100 | 53.0 | 52.4 | 71.5 | 56.9 | 73.0 | 56.0 | 157.6s | 13212397 | input:12780019 output:432378 | $5.8135 | $0.44 / $0.44; ¥3 / ¥3 | v3.2.6 | 2026-02-11 | 2026-05-16 |
| 63 | step-3.5-flash (stepfun · openrouter) | 38.74 | 9.5% | 35.3 | 47.9 | 48.7 | 91.2 | 42.9 | 53.7 | 54.7 | 39.4 | 55.6 | 40.2 | 82.69s | 63192773 | input:13974967 output:1641102 cacheread:47576704 | $4.2687 | $0.1 / $0.3 | v3.2.6 | 2026-02-02 | 2026-04-01 |
| 64 | hunyuan-t1 (tencent · tencent-token-plan) | 34.74 | 9.2% | 26.6 | 41.7 | 42.1 | 94.0 | 41.0 | 43.2 | 35.4 | 36.2 | 47.6 | 48.9 | 100.87s | 36613811 | input:35477732 output:1136079 | $5.8152 | $0.1453 / $0.5812; ¥1 / ¥4 | v3.2.6 | 2025-03-21 | 2026-04-24 |
| 65 | ERNIE-4.5-Turbo (baidu · baiduqianfan) | 33.68 | 7.1% | 26.0 | 42.9 | 43.0 | 97.5 | 29.1 | 52.1 | 44.0 | 40.8 | 54.6 | 41.0 | 115.4s | 32527395 | input:9183126 output:467510 cacheread:22876759 | $2.6896 | $0.12 / $0.46; ¥0.8 / ¥3.2 | v3.2.6 | 2026-04-02 | 2026-04-05 |
| #28 | ERNIE-4.5-Turbo (baidu · baiduqianfancodingplan) | 47.25 | 29.5% | 61.6 | 61.6 | 61.6 | 100 | 65.3 | 58.9 | 60.3 | 68.8 | 51.1 | 64.0 | 91.36s | 8680055 | input:519490 output:102583 cacheread:8057982 | $0.593 | $0.12 / $0.46; ¥0.8 / ¥3.2 | v3.2.6 | 2026-04-02 | 2026-05-17 |
| 66 | Ling-2.6-Flash (tbox · antling) | 27.04 | 2.4% | 28.9 | 35.8 | 36.0 | 99.0 | 37.4 | 43.0 | 29.2 | 31.0 | 44.7 | 30.9 | 54.98s | 26416550 | input:12470152 output:114023 cacheread:13832375 | $1.9728 | $0.1 / $0.3; ¥0.6 / ¥1.8 | v3.2.6 | 2026-04-23 | 2026-04-23 |
| #1 | Qwen3.7-Max (qwen · bailiantokenplan) | 62.75 | 47.2% | 86.1 | 69.2 | 69.2 | 100.0 | 79.9 | 48.9 | 70.6 | 62.3 | 77.9 | 71.2 | 87.4s | 23896173 | input:12596043 output:228187 cacheread:11071943 | $0.0 | $0 / $0; ¥0 / ¥0 | v3.2.6 | 2026-05-23 | 2026-05-23 |
Notes:
USD/CNY quick reference uses approximately $1 ≈ ¥6.8826.
cache_read_tokens and cache_write_tokens are billed at 0.5x the input-token price.
cost_usd = (input_tokens + 0.5 * (cache_read_tokens + cache_write_tokens)) * price_usd_input / 1e6 + output_tokens * price_usd_output / 1e6
mimo-v2.5-pro and deepseek-v4-pro currently use 1M-context list pricing, so their displayed Cost is slightly overstated.
Closed dataset detail pages show model-level summaries only; they intentionally omit per-task scores and task-level rows.
pass^3 = weighted_pass_at_k_all, while strict_pass_rate = unweighted_pass_at_k_all.
Open Final Score = 100 × AvgScore^0.40 × ((Pass^3)^(1/3))^0.45 × (1 - (1 - Pass@3)^(1/3))^0.15
Closed Final Score = 100 × AvgScore^0.40 × ((Pass^3)^(1/3))^0.25 × (1 - (1 - Pass@3)^(1/3))^0.35
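The cost and Final Score formulas in these notes can be checked against the table with a short script. The following is a minimal sketch, not the benchmark's own code. It assumes that the two Price values are USD per 1M tokens, and that AvgScore, Pass^3, and Pass@3 enter the score formulas as fractions in [0, 1] (the table values divided by 100). The function name `final_score`, its parameter names, and the `WEIGHTS` table are hypothetical; `cost_usd` and its token and price names follow the cost formula above.

```python
# Exponents for the Pass^3 and Pass@3 terms, copied from the two Final Score
# formulas in the notes. The dictionary itself is an illustrative helper.
WEIGHTS = {"open": (0.45, 0.15), "closed": (0.25, 0.35)}


def cost_usd(input_tokens, output_tokens, price_usd_input, price_usd_output,
             cache_read_tokens=0, cache_write_tokens=0):
    """Benchmark cost in USD, assuming prices are quoted per 1M tokens."""
    # Cache reads and cache writes are billed at 0.5x the input-token price.
    billable_input = input_tokens + 0.5 * (cache_read_tokens + cache_write_tokens)
    return (billable_input * price_usd_input + output_tokens * price_usd_output) / 1e6


def final_score(avg_score, pass_hat_3, pass_at_3, dataset="open"):
    """Final Score on a 0-100 scale; all three inputs are fractions in [0, 1]."""
    if dataset not in WEIGHTS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    w_hat, w_at = WEIGHTS[dataset]
    pass_hat_term = (pass_hat_3 ** (1 / 3)) ** w_hat
    pass_at_term = (1 - (1 - pass_at_3) ** (1 / 3)) ** w_at
    return 100 * avg_score ** 0.40 * pass_hat_term * pass_at_term


if __name__ == "__main__":
    # Open rank 3 row: token counts and prices as listed in the table.
    cost = cost_usd(6925181, 1622971, 0.8721, 3.1977, cache_read_tokens=17982720)
    print(round(cost, 4))  # 19.0706, matching the Cost column

    # Open rank 1 row: Avg Score 76.7, Pass^3 61.9%, Pass@3 81.4.
    print(round(final_score(0.767, 0.619, 0.814), 1))  # about 73.7; table shows 73.71

    # Closed #29 row: Avg Score 61.8, Pass^3 25.1%, Pass@3 62.5.
    print(round(final_score(0.618, 0.251, 0.625, dataset="closed"), 1))  # about 47.0; table shows 47.01
```

Recomputed scores can differ from the table in the second decimal place because the displayed inputs are already rounded.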
Visual Leaderboard
View the full leaderboard in chart form. When you switch metrics, the bar ranking below re-sorts by the selected field.
intern-s2-preview
ShangHai AILab · intern
Sensenova 6.7 Flash Lite
sensetime · sensenova
ERNIE 5.1
baidu · baiduqianfan
gpt-5.5-xhigh
openai · openai
gpt-5.4-xhigh
openai · openai
gpt-5.3-codex-xhigh
openai · openai
deepseek-v4-pro
deepseek · deepseek
qwen3.5-plus
qwen · bailiancodingplan
qwen3.5-397b-a17b
qwen · bailiantokenplan
mimo-v2.5-pro
xiaomi · xiaomi-token-plan
GLM-5.1
glm · glm
doubao-seed-2.0-code
seed · volcengine-plan
Ring 2.6 1T
inclusionai · openrouter
GLM-5-Turbo
glm · glm
deepseek-v4-flash
deepseek · deepseek
doubao-seed-2.0-pro
seed · volcengine-plan
Claude Sonnet 4.6
anthropic · openrouter
doubao-seed-2.0-lite
seed · volcengine-plan
mimo-v2.5
xiaomi · xiaomi-token-plan
qwen3.6-plus
qwen · bailiantokenplan
DeepSeek-V3.2
deepseek-ai · siliconflow
DeepSeek-V3.2
deepseek-ai · volcengine-plan
huanyuan-3.0-preview
tencent · tencent-token-plan
kimi-k2.6
moonshot · volcengine-plan
doubao-seed-code
seed · volcengine-plan
qwen3.6-plus
qwen · bailianapi
LongCat-2.0-Preview
meituan · longcat
qwen3.6-27b
qwen · bailiantokenplan
kimi-k2.5
moonshot · moonshot
DeepSeekV3.2
deepseek-ai · baiduqianfan
mimo-v2-pro
xiaomi · openrouter
mimo-v2-omni
xiaomi · openrouter
LongCat-Flash-Thinking-2601
meituan · longcat
Ling-2.6-1T
tbox · antling
qwen3.6-max-preview
qwen · bailiantokenplan
kimi-k2.6-code-preview
moonshot · moonshot
GLM-5
glm · glm
qwen3.6-35b-a3b
qwen · bailiantokenplan
qwen3.6-flash
qwen · bailiantokenplan
GLM-4.6
glm · glm
qwen3-max-2026-01-23
qwen · bailiancodingplan
kat-coder-pro-v2
kwaipilot · openrouter
GLM-4.7
glm · glm
gemini-3.1-pro-preview
google · openrouter
hunyuan-2.0-thinking
tencent · tencent-token-plan
MiniMax-M2.5
minimax · minimax
gemma-4-31b-it
google · openrouter
Ling-2.5-1T
tbox · antling
DeepSeek-R1
deepseek · siliconflow
MiniMax-M2.7
minimax · minimax
kimi-for-coding-k2.6
moonshot · kimicodingplan
gemini-3-flash-preview
google · openrouter
MiniMax-M2.1
minimax · minimax
Kimi-K2-Thinking
moonshot · siliconflow
hunyuan-2.0-instruct
tencent · tencent-token-plan
qwen3-coder-next
qwen · bailiancodingplan
mistral-small-2603
mistralai · openrouter
grok-4.20
x-ai · openrouter
kimi-for-coding-k2.5
moonshot · kimicodingplan
step-3.5-flash-2603
stepfun · stepfun
step-3.5-flash
stepfun · stepfun
Spark X2
xunfei · astroncodingplan
step-3.5-flash
stepfun · openrouter
hunyuan-t1
tencent · tencent-token-plan
ERNIE-4.5-Turbo
baidu · baiduqianfan
Ling-2.6-Flash
tbox · antling
Qwen3.7-Max
qwen · bailiantokenplan
doubao-seed-2.0-code
seed · volcengine-plan
kat-coder-pro-v2
kwaipilot · StremLake
gpt-5.3-codex-xhigh
openai · openai
qwen3.6-plus
qwen · bailiantokenplan
deepseek-v4-pro
deepseek · deepseek
gpt-5.5-xhigh
openai · openai
kimi-k2.6
moonshot · volcengine-plan
DeepSeek-V3.2
deepseek-ai · volcengine-plan
gpt-5.4-xhigh
openai · openai
GLM-5.1
glm · glm
MiniMax-M2.1
minimax · minimax
qwen3.5-plus
qwen · bailiancodingplan
GLM-5-Turbo
glm · glm
MiniMax-M2.5
minimax · minimax
mimo-v2.5-pro
xiaomi · xiaomi-token-plan
qwen3.6-max-preview
qwen · bailiantokenplan
doubao-seed-2.0-pro
seed · volcengine-plan
LongCat-2.0-Preview
meituan · longcat
Ring 2.6 1T
inclusionai · openrouter
huanyuan-3.0-preview
tencent · tencent-token-plan
step-3.5-flash-2603
stepfun · stepfun
mimo-v2-pro
xiaomi · xiaomi-token-plan
mimo-v2.5
xiaomi · xiaomi-token-plan
GLM-4.7
glm · glm
MiniMax-M2.7
minimax · minimax
step-3.5-flash
stepfun · stepfun
ERNIE-4.5-Turbo
baidu · baiduqianfancodingplan
intern-s2-preview
ShangHai AILab · intern
mimo-v2-omni
xiaomi · xiaomi-token-plan
deepseek-v4-flash
deepseek · deepseek
ERNIE 5.1
baidu · baiduqianfan
Sensenova 6.7 Flash Lite
sensetime · sensenova
Spark X2
xunfei · astroncodingplan
Score vs Runtime
Compare model scores against average runtime. The x-axis is Avg Runtime, and the y-axis can be switched across score fields.
Chart: scatter plot of Final Score (y-axis) against Avg. Seconds per Task (x-axis), with one point per model entry listed under Visual Leaderboard.
Score vs Cost
Compare model scores against benchmark cost. The x-axis is Cost, and the y-axis can be switched across score fields.
Chart: scatter plot of Final Score (y-axis) against Cost (USD) (x-axis), with one point per model entry listed under Visual Leaderboard.