Rank 23 OpenCode MiniMax

MiniMax M2.7 highspeed

A lower-table result with a few useful bright spots: 38/151 tasks solved at least once and 15/151 solved in all three attempts. The clearest wins are in large Python/Django application repairs and in automation and configuration-management work.

opencode-cli 1.14.32 minimax-cn-coding-plan/MiniMax-M2.7-highspeed Updated 2026-06-18

How to read this result

  • MiniMax M2.7 highspeed ranks #23 with a 25.24 Final Score. The headline number is 38 tasks reached at least once; the stability number is 15 tasks passed in all three attempts.
  • The strongest evidence clusters around large Python/Django application repairs plus automation and configuration-management work.
  • The failure shape is mostly Go product plumbing across configuration, storage, and service APIs plus automation and configuration-management work.
  • The OpenCode run is more sensitive to the underlying model family: the same harness can look sharp or brittle depending on where the model puts its search budget.

MiniMax M2.7 highspeed is best read through the gap between reach and repeatability. It reaches 38/151 tasks at least once, but 15/151 tasks survive all three attempts. That gap is the personality of the row: the model can find solutions across a fairly wide surface, but the dependable core is narrower than the headline Pass@3 number.
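The gap between reach and repeatability can be made concrete with a little arithmetic. The sketch below uses only the three counts quoted above (151 tasks, 38 reached, 15 stable); the variable names are illustrative, and it does not attempt to reproduce the 25.24 Final Score, whose formula is not given here.

```python
# Counts taken from the text above; names are illustrative only.
TASKS = 151      # tasks in the benchmark run
REACHED = 38     # solved in at least one of three attempts
STABLE = 15      # solved in all three attempts


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


reach_rate = pct(REACHED, TASKS)      # share of tasks reached at least once
stable_rate = pct(STABLE, TASKS)      # share of tasks solved every time
retention = pct(STABLE, REACHED)      # share of reached tasks that are stable
retry_sensitive = REACHED - STABLE    # reached, but not in every attempt

print(f"reach {reach_rate}%, stable {stable_rate}%, "
      f"retention {retention}%, retry-sensitive tasks {retry_sensitive}")
```

Read this way, roughly two in five reached tasks are also stable, which is the "narrower dependable core" the paragraph describes.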

In leaderboard terms, rank #23 and a 25.24 Final Score put it in direct comparison with nearby models, but the more useful question is where the wins come from. In this run the strongest signal is large Python/Django application repairs plus automation and configuration-management work; the weak side is Go product plumbing across configuration, storage, and service APIs plus automation and configuration-management work. The OpenCode run is more sensitive to the underlying model family: the same harness can look sharp or brittle depending on where the model puts its search budget.

Selected high and low suites, grouped by pass-at-least-once rate.

  • Open Library · release 013: 6/10 · 60.0%. Best visible cluster for this row: 6/10 tasks reached.
  • vuls (vulnerability scanner) · release 010: 5/10 · 50.0%
  • Open Library · release 015: 5/10 · 50.0%
  • Open Library · release 014: 4/10 · 40.0%
  • Open Library · release 016: 2/5 · 40.0%
  • Ansible (automation) · release 004: 1/3 · 33.3%
  • Flipt (feature flag service) · release 006: 0/10 · 0.0%. Weak cluster: Go product plumbing across configuration, storage, and service APIs resisted this model-agent pairing.
  • Flipt (feature flag service) · release 008: 0/10 · 0.0%. Weak cluster: Go product plumbing across configuration, storage, and service APIs resisted this model-agent pairing.
  • qutebrowser (browser) · release 018: 0/9 · 0.0%. Weak cluster: browser/runtime integration around QtWebEngine behavior resisted this model-agent pairing.
  • Navidrome (music service) · release 017: 0/5 · 0.0%. Weak cluster: Go service work with persistence and API behavior resisted this model-agent pairing.

The suite chart is the fastest way to read the model. High bars mean the agent repeatedly found the right subsystem and produced patches the verifier accepted at least once. Low bars are not just misses; they are hints about the task shape that made the model overfit a local edit, stop before the second-order consumer, or fail to keep a multi-package change coherent.

The case notes above keep the article grounded in individual SWE-Bench-Pro instances. A stable 3/3 solve means the task is inside the model’s dependable operating region. A 1/3 solve means it can reach the idea, but the path is retry-sensitive. A 0/3 miss is more diagnostic: it marks a task shape where this model-agent pairing did not find a verifier-backed patch in three independent attempts.
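The three-way reading of a task (3/3, partial, 0/3) is easy to express as a small classifier. This is a minimal sketch under stated assumptions: the task names and outcomes are hypothetical, and grouping 2/3 together with 1/3 as "retry-sensitive" is an assumption, since the text only describes the 1/3 case.

```python
from collections import Counter


def classify(attempts: list[bool]) -> str:
    """Label one task from its pass/fail outcomes across attempts."""
    if not attempts:
        raise ValueError("a task needs at least one attempt")
    solved = sum(attempts)
    if solved == len(attempts):
        return "stable"            # e.g. 3/3: dependable operating region
    if solved == 0:
        return "miss"              # e.g. 0/3: no verifier-backed patch found
    return "retry-sensitive"       # 1/3 (and, by assumption, 2/3)


# Hypothetical outcomes for illustration; not real benchmark data.
outcomes = {
    "task-a": [True, True, True],
    "task-b": [False, True, False],
    "task-c": [False, False, False],
    "task-d": [True, True, False],
}

labels = {task: classify(runs) for task, runs in outcomes.items()}
counts = Counter(labels.values())
reached = counts["stable"] + counts["retry-sensitive"]  # solved at least once
```

With real per-attempt data, `reached` would correspond to the 38-task headline and `counts["stable"]` to the 15-task stability number.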

The verifier audit block below is included because this row has re-verification data.

Original harness result vs verifier-backed audit sample

74 of 76 headline successes survived strict re-verification: 74 verifier-backed, 2 strictly rejected. Read this as a robustness check, especially when the audit sample is smaller than 453 attempts.

Final Score: 25.24 before the audit, 25.24 after (+0.00 points).
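For a sense of scale, the audit figures can be turned into rates. The sketch below assumes the 453 attempts are simply 151 tasks times three attempts, which matches the numbers quoted in this article; the variable names are illustrative.

```python
# Figures quoted in the article; names are illustrative only.
TASKS = 151
ATTEMPTS_PER_TASK = 3
INITIAL_SUCCESSES = 76   # attempts the original harness counted as solved
KEPT_BY_AUDIT = 74       # attempts that survived strict re-verification

total_attempts = TASKS * ATTEMPTS_PER_TASK


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


survival_rate = pct(KEPT_BY_AUDIT, INITIAL_SUCCESSES)   # audit survival
raw_attempt_rate = pct(INITIAL_SUCCESSES, total_attempts)
audited_attempt_rate = pct(KEPT_BY_AUDIT, total_attempts)
rejected = INITIAL_SUCCESSES - KEPT_BY_AUDIT

print(f"{survival_rate}% of initial successes survived; "
      f"attempt-level rate {raw_attempt_rate}% -> {audited_attempt_rate}%")
```

The two rejected attempts move the attempt-level rate by well under one percentage point, which is in line with the unchanged score reported above.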

For practical use, I would treat MiniMax M2.7 highspeed as strongest when the task resembles the high-performing suites and weaker when it resembles the low-performing suites. The raw attempt score is 76/453; that is enough signal to compare it with neighboring rows, but not enough to assume the same behavior on every repository family.

Supporting suite table

Solved counts tasks passed in at least one of three attempts; Pass^3 counts tasks passed in all three; Rate is the Solved share.

Suite                                       Repo                         Solved  Pass^3  Rate
release-zh-013-internetarchive-openlibrary  internetarchive/openlibrary  6/10    5       60.0%
release-zh-010-future-architect-vuls        future-architect/vuls        5/10    2       50.0%
release-zh-015-internetarchive-openlibrary  internetarchive/openlibrary  5/10    1       50.0%
release-zh-014-internetarchive-openlibrary  internetarchive/openlibrary  4/10    2       40.0%
release-zh-016-internetarchive-openlibrary  internetarchive/openlibrary  2/5     1       40.0%
release-zh-004-ansible-ansible              ansible/ansible              1/3     1       33.3%
release-zh-006-flipt-io-flipt               flipt-io/flipt               0/10    0       0.0%
release-zh-008-flipt-io-flipt               flipt-io/flipt               0/10    0       0.0%
release-zh-018-qutebrowser-qutebrowser      qutebrowser/qutebrowser      0/9     0       0.0%
release-zh-017-navidrome-navidrome          navidrome/navidrome          0/5     0       0.0%
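The per-suite rows can also be rolled up by repository, which makes the Python-versus-Go split easier to see. The sketch below copies the solved counts and rates from the table, checks that each listed rate matches solved/total, and then aggregates per repo; the helper names are illustrative.

```python
from collections import defaultdict

# (repo, solved, total, listed_rate_percent) copied from the suite table.
ROWS = [
    ("internetarchive/openlibrary", 6, 10, 60.0),
    ("future-architect/vuls", 5, 10, 50.0),
    ("internetarchive/openlibrary", 5, 10, 50.0),
    ("internetarchive/openlibrary", 4, 10, 40.0),
    ("internetarchive/openlibrary", 2, 5, 40.0),
    ("ansible/ansible", 1, 3, 33.3),
    ("flipt-io/flipt", 0, 10, 0.0),
    ("flipt-io/flipt", 0, 10, 0.0),
    ("qutebrowser/qutebrowser", 0, 9, 0.0),
    ("navidrome/navidrome", 0, 5, 0.0),
]


def pct(solved: int, total: int) -> float:
    """Return solved/total as a percentage rounded to one decimal."""
    return round(100 * solved / total, 1)


# Rows whose listed rate disagrees with solved/total (expected: none).
mismatches = [row for row in ROWS if pct(row[1], row[2]) != row[3]]

# Sum solved and total tasks per repository across its listed suites.
by_repo = defaultdict(lambda: [0, 0])
for repo, solved, total, _ in ROWS:
    by_repo[repo][0] += solved
    by_repo[repo][1] += total

repo_rates = {repo: pct(s, t) for repo, (s, t) in by_repo.items()}

for repo, rate in sorted(repo_rates.items(), key=lambda kv: -kv[1]):
    solved, total = by_repo[repo]
    print(f"{repo:30s} {solved}/{total}  {rate}%")
```

These roll-ups cover only the suites selected for this table, so they should not be read as the model's full per-repository record.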
