Software Engineering · SWE-bench Pro

Frontier-class coding on 100% local models. No tokens purchased.

NodePlus scores 66.6% on SWE-bench Pro, a benchmark of real software-engineering tasks across four languages.

Here is the part that matters: every system that scores higher on the public leaderboard is a closed, paid, cloud model. NodePlus runs entirely on local models, with no per-token bills and no third-party AI provider in the loop, and still lands among the leaders.

The score is not the point. NodePlus is a working system that writes and ships code, repairs bugs, and builds features against your stack. We took the test to show the engine is real.


SWE-bench Pro overall

66.6%

731 instances · 4 languages

81.6%

on the 597 instances that ran cleanly

0

tokens purchased, run end to end on local hardware

100% local models

No tokens purchased

No third-party AI provider sees your code

§ I · What It's Actually For

We didn’t build NodePlus to win a benchmark. We built it to do the engineering work.

SWE-bench Pro measures one thing: whether a system can resolve real issues in real repositories. We ran it to prove the coding engine holds up under pressure.

Writes & ships code

Generates production code against your stack and conventions, then opens the pull request for review.

Repairs bugs

Traces failures to root cause across the codebase and proposes the minimal, targeted fix.

Builds new features

Takes a request from spec to a working, reviewed change, with the right context already loaded.

Works across languages

Tested on Go, Python, TypeScript, and JavaScript repositories, not a single-language demo.

No metered usage

The whole pipeline runs on local models that NodePlus operates. No external API, no per-token metering, no surprise bill.

Keeps your code private

Source, prompts, and context are processed only by local models. Nothing is ever sent to a third-party AI provider.

§ II · Where It Lands

The highest SWE-bench Pro result you can run without paying for a single token.

On the public 35-model leaderboard, only two systems score higher than NodePlus, and both are closed, paid, cloud models. Every open-weights model on the board scores lower.

Best closed, paid model

Closed · Paid API

77.8%

Closed, paid frontier model

Closed · Paid API

69.2%

NodePlus

Local · $0 tokens

66.6%

Best open model, run on its own

Open · self-host

59.0%

Reference figures from the public SWE-bench Pro leaderboard (35 models). Competing systems shown by tier; NodePlus from run pro_731_v2.

2 of 35

systems score higher, both closed and paid

0

open-weights models on the board score higher

$0

spent on tokens, no metered provider bills

§ III · Inside The Results

Strong on three of four languages, led by 98% on Python.

Scores below are on the instances that executed cleanly. Across all of them, the system resolved 81.6% of issues, with the largest Gateway gains on the hardest languages.

Python · 98.0%

193 / 197 resolved · +11.7pp from the Gateway

TypeScript · 88.9%

16 / 18 resolved · +38.9pp from the Gateway

Go · 84.3%

199 / 236 resolved · +19.9pp from the Gateway

JavaScript repositories are the hardest case so far, at 43.8% (64 / 146), up 20.6pp from raw retrieval, and are the focus of ongoing indexer work.

A further 134 instances could not be scored at all, mostly because the benchmark harness fetched shallow clones of the hardest repositories that were missing the target commits, plus a handful of other errors. Those are infrastructure failures, not quality results, and are excluded from the rates above.
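Each per-language rate is simply resolved instances divided by cleanly executed instances for that language. A short check of that arithmetic, using only the counts published on this page (the `PER_LANGUAGE` table and `resolve_rate` helper are illustrative names, not part of NodePlus), shows that 64 / 146 rounds to 43.8% and that the four clean subsets add up to the 597 clean instances:

```python
# Resolved / clean counts per language, as published on this page.
PER_LANGUAGE = {
    "Python": (193, 197),
    "TypeScript": (16, 18),
    "Go": (199, 236),
    "JavaScript": (64, 146),
}


def resolve_rate(resolved: int, clean: int) -> float:
    """Percentage of clean instances resolved, rounded to one decimal place."""
    return round(100 * resolved / clean, 1)


rates = {lang: resolve_rate(r, n) for lang, (r, n) in PER_LANGUAGE.items()}

# The four clean subsets together make up the 597 cleanly executed instances.
total_clean = sum(n for _, n in PER_LANGUAGE.values())

if __name__ == "__main__":
    for lang, rate in rates.items():
        print(f"{lang}: {rate}%")
    print(f"clean instances: {total_clean}")
```

The "+pp" figures next to each language are percentage points (an absolute difference between two rates), not a relative percentage increase.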

§ IV · How We Got Here

The model is the small part. The Gateway is what wins.

NodePlus pairs a local model with a structured retrieval and memory pipeline. The same hardware, without that pipeline, scores far lower.

QUBO Structural Indexer

Maps the repository into a structured candidate set so the right files surface before any code is generated.

NodePlus Gateway

Enriches each task with RAG retrieval, a BM25 keyword union, a canonical-facts layer, and an associative memory lattice.

Associative Memory

Long-term, cross-file knowledge that surfaces non-obvious connections a flat search would miss.

100% Local Models

Generation runs entirely on open models that NodePlus hosts, with a local embedding model and no external API calls.
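The page names a BM25 keyword union in the Gateway but does not describe it. A minimal sketch, assuming "union" means: keep the files that embedding (RAG) retrieval returned, then append the top BM25 keyword hits that dense search missed. Dense retrieval is stubbed as a precomputed list, and `tokenize`, `bm25_scores`, `keyword_union`, the `k1`/`b` defaults, and the toy file contents are all hypothetical, for illustration only:

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase word/identifier tokens; snake_case identifiers stay whole.
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    """Okapi BM25 score of every document for the query."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # Non-negative IDF variant (the "1 +" inside the log).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores[doc_id] = score
    return scores


def keyword_union(dense_ranked: List[str], query: str, docs: Dict[str, str], top_k: int = 3) -> List[str]:
    """Dense hits first, then the top-k BM25 hits with a positive score, without duplicates."""
    scores = bm25_scores(query, docs)
    lexical = [d for d in sorted(scores, key=scores.get, reverse=True) if scores[d] > 0][:top_k]
    merged: List[str] = []
    for doc_id in dense_ranked + lexical:
        if doc_id not in merged:
            merged.append(doc_id)
    return merged


if __name__ == "__main__":
    files = {
        "auth/session.py": "def refresh_session(token): validate token expiry and refresh session",
        "auth/login.py": "def login(user, password): create session for user",
        "billing/invoice.py": "def render_invoice(order): format invoice totals",
    }
    dense_hits = ["auth/login.py"]  # stand-in for embedding-retrieval results
    print(keyword_union(dense_hits, "session token expiry bug", files, top_k=2))
    # prints ['auth/login.py', 'auth/session.py']
```

Putting dense hits first and appending lexical ones is one design choice among several; the keyword pass is there to catch exact identifiers or error strings that embedding similarity can rank low.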

The Gateway adds 16.6 points over raw retrieval, and up to 39 points on the hardest languages. A local model that sits mid-pack on its own is lifted past most paid frontier systems, with no tokens purchased.

§ V · Local By Design

Frontier coding, without handing your code to a third-party AI provider.

Every result on this page was produced on 100% local models with no tokens purchased. For regulated labs, financial services, and compliance-heavy operations, that means proprietary code and customer data are never sent to a third-party AI provider, and costs do not scale with usage.

You get a SWE-bench Pro result that outscores most paid frontier APIs, with no third-party model in the loop, at a fixed and predictable cost.

No tokens purchased · No third-party AI · Fully managed · Fixed cost · Regulated-ready

§ VI · The Benchmark

What SWE-bench Pro actually measures

SWE-bench Pro evaluates whether an AI system can resolve real, verified issues drawn from production open-source repositories. Each instance is a genuine engineering task: the system must read the codebase, locate the problem, and produce a patch that passes the project’s own tests.

This run covered 731 instances across four languages and eleven repositories. Of those, 134 could not be scored because of infrastructure failures: shallow clones missing the target commit on the hardest repositories, plus a handful of errors. On the 597 instances that executed cleanly, NodePlus resolved 81.6%. The headline 66.6% counts every unscored instance as a miss.
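The two published rates are linked by simple arithmetic: unscored instances stay in the denominator but contribute no resolutions, so the overall rate is the clean-instance rate scaled by 597 / 731. A minimal check, where `headline_rate` is a hypothetical helper written for this illustration:

```python
def headline_rate(clean_rate: float, n_clean: int, n_total: int) -> float:
    """Convert a rate on cleanly executed instances into a rate over all
    instances, counting every unscored instance as a miss."""
    resolved = clean_rate * n_clean  # resolutions implied by the clean-instance rate
    return resolved / n_total        # unscored instances only enlarge the denominator


if __name__ == "__main__":
    print(f"{headline_rate(0.816, n_clean=597, n_total=731):.1%}")
```

Because 81.6% is itself a rounded figure, the result is approximate; it comes out at 66.6% to one decimal place.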

731

Instances

4

Languages

11

Repositories

Anti-contamination controls blocked the benchmark from writing into or retrieving from production memory, so no result reflects prior exposure to the test set.
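The page does not say how these controls are implemented. One simple way to get the stated property, sketched purely as an assumption: give the benchmark run its own memory namespace and reject any read or write aimed anywhere else, production included. `MemoryStore`, `BenchmarkGuard`, `ContaminationError`, and the namespace names are hypothetical:

```python
from typing import Dict, Optional


class ContaminationError(RuntimeError):
    """Raised when a benchmark run tries to touch a namespace other than its sandbox."""


class MemoryStore:
    """Toy key-value memory split into namespaces (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def write(self, namespace: str, key: str, value: str) -> None:
        self._data.setdefault(namespace, {})[key] = value

    def read(self, namespace: str, key: str) -> Optional[str]:
        return self._data.get(namespace, {}).get(key)


class BenchmarkGuard:
    """Gives a benchmark run access to its own sandbox namespace and nothing else."""

    PRODUCTION = "production"

    def __init__(self, store: MemoryStore, sandbox: str) -> None:
        if sandbox == self.PRODUCTION:
            raise ValueError("the sandbox namespace must not be production memory")
        self._store = store
        self._sandbox = sandbox

    def _check(self, namespace: str) -> None:
        # Any namespace other than the sandbox, production included, is rejected.
        if namespace != self._sandbox:
            raise ContaminationError(f"benchmark access to {namespace!r} blocked")

    def write(self, namespace: str, key: str, value: str) -> None:
        self._check(namespace)
        self._store.write(namespace, key, value)

    def read(self, namespace: str, key: str) -> Optional[str]:
        self._check(namespace)
        return self._store.read(namespace, key)


if __name__ == "__main__":
    store = MemoryStore()
    store.write("production", "fact", "customer data")
    guard = BenchmarkGuard(store, sandbox="bench-run")
    guard.write("bench-run", "note", "scratch")
    try:
        guard.read("production", "fact")
    except ContaminationError as err:
        print(err)
```

Failing closed (rejecting everything except the sandbox) is the safer default here: a misnamed namespace raises an error instead of silently reaching production memory.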

§ VII · What This Means For Your Business

Production-grade code generation and bug fixes, validated against a public engineering benchmark.

Results that outscore most paid frontier APIs, with no per-token cost.

Your source code, prompts, and context are never sent to a third-party AI provider.

Predictable, fixed cost: the bill does not grow every time the team ships.

Frontier-class coding, on local models, with no tokens purchased

See NodePlus ship code, fix bugs, and build features against your stack, and book a briefing on the local pipeline behind the 66.6% result.
