nyx-coder

It ships code that actually works.

An autonomous coding agent that reads your repo, makes the change, runs the tests, and iterates until they are green, then opens the PR. Self-hosted, on any model, in a sandbox.

self-hosted · any model · verified · sandboxed · 157 tests

Most agents write code and hope.

nyx-coder is built around verification. It does not call a change done because the model felt confident: it runs your repo's own test gate first, and inside Nyx it hands the result to a Reviewer that opens a real browser and checks that it works on desktop and mobile. The edge is the verified, autonomous delivery pipeline, not the model. That lets you point it at a cheap model for routine work, or a frontier one when quality matters, by changing a single env var.

What it does

A complete coding loop, confined and verifiable.

safe by default

Sandboxed

Every path and shell command is confined to the task workdir, with a policy that denies destructive or network commands.
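
The confinement idea can be sketched roughly as follows. This is a minimal illustration, not nyx-coder's actual sandbox: `resolveInside`, `isCommandAllowed`, and the deny patterns are hypothetical names and rules chosen for the example, and the sketch does not follow symlinks, which a real sandbox would need to handle.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve a requested path against the workdir and
// reject it if the result lands outside the workdir.
export function resolveInside(workdir: string, requested: string): string {
  const root = path.resolve(workdir);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // A relative path that climbs with ".." (or is absolute, such as another
  // drive on Windows) points outside the root.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes workdir: ${requested}`);
  }
  return target;
}

// Hypothetical deny-list. The real policy is not described in this document;
// these patterns only illustrate "destructive or network commands".
const DENIED: RegExp[] = [/\brm\s+-rf\b/, /\b(curl|wget)\b/];

export function isCommandAllowed(command: string): boolean {
  const trimmed = command.trim();
  return !DENIED.some((pattern) => pattern.test(trimmed));
}
```

A pattern deny-list like this is easy to read but easy to bypass, so treat it as a picture of the policy's intent, not as a security boundary.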

small diffs

apply_patch

A codex-style patch channel with fuzzy and multi-occurrence matching, dry-run, and clear conflict reports.
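
As a rough picture of what a patch channel with dry-run and conflict reports might look like, here is a small search/replace hunk applier. The names `applyHunk` and `PatchResult`, and the options shown, are assumptions made for this sketch. It is not the actual apply_patch format, and it leaves out fuzzy matching entirely: it only matches exact text.

```typescript
// Hypothetical result shape for this sketch.
export interface PatchResult {
  ok: boolean;
  text: string;      // the patched text; the unchanged source on conflict or dry run
  matches: number;   // how many times the search text was found
  conflict?: string; // human-readable reason when ok is false
}

// Hypothetical hunk applier: replace `search` with `replace` in `source`.
export function applyHunk(
  source: string,
  search: string,
  replace: string,
  opts: { dryRun?: boolean; allowMultiple?: boolean } = {},
): PatchResult {
  if (search.length === 0) {
    return { ok: false, text: source, matches: 0, conflict: "empty search text" };
  }
  // Count non-overlapping occurrences of the search text.
  const matches = source.split(search).length - 1;
  if (matches === 0) {
    return { ok: false, text: source, matches, conflict: "search text not found" };
  }
  if (matches > 1 && !opts.allowMultiple) {
    return {
      ok: false,
      text: source,
      matches,
      conflict: `search text is ambiguous: ${matches} occurrences`,
    };
  }
  // In a dry run, report what would match but leave the text unchanged.
  const text = opts.dryRun ? source : source.split(search).join(replace);
  return { ok: true, text, matches };
}
```

Refusing an ambiguous match by default keeps a small diff from silently touching more than one place; the caller opts in to multi-occurrence edits explicitly.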

long runs

Auto-compaction

Folds older turns into a summary so the context window never overflows on a big build.
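
One way to picture compaction is a function that keeps the most recent turns and folds everything older into a single summary turn. This is a sketch under stated assumptions: `Turn`, `compact`, the four-characters-per-token estimate, and the synchronous `summarize` callback are all hypothetical, and in practice the summary would likely come from a model call.

```typescript
export interface Turn {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical estimate: roughly four characters per token.
export function estimateTokens(turns: Turn[]): number {
  const chars = turns.reduce((total, turn) => total + turn.content.length, 0);
  return Math.ceil(chars / 4);
}

// Fold all but the last `keepRecent` turns into one summary turn
// once the estimated size exceeds `budget`.
export function compact(
  turns: Turn[],
  budget: number,
  keepRecent: number,
  summarize: (older: Turn[]) => string,
): Turn[] {
  if (estimateTokens(turns) <= budget || turns.length <= keepRecent) {
    return turns; // nothing to fold
  }
  const cut = turns.length - keepRecent;
  const older = turns.slice(0, cut);
  const recent = turns.slice(cut);
  const summary: Turn = {
    role: "system",
    content: `Summary of earlier turns: ${summarize(older)}`,
  };
  return [summary, ...recent];
}
```

A single pass like this does not by itself guarantee the result fits the budget; a fuller version would re-check the size after folding.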

ships it

Git + PR delivery

Diff summaries, auto-commit, and opt-in branch, push, and PR. Never pushes without your say-so.

teammate

GitHub issues

Watches a repo's issues, fixes them through the loop, and opens a PR that closes them. Opt-in, allowlisted.

any provider

Configurable models

Planner, worker, vision, image, and embedding roles, each swappable by env var. Bring your own key.

How it works (the short version)

You hand it a task. It works a loop, in a sandbox, and does not stop until the tests pass.

Flow (inside the sandbox, confined to the task workdir): You (a task or a GitHub issue) → Read (understand the code) → Patch (small change) → Run tests (the real gate) → Commit / PR (only when green). Not green: fix it and try again. It runs on the Nyx spine, so if a step ever hangs it is skipped, never wedged.
Read, patch, test in a loop; commit only when the gate is green; all inside a sandbox.

In plain terms: it reads the relevant files to understand the code, makes a small patch instead of a full-file rewrite, runs the tests, and if they fail it tries again with the failure in hand. Only when the gate is green does it commit or open a PR. Everything happens inside a confined workdir, so it cannot touch anything outside the task, and it runs on the Nyx spine, so if a step ever hangs the system skips it and moves on instead of getting stuck.
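
The loop described above can be sketched in a few lines. This is an illustration only: `runLoop`, `LoopDeps`, and the three injected functions are hypothetical seams, not nyx-coder's real interfaces, and `maxSteps` is assumed to play the role of a step ceiling.

```typescript
export interface GateResult {
  green: boolean;
  output: string; // test output, fed back into the next attempt on failure
}

// Hypothetical seams: in a real agent these would involve the model and the sandbox.
export interface LoopDeps {
  proposePatch: (goal: string, lastFailure: string | null) => Promise<string>;
  applyPatch: (patch: string) => Promise<void>;
  runTests: () => Promise<GateResult>;
}

export async function runLoop(
  goal: string,
  deps: LoopDeps,
  maxSteps = 30,
): Promise<{ status: "green" | "halted"; steps: number }> {
  let lastFailure: string | null = null;
  for (let step = 1; step <= maxSteps; step++) {
    const patch = await deps.proposePatch(goal, lastFailure);
    await deps.applyPatch(patch);
    const gate = await deps.runTests();
    if (gate.green) {
      // Only a green gate ends the loop successfully; commit or PR would follow here.
      return { status: "green", steps: step };
    }
    lastFailure = gate.output; // retry with the failure in hand
  }
  return { status: "halted", steps: maxSteps };
}
```

Injecting the three functions keeps the control flow testable without a model or a real repo, which fits the network-free test suite the page describes.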

How it differs

vs most agents

It verifies

Others write code and stop. nyx-coder runs your real test gate, and a browser review, before calling anything done.

vs IDE assistants

It is autonomous

Not a pair-programmer waiting on you. Hand it a goal or an issue and walk away; it finishes and opens a PR.

vs cloud tools

It is yours

Self-hosted, your machine, your models. The agent and your repo stay on your box; only the requests to the model provider you choose leave it. The cost is just your API key.

vs one-model tools

Any model

Swap the model per role with an env var. Cheap for routine work, frontier when it matters.

Lightweight on purpose

The heavy compute (the model) lives in the API. The agent itself is small: about 5,000 lines, with zero runtime native dependencies.

For scale (approximate): nyx-coder core, ~5k LOC and 0 native deps; a terminal CLI agent, tens of thousands of LOC; a full agent platform, 100k+ LOC. The model is an API call, not code you ship.
nyx-coder's core is a fraction of a full coding-agent platform, because the model is an API call and there are no native deps to drag around.

Quick start

Zero runtime native deps. The test suite is network-free, so it runs with no key.

# clone, then
npm install
npm test                 # 157 tests, no key needed

cp .env.example .env      # add OPENROUTER_API_KEY
npm run cli -- "add a sum() function in src/math.ts with a test"

Commands and config

There is no interactive prompt and no slash-commands like /clear or /model. nyx-coder is non-interactive on purpose: you drive it one of two ways, through a one-shot CLI or a small HTTP surface, and you pick the models by env var, not by command.

# one-shot CLI: a task in, verified code out
npm run cli -- "task description" --workdir . --tier fast --max 30
#   --workdir   confine every file and command to this dir   (default: .)
#   --tier      fast = worker model, smart = planner model    (default: smart)
#   --max       step ceiling before it halts                  (default: 30)

npm test          # run the 157-test gate (no key needed)
npm run typecheck # tsc --noEmit

# or drive it over HTTP (src/server.ts, bound on 127.0.0.1)
POST /runs  {"goal":"...","workdir":"."}  # start a run
GET  /runs             # list runs
GET  /runs/:id         # one run plus its result
GET  /runs/:id/events  # its full event trace
GET  /health           # { ok: true }

# choose the model per role, any OpenAI-compatible provider
NYX_MODEL_PLANNER=deepseek/deepseek-v4-pro     # high-level reasoning
NYX_MODEL_WORKER=deepseek/deepseek-v4-flash    # routine, low-level steps
NYX_MODEL_VISION=qwen/qwen-2.5-vl-72b-instruct # screenshots, design refs
NYX_MODEL_IMAGE=black-forest-labs/flux-1.1-pro # image generation
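
To make the role and tier mapping concrete, here is a small sketch of how the variables above could be read. The function names `modelsFromEnv` and `modelForTier` are hypothetical, and the embedding role is left out because its variable name is not listed here. The fast and smart mapping follows the `--tier` comment above.

```typescript
export type Role = "planner" | "worker" | "vision" | "image";
export type Tier = "fast" | "smart";
type Env = Record<string, string | undefined>;

// Read the per-role model names from an env-like object (for example process.env).
export function modelsFromEnv(env: Env): Partial<Record<Role, string>> {
  return {
    planner: env.NYX_MODEL_PLANNER,
    worker: env.NYX_MODEL_WORKER,
    vision: env.NYX_MODEL_VISION,
    image: env.NYX_MODEL_IMAGE,
  };
}

// fast = worker model, smart = planner model.
export function modelForTier(tier: Tier, env: Env): string {
  const role: Role = tier === "fast" ? "worker" : "planner";
  const model = modelsFromEnv(env)[role];
  if (!model) {
    throw new Error(`no model configured for role "${role}"`);
  }
  return model;
}
```

Failing loudly on a missing variable is a design choice for the sketch; the real agent may fall back to defaults instead.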

Why no REPL? nyx-coder is built to be driven by Nyx, not typed at. The orchestrator hands it a goal and walks away, so the surface is a single call that runs to a verified result. A human-facing interactive mode with slash-commands is a clean future add, not something it needs to do its job.
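
For a caller such as an orchestrator, starting a run over the HTTP surface listed earlier might look like the sketch below. `startRun` and `FetchLike` are hypothetical names, the base URL (including the port) is something you supply, and the response body is treated as unknown because its shape is not documented here.

```typescript
// Minimal structural type for the parts of fetch this sketch uses,
// so a fake can be injected in tests.
export type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json: () => Promise<unknown> }>;

// POST /runs with a goal and workdir, then return the parsed response body.
export async function startRun(
  baseUrl: string,
  goal: string,
  workdir: string,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/runs`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ goal, workdir }),
  });
  if (!res.ok) {
    throw new Error(`POST /runs failed with status ${res.status}`);
  }
  return res.json();
}
```

On a recent Node version the global `fetch` should be usable as `fetchImpl`; a caller would then poll `GET /runs/:id` for the result.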

Part of Nyx

nyx-coder is the coding agent inside Nyx, a self-hostable autonomous-agent system: an orchestrator that runs a team of doers (coder, researcher, reviewer) on a self-healing spine that cannot wedge. The assistant routes and aggregates; nyx-coder does the building.