Forge

Autonomous engineering needs judgment, not just code generation.

Forge is a local-first Rust framework that runs AI coding agents like Codex CLI, Claude Code, and Gemini CLI inside a governed engineering loop. It schedules work, routes tasks, remembers decisions, monitors quality, and delivers changes through git.

Single Rust binary · SQLite memory · Executor-agnostic · Git-native

Core thesis

Autonomous engineering systems need scheduling, policy, memory, monitoring, and review. The coding model is only one executor in a larger operating system for repo work.

Operating model

Forge behaves like a small autonomous engineering organization: it discovers useful work, executes bounded tasks, evaluates the output, and turns failures into follow-up work.

Build Loop

Runs every 10-30 minutes. Selects queued or discovered work, enforces policy, invokes an executor, captures artifacts, and delivers the result as a PR or direct commit depending on mode.
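
One pass of that loop can be sketched as a single function. This is a minimal illustration under stated assumptions, not Forge's actual code: `Task`, `TickOutcome`, `Executor`, and `build_tick` are hypothetical names, the queue is an in-memory `VecDeque` rather than SQLite, and delivery is reduced to returning the executor's summary.

```rust
use std::collections::VecDeque;

// Hypothetical types for illustration; none of these names come from Forge.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: u32,
    pub title: String,
}

#[derive(Debug, PartialEq)]
pub enum TickOutcome {
    /// Nothing was queued, so the tick did no work.
    Idle,
    /// Policy refused the task before any executor ran.
    Blocked { task_id: u32, reason: String },
    /// The executor succeeded and its summary is ready for delivery.
    Delivered { task_id: u32, summary: String },
    /// The executor failed; the error is kept so it can become an artifact.
    Failed { task_id: u32, error: String },
}

/// Stand-in for an adapter around an external coding CLI.
pub trait Executor {
    fn run(&self, task: &Task) -> Result<String, String>;
}

/// One pass of the build loop: select, enforce policy, execute, record.
pub fn build_tick(
    queue: &mut VecDeque<Task>,
    policy: impl Fn(&Task) -> Result<(), String>,
    executor: &dyn Executor,
) -> TickOutcome {
    // Select the oldest queued task, if there is one.
    let task = match queue.pop_front() {
        Some(task) => task,
        None => return TickOutcome::Idle,
    };

    // Enforce policy before the executor is ever invoked.
    if let Err(reason) = policy(&task) {
        return TickOutcome::Blocked { task_id: task.id, reason };
    }

    // Invoke the executor and capture its result either way.
    match executor.run(&task) {
        Ok(summary) => TickOutcome::Delivered { task_id: task.id, summary },
        Err(error) => TickOutcome::Failed { task_id: task.id, error },
    }
}
```

Returning an outcome value instead of logging inside the function keeps the tick easy to test and leaves scheduling, artifact storage, and git delivery to the caller.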

Monitor Loop

Runs hourly and after builds. Validates tests, functionality, UX, product fit, and repo health. Rejected work creates follow-up tasks instead of pretending the job is done.
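
The rejection path can be sketched as a pure function from findings to a verdict. Everything here is an assumption made for illustration: `Check`, `Finding`, `FollowUp`, `Verdict`, and `review` are hypothetical names, and the follow-up title format is invented.

```rust
// Hypothetical types for illustration; the check names mirror the list above.
#[derive(Debug, Clone, PartialEq)]
pub enum Check {
    Tests,
    Functionality,
    Ux,
    ProductFit,
    RepoHealth,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub check: Check,
    pub passed: bool,
    pub note: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FollowUp {
    pub title: String,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accepted,
    /// Rejection always carries the work needed to fix it.
    Rejected(Vec<FollowUp>),
}

/// Turn every failed check into a follow-up task; accept only if none failed.
pub fn review(run_id: u32, findings: &[Finding]) -> Verdict {
    let follow_ups: Vec<FollowUp> = findings
        .iter()
        .filter(|f| !f.passed)
        .map(|f| FollowUp {
            title: format!("Fix {:?} failure from run {}: {}", f.check, run_id, f.note),
        })
        .collect();

    if follow_ups.is_empty() {
        Verdict::Accepted
    } else {
        Verdict::Rejected(follow_ups)
    }
}
```

Because `Rejected` cannot be constructed without a list of follow-ups, a failed review has nowhere to go except back into the queue.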

Architecture

Scheduler, coordinator, queue, memory, policies, executors, monitor, git delivery.

Scheduler owns timed execution. Coordinator selects work and acquires locks. Queue stores durable SQLite tasks. Policies constrain autonomy and blast radius. Executors wrap external coding CLIs. Monitor judges quality and creates follow-ups.

Delivery modes

PR mode is the default path: create a branch, open a PR, and wait for human approval or for the monitor to merge it. Insane mode can commit directly to main while still running guardrails.
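
A sketch of how the two modes might share one guardrail gate. The names `DeliveryMode`, `DeliveryAction`, and `plan_delivery` are hypothetical, and the `forge/<slug>` branch naming is an assumption, not something the project specifies.

```rust
// Hypothetical types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeliveryMode {
    /// Default: branch, open a PR, wait for review.
    Pr,
    /// Commit straight to main.
    Insane,
}

#[derive(Debug, PartialEq)]
pub enum DeliveryAction {
    OpenPullRequest { branch: String },
    CommitToMain,
}

/// Decide how a finished task is delivered.
/// Guardrails gate both modes; the mode only changes where the change lands.
pub fn plan_delivery(
    mode: DeliveryMode,
    task_slug: &str,
    guardrails_passed: bool,
) -> Result<DeliveryAction, String> {
    if !guardrails_passed {
        return Err(format!("guardrails failed for {}; nothing delivered", task_slug));
    }
    match mode {
        // Assumed branch naming scheme.
        DeliveryMode::Pr => Ok(DeliveryAction::OpenPullRequest {
            branch: format!("forge/{}", task_slug),
        }),
        DeliveryMode::Insane => Ok(DeliveryAction::CommitToMain),
    }
}
```

Checking guardrails before matching on the mode makes it structurally impossible for Insane mode to skip them.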

Autonomy levels

Forge can suggest tasks, enqueue them automatically, or execute them autonomously. The level is explicit policy, not implicit agent behavior.
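
One way to keep the level explicit is to parse it from configuration and refuse anything unrecognized. This is a sketch under assumptions: `AutonomyLevel`, `Disposition`, `dispose`, and the config strings `suggest`, `enqueue`, and `execute` are all hypothetical.

```rust
use std::str::FromStr;

// Hypothetical types for illustration; variants are ordered from least to most autonomous.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum AutonomyLevel {
    Suggest,
    Enqueue,
    Execute,
}

#[derive(Debug, PartialEq)]
pub enum Disposition {
    ShowToHuman,
    AddToQueue,
    RunNow,
}

impl FromStr for AutonomyLevel {
    type Err = String;

    /// Unknown values are an error; there is no silent default level.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "suggest" => Ok(AutonomyLevel::Suggest),
            "enqueue" => Ok(AutonomyLevel::Enqueue),
            "execute" => Ok(AutonomyLevel::Execute),
            other => Err(format!("unknown autonomy level: {}", other)),
        }
    }
}

/// What happens to a newly discovered task at each level.
pub fn dispose(level: AutonomyLevel) -> Disposition {
    match level {
        AutonomyLevel::Suggest => Disposition::ShowToHuman,
        AutonomyLevel::Enqueue => Disposition::AddToQueue,
        AutonomyLevel::Execute => Disposition::RunNow,
    }
}
```

Deriving `Ord` lets policy code write comparisons such as `level >= AutonomyLevel::Enqueue` instead of enumerating variants.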

Memory

SQLite stores episodic run logs, semantic repository knowledge, and organizational standards such as protected paths and testing expectations.

Run artifacts

Every run writes summaries, logs, diff notes, test results, screenshots, monitor reports, and generated follow-up tasks.

Safety

Locks, cooldowns, budgets, forbidden commands, protected paths, and maximum diff limits keep autonomous work bounded.
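
Three of these limits are simple enough to check in a few lines. The sketch below covers protected paths, forbidden commands, and the diff limit only; locks, cooldowns, and budgets are left out. `Policy`, `Proposal`, and `violations` are hypothetical names, and matching forbidden commands by prefix is a deliberate simplification.

```rust
use std::path::Path;

// Hypothetical types for illustration only.
pub struct Policy {
    pub protected_paths: Vec<String>,
    pub forbidden_commands: Vec<String>,
    pub max_diff_lines: usize,
}

pub struct Proposal {
    pub touched_paths: Vec<String>,
    pub commands: Vec<String>,
    pub diff_lines: usize,
}

/// Collect every policy violation in a proposed change.
/// An empty result means the proposal is within bounds.
pub fn violations(policy: &Policy, proposal: &Proposal) -> Vec<String> {
    let mut found = Vec::new();

    // Path::starts_with compares whole path components, so a protected
    // "migrations" directory does not match "migrations_old/x.sql".
    for path in &proposal.touched_paths {
        if policy
            .protected_paths
            .iter()
            .any(|protected| Path::new(path).starts_with(protected))
        {
            found.push(format!("protected path touched: {}", path));
        }
    }

    // Simplification: a command is forbidden if it begins with a forbidden string.
    for command in &proposal.commands {
        if policy
            .forbidden_commands
            .iter()
            .any(|forbidden| command.trim_start().starts_with(forbidden.as_str()))
        {
            found.push(format!("forbidden command: {}", command));
        }
    }

    if proposal.diff_lines > policy.max_diff_lines {
        found.push(format!(
            "diff too large: {} lines (limit {})",
            proposal.diff_lines, policy.max_diff_lines
        ));
    }

    found
}
```

Returning all violations rather than stopping at the first gives the run artifacts a complete account of why a change was blocked.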

Dogfooding

Forge should operate on itself and on sandbox apps like todo, kanban, dashboard, CRM, and recipe planner projects to generate regression data.

v1 scope

Rust CLI with a single local binary.
SQLite-backed task queue, memory, runs, and artifacts.
Build and monitor schedules with policy enforcement.
Executor adapters starting with Codex CLI.
PR-mode delivery with logs, summaries, and review output.

Success criteria

Discovers useful repo work instead of generating churn.
Produces reviewable changes with mechanical proof.
Turns rejected work into targeted follow-up tasks.
Maintains repo health over long-running autonomous loops.
Improves prompts, policies, and workflows safely.

CLI shape

forge init
forge heartbeat
forge run <task>
forge monitor
forge status
forge explain
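
The subcommands above map naturally onto an enum. This is a sketch, not the real parser: `Command` and `parse_command` are hypothetical names, the input is the argument list after the program name, and treating every subcommand except `run` as argument-free is an assumption the source does not confirm.

```rust
// Hypothetical model of the subcommands for illustration only.
#[derive(Debug, PartialEq)]
pub enum Command {
    Init,
    Heartbeat,
    Run { task: String },
    Monitor,
    Status,
    Explain,
}

/// Parse the arguments that follow `forge` on the command line.
pub fn parse_command(args: &[&str]) -> Result<Command, String> {
    match args {
        ["init"] => Ok(Command::Init),
        ["heartbeat"] => Ok(Command::Heartbeat),
        ["run", task] => Ok(Command::Run { task: task.to_string() }),
        ["run"] => Err("forge run needs a <task> argument".to_string()),
        ["monitor"] => Ok(Command::Monitor),
        ["status"] => Ok(Command::Status),
        ["explain"] => Ok(Command::Explain),
        [] => Err("no subcommand given".to_string()),
        other => Err(format!("unrecognized arguments: {}", other.join(" "))),
    }
}
```

Slice patterns keep the whole command surface visible in one `match`, which suits a CLI this small; a larger one would more likely use an argument-parsing crate.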