This page is the adoption-facing architecture view: what each command reads/writes, how data moves through batch vs online paths, and where deterministic safety gates apply.

System architecture at a glance

Data plane (state)

Supabase Postgres + pgvector stores items, embeddings, candidate snapshots, judge decisions, and close plans/results.

Decision plane

Embedding retrieval narrows candidates; LLM judgment proposes duplicate targets; deterministic policy gates accept/reject.
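The three decision-plane stages compose into a narrow-propose-gate pipeline. A minimal sketch of that shape, assuming illustrative names and thresholds (nothing here is the project's actual API; the LLM judgment step is elided):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: int
    score: float     # embedding similarity from retrieval
    is_open: bool

def retrieve(candidates, k=5, min_score=0.5):
    """Embedding retrieval: narrow the corpus to the top-k candidates."""
    kept = [c for c in candidates if c.score >= min_score]
    return sorted(kept, key=lambda c: c.score, reverse=True)[:k]

def policy_gate(proposed: Candidate, confidence: float) -> bool:
    """Deterministic gate: a proposed duplicate target is accepted only if hard checks pass."""
    return confidence >= 0.85 and proposed.is_open

# Retrieval narrows, an (elided) LLM judgment proposes a target with a
# confidence, and the deterministic gate makes the final accept/reject call.
pool = [Candidate(1, 0.91, True), Candidate(2, 0.40, True), Candidate(3, 0.88, False)]
shortlist = retrieve(pool)
```

The point of the split is that the LLM only ever proposes; acceptance is decided by deterministic checks that can be audited and replayed.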

Action plane

Close actions never flow directly from the judge. They are gated through plan-close plus an explicit apply-close --yes.

Two coordinated pipelines

The batch path processes a repo corpus and produces reviewed close actions; the online path (detect-new) evaluates a single incoming item against that same corpus.
For a batch of issues/PRs, each stage iterates over many source items, and artifacts are persisted after each stage so runs are restartable and auditable.
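Persisting artifacts after each stage is what makes a batch run restartable: a re-run skips anything already materialized. A hedged sketch of that pattern, with a plain dict standing in for the Postgres tables:

```python
def run_stage(state: dict, stage: str, items: list, process) -> list:
    """Run one batch stage, skipping items whose artifact is already persisted."""
    done = state.setdefault(stage, {})   # stands in for a Postgres table
    for item in items:
        if item in done:                 # artifact already exists: restartable skip
            continue
        done[item] = process(item)       # persist each result before moving on
    return [done[i] for i in items]

state: dict = {}
run_stage(state, "embed", ["issue-1", "issue-2"], lambda i: f"vec({i})")
# A re-run (e.g. after a crash, or with new items) reprocesses only what is missing:
out = run_stage(state, "embed", ["issue-1", "issue-2", "issue-3"], lambda i: f"vec({i})")
```

Because every stage writes through to state, the same artifacts that enable restarts also serve as the audit trail.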

Command-by-command information flow

This table is the operational truth for what each command reads and writes.
| Command | Primary input | Reads | Writes | Output role |
| --- | --- | --- | --- | --- |
| init | local env/runtime | env + local runtime checks | none | readiness checks only |
| maintainers | --repo | GitHub collaborators/permissions | none | maintainer list for policy checks |
| sync | repo + type/state/since | GitHub issues/PRs | repos (upsert), items (upsert + content hash/version) | corpus ingest |
| refresh | repo + type | GitHub + existing items | items (discover new; optional metadata refresh) | incremental freshness |
| embed | repo + type | items needing vectors | embeddings | semantic retrieval substrate |
| candidates | repo + type + k/min_score | items, embeddings | candidate_sets, candidate_set_members (plus stale marking) | reproducible retrieval snapshot |
| judge | repo + type + provider/model | candidate_sets + source/candidate item context | judge_decisions (accepted/rejected/skipped) | accepted-edge graph source |
| judge-audit | sampled candidate sets | candidate sets + same judge runtime (cheap/strong lanes) | judge_audit_runs, judge_audit_run_items | evaluation + disagreement analysis |
| report-audit | audit run id | judge_audit_runs, judge_audit_run_items | none | reporting/simulation only |
| detect-new | single item number | GitHub item + DB corpus embeddings | repos (upsert), items (source upsert), embeddings (source if stale) | online JSON verdict |
| canonicalize | repo + type | accepted edges + item metadata + maintainer list | none (stats output) | canonical cluster computation |
| plan-close | repo + type + min_close | accepted edges + item metadata + maintainer list | close_runs (mode=plan), close_run_items (unless dry-run) | reviewable close plan |
| apply-close | close_run_id + --yes | close plan rows + GitHub | close_runs (mode=apply), close_run_items apply results | executed mutations |

Command-to-state map (Mermaid)

You’re usually best served by the table above for exact read/write behavior. These diagrams are intentionally simplified for readability.

Where safety decisions happen

Judge acceptance gate (batch)

Duplicate edges require a valid structured response, candidate-set membership, confidence >= 0.85, an open target, and a score gap >= 0.015.

Close planning gate

Close eligibility requires a direct accepted edge to the canonical item, direct confidence >= 0.90, and maintainer author/assignee protections.
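The close-planning gate can be sketched the same way, with maintainer protection modeled as a simple set check (all names here are illustrative, not the project's schema):

```python
def close_eligible(has_direct_edge_to_canonical: bool, direct_confidence: float,
                   author: str, assignees: list, maintainers: set) -> bool:
    """Deterministic close-planning gate for one item (illustrative sketch)."""
    if not has_direct_edge_to_canonical or direct_confidence < 0.90:
        return False
    if author in maintainers:                               # maintainer-authored: protected
        return False
    return not any(a in maintainers for a in assignees)     # maintainer-assigned: protected
```

Note the threshold is stricter than the judge gate (0.90 vs 0.85): an edge can be accepted into the graph yet still be ineligible for closing.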

Apply mutation gate

No mutation happens without a persisted mode=plan run and an explicit --yes.

Online strict mapping

detect-new can downgrade high-confidence duplicate predictions to maybe_duplicate when structural/retrieval guardrails fail.
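The strict mapping is a post-processing step over the online verdict: a confident "duplicate" survives only if structural/retrieval guardrails hold. A minimal sketch, assuming illustrative guardrail names:

```python
def strict_map(verdict: str, confidence: float,
               target_in_candidates: bool, target_open: bool) -> str:
    """Downgrade a 'duplicate' verdict when guardrails fail (illustrative sketch)."""
    if verdict != "duplicate":
        return verdict                     # only duplicate verdicts are downgraded
    guardrails_ok = target_in_candidates and target_open and confidence >= 0.85
    return "duplicate" if guardrails_ok else "maybe_duplicate"
```

The downgrade is deliberately one-way: guardrails can weaken a prediction but never promote a weak one to "duplicate".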

How everything is tied together

The core linkage is shared state and shared decision runtime:
  1. Shared corpus state (items + embeddings) feeds both batch and online paths.
  2. Shared duplicate reasoning runtime powers judge, judge-audit, and detect-new.
  3. Accepted-edge graph in judge_decisions is the bridge from detection to canonicalization and close planning.
  4. Close governance state (close_runs) gives auditable review/apply separation.
Operationally: run scheduled freshness (refresh/embed) so online detect-new stays accurate, and run batch canonicalization/plan/apply for governed close actions.
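The operational cadence above can be sketched as two ordered command sequences (command names from the table; the runner itself is illustrative):

```python
# Scheduled freshness path: keeps the corpus current so detect-new stays accurate.
FRESHNESS = ["refresh", "embed"]

# Governed batch close path, in dependency order.
BATCH = ["sync", "embed", "candidates", "judge",
         "canonicalize", "plan-close", "apply-close"]

def run(pipeline: list, execute) -> list:
    """Run commands in order, stopping at the first failure so state stays consistent."""
    done = []
    for cmd in pipeline:
        if not execute(cmd):
            break
        done.append(cmd)
    return done

# e.g. a stub executor that halts before the mutating step until a plan is reviewed:
ran = run(BATCH, lambda cmd: cmd != "apply-close")
```

Stopping before apply-close mirrors the governance model: everything up to the plan is safe to automate, while the final mutation waits for explicit human sign-off.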