This page is the adoption-facing architecture view: what each command reads/writes, how data moves through batch vs online paths, and where deterministic safety gates apply.

System architecture at a glance

Data plane (state)

Supabase Postgres + pgvector stores items, embeddings, candidate snapshots, judge decisions, and close plans/results.

Decision plane

Embedding retrieval narrows candidates; LLM judgment proposes duplicate targets; deterministic policy gates accept/reject.
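As a rough illustration of the retrieval step, the sketch below ranks items by cosine similarity and keeps the top k above a minimum score. It is a minimal in-memory sketch, not the project's implementation: in the real system the vectors live in pgvector and the query runs in Postgres. The function names and the toy vectors are hypothetical; only `k` and `min_score` mirror parameters named on this page.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_candidates(source_id, embeddings, k=3, min_score=0.5):
    """Return up to k (item_id, score) pairs, best first, excluding the source."""
    src = embeddings[source_id]
    scored = [
        (other, cosine(src, vec))
        for other, vec in embeddings.items()
        if other != source_id
    ]
    kept = [(item, score) for item, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]

# Hypothetical toy corpus: item number -> embedding vector.
corpus = {101: [1.0, 0.0], 102: [0.9, 0.1], 103: [0.0, 1.0]}
print(retrieve_candidates(101, corpus, k=2, min_score=0.5))
```

Only item 102 survives the `min_score` cut here; the judge would then see that narrowed set rather than the whole corpus.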

Action plane

Close actions never come directly from judge. They are gated through plan-close and an explicit apply-close --yes.

Two coordinated pipelines

The batch path processes a repo corpus and produces reviewed close actions.
For a batch of issues/PRs, each stage iterates over many source items. Artifacts are persisted after each stage so runs are restartable and auditable.
If you prefer raw embeddings, skip analyze-intent and add --source raw to embed, candidates, and judge.

Command-by-command information flow

This table is the operational truth for what each command reads and writes.
Default --source is now intent for source-aware commands. When --source raw is selected or fallback is triggered, replace intent-card/intent-embedding paths with the raw embeddings equivalents.
| Command | Primary input | Reads | Writes | Output role |
| --- | --- | --- | --- | --- |
| init | local env/runtime | env + local runtime checks | none | readiness checks only |
| llm | local CLI metadata | CLI command tree + settings defaults | none | machine-readable CLI reference |
| maintainers | --repo | GitHub collaborators/permissions | none | maintainer list for policy checks |
| sync | repo + type/state/since | GitHub issues/PRs | repos (upsert), items (upsert + content hash/version) | corpus ingest |
| refresh | repo + type | GitHub + existing items | items (discover new; optional metadata refresh) | incremental freshness |
| analyze-intent | repo + type + state/only-changed | items | intent_cards | intent extraction substrate |
| embed | repo + type | intent_cards (default) / items (raw) | intent_embeddings (default) / embeddings (raw) | semantic retrieval substrate |
| candidates | repo + type + k/min_score | items, intent_embeddings/embeddings | candidate_sets, candidate_set_members (plus stale marking) | reproducible retrieval snapshot |
| judge | repo + type + provider/model | candidate_sets + source/candidate item context | judge_decisions (accepted/rejected/skipped) | accepted-edge graph source |
| judge-audit | sampled candidate sets | candidate sets + same judge runtime (cheap/strong lanes) | judge_audit_runs, judge_audit_run_items | evaluation + disagreement analysis |
| report-audit | audit run id | judge_audit_runs, judge_audit_run_items | none | reporting/simulation only |
| detect-new | single item number | GitHub item + DB corpus intent/raw embeddings | repos (upsert), items (source upsert), intent_cards + intent_embeddings (intent default), embeddings (raw fallback) | online JSON verdict |
| search | repo + query/similar-to | items, intent_embeddings/embeddings | none | read-only semantic discovery |
| canonicalize | repo + type | accepted edges + item metadata + maintainer list | none (stats output) | canonical cluster computation |
| plan-close | repo + type + min_close + target_policy | accepted edges + item metadata + maintainer list | close_runs(mode=plan), close_run_items (unless dry-run) | reviewable close plan |
| apply-close | close_run_id + --yes | close plan rows + GitHub | close_runs(mode=apply), close_run_items apply results | executed mutations |

Command-to-state map (Mermaid)

You’re usually best served by the table above for exact read/write behavior. These diagrams are intentionally simplified for readability.

Where safety decisions happen

Judge acceptance gate (batch)

A duplicate edge is accepted only with a valid structured response, a target that is a member of the candidate set, confidence >= 0.85, an open target, and a score gap >= 0.015.
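The acceptance conditions can be read as a chain of deterministic checks applied to the judge's proposal. The sketch below is illustrative only: the `JudgeResponse` shape, the reason strings, and the function name are hypothetical, and it assumes the score gap means the target's retrieval score minus the best other candidate's score, which this page does not define. Only the 0.85 and 0.015 thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CONFIDENCE = 0.85   # threshold stated on this page
MIN_SCORE_GAP = 0.015   # threshold stated on this page

@dataclass
class JudgeResponse:
    """Hypothetical parsed judge output."""
    target: Optional[int]   # proposed duplicate target, or None
    confidence: float

def accept_edge(response, candidate_scores, open_items):
    """Return (accepted, reason) for one proposed duplicate edge.

    candidate_scores maps candidate item number -> retrieval score.
    open_items is the set of item numbers that are currently open.
    """
    if response is None or response.target is None:
        return False, "invalid_response"
    if response.target not in candidate_scores:
        return False, "target_not_in_candidates"
    if response.confidence < MIN_CONFIDENCE:
        return False, "low_confidence"
    if response.target not in open_items:
        return False, "target_not_open"
    # Assumed definition: gap between the target and the best other candidate.
    others = [s for item, s in candidate_scores.items() if item != response.target]
    gap = candidate_scores[response.target] - max(others) if others else float("inf")
    if gap < MIN_SCORE_GAP:
        return False, "score_gap_too_small"
    return True, "accepted"

scores = {7: 0.91, 9: 0.88}
print(accept_edge(JudgeResponse(target=7, confidence=0.92), scores, {7, 9}))
```

Returning a reason alongside the boolean keeps rejected and skipped decisions auditable, in the spirit of the judge_decisions rows described above.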

Close planning gate

The default close policy requires a direct accepted edge to the canonical item (--target-policy canonical-only); the optional direct-fallback policy can use the source item's direct accepted target when canonical evidence is missing. Under either policy, confidence must be >= 0.90 and maintainer protections still apply.
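To make the two target policies concrete, here is a small sketch of how a planner might pick a close target for one source item. It is an assumption-laden illustration: the function name, the edge dictionary shape, and the `protected_items` set (standing in for whatever the maintainer protections actually cover) are hypothetical. The policy names and the 0.90 threshold are taken from the text.

```python
MIN_CLOSE_CONFIDENCE = 0.90  # threshold stated on this page

def plan_close_target(source, canonical, accepted_edges, protected_items,
                      target_policy="canonical-only"):
    """Return the item to close `source` against, or None to leave it open.

    accepted_edges maps source item -> (direct target, confidence).
    protected_items is a hypothetical stand-in for maintainer protections.
    """
    if source == canonical or source in protected_items:
        return None
    edge = accepted_edges.get(source)
    if edge is None:
        return None
    target, confidence = edge
    if confidence < MIN_CLOSE_CONFIDENCE:
        return None
    if target == canonical:
        return canonical          # direct accepted edge to the canonical item
    if target_policy == "direct-fallback":
        return target             # fall back to the direct accepted target
    return None                   # canonical-only: no direct evidence, no close

edges = {12: (5, 0.95), 20: (12, 0.93), 31: (5, 0.88)}
print(plan_close_target(12, 5, edges, set()))                     # 5
print(plan_close_target(20, 5, edges, set()))                     # None
print(plan_close_target(20, 5, edges, set(), "direct-fallback"))  # 12
```

Item 20 only links to the canonical item through item 12, so the default policy declines to close it while direct-fallback closes it against 12.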

Apply mutation gate

No mutation happens without a persisted mode=plan run and an explicit --yes.
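The apply gate reduces to two preconditions that must both hold. A minimal sketch, assuming a close run is loaded as a dictionary with a `mode` field (the row shape and function name are hypothetical):

```python
def can_apply(close_run, yes):
    """Allow mutations only for a persisted plan run with explicit confirmation."""
    if close_run is None:                 # nothing persisted to apply
        return False
    if close_run.get("mode") != "plan":   # only plan runs may be applied
        return False
    return yes is True                    # the --yes flag must be given explicitly

print(can_apply({"id": 42, "mode": "plan"}, yes=True))   # True
print(can_apply({"id": 42, "mode": "plan"}, yes=False))  # False
print(can_apply(None, yes=True))                         # False
```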

Online strict mapping

detect-new can downgrade high-confidence duplicate predictions to maybe_duplicate when structural/retrieval guardrails fail.
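The downgrade can be pictured as a mapping from the model's prediction plus guardrail results to a final verdict. In the sketch below only `maybe_duplicate` is a label named on this page; the other labels, the function name, and reusing 0.85 as the online confidence threshold are assumptions made for illustration.

```python
def map_online_verdict(predicted_duplicate, confidence, guardrails_ok,
                       min_confidence=0.85):
    """Map a model prediction to a final verdict string.

    guardrails_ok summarizes the structural/retrieval checks as one boolean.
    """
    if not predicted_duplicate:
        return "not_duplicate"        # hypothetical label
    if confidence >= min_confidence and guardrails_ok:
        return "duplicate"            # hypothetical label
    # A confident prediction that fails a guardrail is downgraded.
    return "maybe_duplicate"

print(map_online_verdict(True, 0.95, guardrails_ok=True))   # duplicate
print(map_online_verdict(True, 0.95, guardrails_ok=False))  # maybe_duplicate
```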

How everything is tied together

The core linkage is shared state and shared decision runtime:
  1. Shared corpus state (items, intent_cards, intent_embeddings/embeddings) feeds both batch and online paths.
  2. Shared duplicate reasoning runtime powers judge, judge-audit, and detect-new.
  3. Accepted-edge graph in judge_decisions is the bridge from detection to canonicalization and close planning.
  4. Close governance state (close_runs) gives auditable review/apply separation.
Operationally: run scheduled freshness (refresh/embed) so online detect-new stays accurate, and run batch canonicalization/plan/apply for governed close actions.
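One way to picture how accepted edges become canonical clusters is a union-find pass over (source, target) pairs. This is a sketch of the general technique, not the project's algorithm: the function names are hypothetical, and choosing the lowest item number as the canonical is an assumption. The real canonicalize command also consults item metadata and the maintainer list.

```python
from collections import defaultdict

def find(parent, x):
    """Find the root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def canonical_clusters(accepted_edges):
    """Group items connected by accepted edges.

    Returns {canonical item: sorted members}, where the canonical is
    assumed (for illustration) to be the lowest item number.
    """
    parent = {}
    for source, target in accepted_edges:
        parent.setdefault(source, source)
        parent.setdefault(target, target)
        root_s, root_t = find(parent, source), find(parent, target)
        if root_s != root_t:
            parent[root_s] = root_t
    groups = defaultdict(list)
    for item in parent:
        groups[find(parent, item)].append(item)
    return {min(members): sorted(members) for members in groups.values()}

edges = [(12, 5), (20, 12), (31, 30)]
print(canonical_clusters(edges))  # {5: [5, 12, 20], 30: [30, 31]}
```

Item 20 joins the cluster around item 5 only transitively, which is the case the close planning policies distinguish.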