This page is the adoption-facing architecture view: what each command reads/writes, how data moves through batch vs online paths, and where deterministic safety gates apply.
System architecture at a glance
Data plane (state)
Supabase Postgres + pgvector stores items, embeddings, candidate snapshots, judge decisions, and close plans/results.
Decision plane
Embedding retrieval narrows candidates; LLM judgment proposes duplicate targets; deterministic policy gates accept/reject.
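
As a rough illustration of the deterministic gate step, the sketch below applies the checks listed under "Judge acceptance gate (batch)" on this page. The `Proposal` shape, the field names, and the reason strings are hypothetical. Reading `score_gap` as "selected candidate's retrieval score minus the best other candidate's score" is also an assumption, not something this page defines.

```python
from dataclasses import dataclass

# Thresholds as listed under "Judge acceptance gate (batch)" on this page.
MIN_CONFIDENCE = 0.85
MIN_SCORE_GAP = 0.015


@dataclass
class Proposal:
    """Hypothetical shape of one judge proposal; field names are illustrative."""
    valid_response: bool   # structured output parsed and validated
    target: int            # proposed duplicate target (item number)
    confidence: float      # judge-reported confidence
    target_is_open: bool   # target item is still open
    score_gap: float       # assumed meaning: selected score minus best other score


def gate(proposal: Proposal, candidate_numbers: set[int]) -> tuple[bool, str]:
    """Return (accepted, reason). Checks run in a fixed order, so the
    same inputs always produce the same verdict."""
    if not proposal.valid_response:
        return False, "invalid_response"
    if proposal.target not in candidate_numbers:
        return False, "target_not_in_candidates"
    if proposal.confidence < MIN_CONFIDENCE:
        return False, "low_confidence"
    if not proposal.target_is_open:
        return False, "target_closed"
    if proposal.score_gap < MIN_SCORE_GAP:
        return False, "score_gap_too_small"
    return True, "accepted"
```

The point of the design is that the LLM only proposes. Acceptance is a pure function of the proposal and the persisted candidate snapshot, so it can be replayed and audited.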
Action plane
Close actions are never issued directly from judge output. They are gated through `plan-close` plus an explicit `apply-close --yes`.
Two coordinated pipelines
- Batch pipeline (many items)
- Online pipeline (detect-new)
The batch path processes a repo corpus and produces reviewed close actions.
For a batch of issues/PRs, each stage iterates over many source items. Artifacts are persisted after each stage so runs are restartable and auditable.
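
The restartable-stage idea can be sketched as follows. This is a minimal illustration only: it uses JSON files in a directory as a stand-in for the database tables, and `run_stage`, `run_pipeline`, and the stage names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_stage(name: str, fn: Callable[[list], list], inputs: list, artifact_dir: Path) -> list:
    """Run one stage, or reuse its persisted artifact if one already exists."""
    artifact = artifact_dir / f"{name}.json"
    if artifact.exists():
        # A previous run already finished this stage: skip the work.
        return json.loads(artifact.read_text())
    outputs = fn(inputs)
    artifact.write_text(json.dumps(outputs))  # persist before moving on
    return outputs


def run_pipeline(items: list, stages: list, artifact_dir: Path) -> list:
    """Feed each stage's output into the next, persisting after every stage."""
    data = items
    for name, fn in stages:
        data = run_stage(name, fn, data, artifact_dir)
    return data


# Usage: the second run finds both artifacts and calls no stage function.
with tempfile.TemporaryDirectory() as tmp:
    stages = [
        ("double", lambda xs: [x * 2 for x in xs]),
        ("inc", lambda xs: [x + 1 for x in xs]),
    ]
    first = run_pipeline([1, 2], stages, Path(tmp))
    second = run_pipeline([1, 2], stages, Path(tmp))
```

Because every stage leaves an artifact behind, a failed run can resume from the last completed stage, and each intermediate result remains available for inspection.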
If you prefer raw embeddings, skip `analyze-intent` and add `--source raw` to `embed`, `candidates`, and `judge`.
Command-by-command information flow
The default `--source` is now `intent` for source-aware commands. When `--source raw` is selected or fallback is triggered, replace the intent-card/intent-embedding paths with their raw-embedding equivalents.

| Command | Primary input | Reads | Writes | Output role |
|---|---|---|---|---|
| `init` | local env/runtime | env + local runtime checks | none | readiness checks only |
| `llm` | local CLI metadata | CLI command tree + settings defaults | none | machine-readable CLI reference |
| `maintainers` | --repo | GitHub collaborators/permissions | none | maintainer list for policy checks |
| `sync` | repo + type/state/since | GitHub issues/PRs | repos (upsert), items (upsert + content hash/version) | corpus ingest |
| `refresh` | repo + type | GitHub + existing items | items (discover new; optional metadata refresh) | incremental freshness |
| `analyze-intent` | repo + type + state/only-changed | items | intent_cards | intent extraction substrate |
| `embed` | repo + type | intent_cards (default) / items (raw) | intent_embeddings (default) / embeddings (raw) | semantic retrieval substrate |
| `candidates` | repo + type + k/min_score | items, intent_embeddings/embeddings | candidate_sets, candidate_set_members (plus stale marking) | reproducible retrieval snapshot |
| `judge` | repo + type + provider/model | candidate_sets + source/candidate item context | judge_decisions (accepted/rejected/skipped) | accepted-edge graph source |
| `judge-audit` | sampled candidate sets | candidate sets + same judge runtime (cheap/strong lanes) | judge_audit_runs, judge_audit_run_items | evaluation + disagreement analysis |
| `report-audit` | audit run id | judge_audit_runs, judge_audit_run_items | none | reporting/simulation only |
| `detect-new` | single item number | GitHub item + DB corpus intent/raw embeddings | repos (upsert), items (source upsert), intent_cards + intent_embeddings (intent default), embeddings (raw fallback) | online JSON verdict |
| `search` | repo + query/similar-to | items, intent_embeddings/embeddings | none | read-only semantic discovery |
| `canonicalize` | repo + type | accepted edges + item metadata + maintainer list | none (stats output) | canonical cluster computation |
| `plan-close` | repo + type + min_close + target_policy | accepted edges + item metadata + maintainer list | close_runs(mode=plan), close_run_items (unless dry-run) | reviewable close plan |
| `apply-close` | close_run_id + --yes | close plan rows + GitHub | close_runs(mode=apply), close_run_items apply results | executed mutations |
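
To make the `candidates` row more concrete, here is a minimal in-memory sketch of top-`k` retrieval with a `min_score` floor. It is an illustration under stated assumptions, not the project's implementation: the real command reads embeddings from the database, and both the cosine metric and the tie-breaking rule here are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(source_id: int, embeddings: dict[int, list[float]],
                   k: int, min_score: float) -> list[tuple[int, float]]:
    """Return up to k (item_id, score) pairs with score >= min_score,
    best first. Ties break on item id so the snapshot is reproducible."""
    source_vec = embeddings[source_id]
    scored = [
        (other_id, cosine(source_vec, vec))
        for other_id, vec in embeddings.items()
        if other_id != source_id  # an item is never its own candidate
    ]
    kept = [(item_id, score) for item_id, score in scored if score >= min_score]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:k]


# Usage with toy 2-d vectors (real embeddings have many more dimensions).
toy = {1: [1.0, 0.0], 2: [1.0, 0.1], 3: [0.0, 1.0], 4: [0.9, 0.5]}
candidate_ids = [item_id for item_id, _ in top_candidates(1, toy, k=2, min_score=0.5)]
```

Persisting the resulting list as a candidate set, rather than re-querying later, is what lets `judge` and `judge-audit` see exactly the same candidates.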
Command-to-state map (Mermaid)
For exact read/write behavior, use the table above. These diagrams are intentionally simplified for readability.
- Core batch path
- Online + evaluation + integrations
Where safety decisions happen
Judge acceptance gate (batch)
A duplicate edge is accepted only with a valid structured response, candidate membership, confidence >= 0.85, an open target, and score-gap >= 0.015.
Close planning gate
The default close policy requires a direct accepted edge to the canonical item (`--target-policy canonical-only`); the optional `direct-fallback` policy can use the source item's direct accepted target when canonical evidence is missing. In both cases the confidence requirement stays at >= 0.90 and maintainer protections still apply.
Apply mutation gate
No mutation happens without a persisted `mode=plan` run and an explicit `--yes`.
Online strict mapping
`detect-new` can downgrade high-confidence duplicate predictions to `maybe_duplicate` when structural/retrieval guardrails fail.
How everything is tied together
The core linkage is shared state and a shared decision runtime:
- Shared corpus state (`items`, `intent_cards`, `intent_embeddings`/`embeddings`) feeds both batch and online paths.
- A shared duplicate reasoning runtime powers `judge`, `judge-audit`, and `detect-new`.
- The accepted-edge graph in `judge_decisions` is the bridge from detection to canonicalization and close planning.
- Close governance state (`close_runs`) gives auditable review/apply separation.
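
One way to picture how the accepted-edge graph feeds canonicalization is a union-find pass over accepted `(source, target)` edges. This is a sketch only: choosing the lowest item number as the canonical is an assumption made for illustration, and the real policy (which also reads item metadata and the maintainer list) may differ.

```python
def canonical_clusters(edges: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Group items connected by accepted duplicate edges.

    Returns {canonical_item: sorted members}. The canonical here is simply
    the lowest item number in each cluster (an illustrative assumption).
    """
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            low, high = sorted((root_a, root_b))
            parent[high] = low  # the lower number stays the root

    for source, target in edges:
        union(source, target)

    clusters: dict[int, list[int]] = {}
    for node in sorted(parent):
        clusters.setdefault(find(node), []).append(node)
    return clusters


# Usage: 7 -> 5 -> 2 collapses into one cluster rooted at 2.
clusters = canonical_clusters([(5, 2), (7, 5), (10, 9)])
```

Working on connected components rather than individual edges is what allows a close plan to point every duplicate at a single canonical item, even when the judge only linked it to an intermediate one.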
Operationally: run scheduled freshness jobs (`refresh`/`embed`) so online `detect-new` stays accurate, and run batch canonicalization/plan/apply for governed close actions.
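
The review/apply separation described under "Close planning gate" and "Apply mutation gate" can be sketched as below. Everything here is a hypothetical stand-in: `PlanStore` replaces the `close_runs` table with an in-memory dict, the `protected` flag is a placeholder for the maintainer protections (whose actual rules this page does not spell out), and `close_fn` stands in for the GitHub mutation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MIN_CLOSE_CONFIDENCE = 0.90  # threshold from "Close planning gate" on this page


@dataclass
class CloseRun:
    run_id: int
    mode: str  # "plan" or "apply"
    items: list[int] = field(default_factory=list)


class PlanStore:
    """In-memory stand-in for persisted close runs."""

    def __init__(self) -> None:
        self._runs: dict[int, CloseRun] = {}

    def save(self, mode: str, items: list[int]) -> CloseRun:
        run = CloseRun(run_id=len(self._runs) + 1, mode=mode, items=list(items))
        self._runs[run.run_id] = run
        return run

    def get(self, run_id: int) -> Optional[CloseRun]:
        return self._runs.get(run_id)


def plan_close(store: PlanStore, edges: list[tuple[int, float, bool]],
               dry_run: bool = False) -> Optional[CloseRun]:
    """edges are (source_item, confidence, protected). Nothing is persisted on dry-run."""
    eligible = [src for src, conf, protected in edges
                if conf >= MIN_CLOSE_CONFIDENCE and not protected]
    return None if dry_run else store.save("plan", eligible)


def apply_close(store: PlanStore, run_id: int, yes: bool,
                close_fn: Callable[[int], None]) -> CloseRun:
    """Mutate only when a persisted plan exists AND the caller confirmed explicitly."""
    plan = store.get(run_id)
    if plan is None or plan.mode != "plan":
        raise ValueError("no persisted plan run with this id")
    if not yes:
        raise PermissionError("explicit confirmation required")
    for item in plan.items:
        close_fn(item)
    return store.save("apply", plan.items)
```

Keeping the plan as a persisted record means the set of items reviewed is exactly the set of items mutated; `apply_close` never recomputes eligibility on its own.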