dupcanon is currently human-operated, but it is being shaped for workflow-native automation in GitHub Actions.

What this is

Primary objective

Reduce duplicate noise in high-cadence repos (for example, OpenClaw) by selecting stable canonicals and closing true duplicates safely.

Why it exists

Naive automation creates duplicate chains and unstable targets; this system adds durable state + deterministic gates.

Operating constraints

Cheap enough to run continuously, accurate enough for guarded production use, and fully auditable.

Core strategy

Embeddings for retrieval, LLM for semantic judgment, deterministic policy for acceptance and close safety.

Current vs target mode

  • Human-operated CLI pipeline
  • Online detect-new in shadow/suggest mode
  • Close actions gated through reviewed plan-close + explicit apply-close --yes

Core approach

Two paths share one data model and one safety model.
Online entrypoint: detect-new for newly opened issues/PRs.
Batch pipeline:
  1. sync / refresh: ingest issues + PRs into Postgres
  2. embed: title/body embeddings in pgvector
  3. candidates: persisted nearest-neighbor candidate sets
  4. judge: LLM chooses duplicate target inside the candidate set
  5. deterministic gates veto risky decisions
  6. canonicalize: compute canonical representatives
  7. plan-close -> reviewed apply-close --yes
All key artifacts are persisted for replay, audit, and threshold tuning.
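The batch pipeline above is a fixed command sequence. A minimal sketch of that ordering, assuming the CLI is invoked as `dupcanon` (the command names come from the implemented-commands list below; `batch_commands` and `run_batch` are hypothetical helper names, and any flags are omitted since they are not documented here):

```python
import subprocess

# Pipeline order from the docs. apply-close is deliberately excluded:
# it requires a reviewed plan and the explicit --yes flag, and should
# never run unattended.
BATCH_STAGES = ["sync", "embed", "candidates", "judge", "canonicalize", "plan-close"]

def batch_commands() -> list[list[str]]:
    """Build the batch command sequence in pipeline order."""
    return [["dupcanon", stage] for stage in BATCH_STAGES]

def run_batch(dry_run: bool = True) -> list[list[str]]:
    """Run the batch pipeline, stopping at the first failing stage."""
    cmds = batch_commands()
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # check=True aborts on failure
    return cmds
```

Keeping apply-close out of any unattended sequence mirrors the safety model: everything up to plan-close is reversible bookkeeping, while close actions stay behind human review.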

Deterministic gates (with actual thresholds)

  • strict JSON parse
  • target must be in candidate set
  • confidence threshold: model_confidence >= 0.85 (min_edge default)
  • target must be open
  • candidate gap gate: selected_score - best_alternative_score >= 0.015
  • mismatch vetoes (uncertain/overlap/root-cause/scope class failures)
  • one accepted outgoing edge per source unless explicit rejudge
  • maintainer author protection
  • maintainer assignee protection
  • direct accepted edge to canonical required
  • close threshold: direct edge confidence >= 0.90 (min_close default)
  • default thresholds: maybe=0.85, duplicate=0.92
  • strict duplicate downgrade if structural guardrails fail
  • duplicate class also requires strong retrieval support (current floor: top match score >= 0.90)
  • candidate gap gate also applies on strict duplicate path (>= 0.015)
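The class thresholds and the strict-duplicate downgrade combine into a small decision function. A sketch of the logic implied by the defaults above (maybe=0.85, duplicate=0.92, retrieval floor 0.90, gap 0.015); the function name and the downgrade-to-maybe behavior are illustrative, not the actual implementation:

```python
def classify(confidence: float, top_score: float, gap: float) -> str:
    """Map a judgment to a class, downgrading 'duplicate' when the
    structural guardrails (retrieval floor, candidate gap) fail."""
    if confidence >= 0.92 and top_score >= 0.90 and gap >= 0.015:
        return "duplicate"
    if confidence >= 0.85:
        # High model confidence alone is not enough for the strict
        # duplicate class: without retrieval support it is downgraded.
        return "maybe"
    return "non-duplicate"
```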
Confidence vs score (important):
  • confidence = model’s self-reported duplicate confidence in [0,1]
  • score = retrieval similarity score from vector search
  • gap = selected_candidate_score - best_alternative_score
High confidence ≠ accepted edge. A judgment can have very high model confidence and still be rejected when deterministic gates fail.

Judge gate examples

Case                      | Model confidence | Selected score | Best alternate | Gap   | Outcome
Strong accepted           | 0.91             | 0.93           | 0.89           | 0.04  | accepted
Rejected (below min_edge) | 0.82             | 0.95           | 0.70           | 0.25  | rejected
Rejected (gap too small)  | 0.95             | 0.901          | 0.893          | 0.008 | rejected
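The confidence and gap gates from the table can be sketched directly from the documented defaults (min_edge 0.85, gap 0.015). The function name and signature are hypothetical, not the actual dupcanon API:

```python
MIN_EDGE = 0.85  # min model confidence to accept an edge (min_edge default)
MIN_GAP = 0.015  # min selected_score - best_alternative_score

def accept_edge(confidence: float, selected_score: float,
                best_alternative_score: float) -> bool:
    """Return True only if both the confidence and gap gates pass."""
    if confidence < MIN_EDGE:
        return False  # below min_edge: rejected regardless of retrieval score
    if selected_score - best_alternative_score < MIN_GAP:
        return False  # ambiguous neighborhood: the gap gate vetoes
    return True

# The three table rows above:
accept_edge(0.91, 0.93, 0.89)    # strong accepted -> True
accept_edge(0.82, 0.95, 0.70)    # below min_edge -> False
accept_edge(0.95, 0.901, 0.893)  # gap too small -> False
```

Note the second row: a 0.95 retrieval score and a 0.25 gap cannot rescue a 0.82 confidence, and the third row shows the converse; the gates are independent vetoes, not a weighted sum.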

Actual judge system prompt (current)

You are a conservative duplicate-triage judge for GitHub issues/PRs.

Task:
Given one SOURCE item and a list of CANDIDATES (same repo, same type),
decide whether SOURCE is a duplicate of exactly one candidate.

Core definition (strict):
A duplicate means the SOURCE and chosen candidate describe the same specific
underlying root cause/request, not just the same broad area (e.g. both about
"exec", "auth", "performance", etc.).

Hard duplicate requirements:
- Prefer non-duplicate unless evidence is strong.
- Mark duplicate only when there are at least TWO concrete matching facts, such as:
  1) same/similar error text, error code, or failure signature
  2) same config keys/values (for example ask=off, security=full)
  3) same command/tool/path/component and same behavior
  4) same reproduction conditions / triggering scenario
- If SOURCE is vague/generic (very short title/body, little detail), default to non-duplicate.
- If details conflict on root cause, expected behavior, or subsystem, return non-duplicate.

Critical anti-overlap rule:
- If items are only a subset/superset, follow-up, adjacent hardening, or partial overlap,
  return non-duplicate unless the same underlying defect/request instance is explicit.
- Shared subsystem/component/keywords alone are insufficient.

Decision rules:
1) You may select at most one candidate.
2) You may only select a candidate number from ALLOWED_CANDIDATE_NUMBERS.
3) If none clearly match, return non-duplicate.
4) Ignore comments (title/body only).
5) Do not use retrieval rank as duplicate evidence by itself.
6) If you are not sure, mark certainty="unsure"; prefer non-duplicate unless
   same-instance evidence is explicit.
7) Output JSON only. No markdown. No extra text.

Confidence rubric (self-assessed, not calibrated probability):
- Non-duplicate: typically 0.00-0.80.
- Duplicate 0.85-0.89: moderate evidence (minimum requirements met).
- Duplicate 0.90-0.95: strong evidence (3+ specific aligned facts, no conflicts).
- Duplicate 0.96-1.00: near-exact match in root cause/repro/details.
- Do NOT use high confidence for generic or weakly-supported matches.

Output JSON schema:
{
  "is_duplicate": boolean,
  "duplicate_of": integer,
  "confidence": number,
  "reasoning": string,
  "relation": "same_instance" | "related_followup" | "partial_overlap" | "different",
  "root_cause_match": "same" | "adjacent" | "different",
  "scope_relation":
    "same_scope" | "source_subset" | "source_superset" |
    "partial_overlap" | "different_scope",
  "path_match": "same" | "different" | "unknown",
  "certainty": "sure" | "unsure"
}

Output constraints:
- If is_duplicate is false, duplicate_of must be 0.
- If is_duplicate is true, duplicate_of must be one of the candidate numbers.
- relation must be same_instance when is_duplicate is true.
- If unsure, set certainty="unsure".
- confidence must be in [0,1].
- reasoning must be short (<= 240 chars) and mention concrete matching facts.
- No extra keys.
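The output contract above is deterministic, so the strict-JSON-parse gate can be checked without any model involvement. A minimal stdlib-only sketch (the function name is hypothetical; the keys, enums, and cross-field rules follow the schema and output constraints verbatim):

```python
REQUIRED_KEYS = {
    "is_duplicate", "duplicate_of", "confidence", "reasoning",
    "relation", "root_cause_match", "scope_relation", "path_match", "certainty",
}

ENUMS = {
    "relation": {"same_instance", "related_followup", "partial_overlap", "different"},
    "root_cause_match": {"same", "adjacent", "different"},
    "scope_relation": {"same_scope", "source_subset", "source_superset",
                       "partial_overlap", "different_scope"},
    "path_match": {"same", "different", "unknown"},
    "certainty": {"sure", "unsure"},
}

def validate_judge_output(obj: dict) -> list[str]:
    """Return a list of constraint violations (empty list = valid)."""
    errors = []
    if set(obj) != REQUIRED_KEYS:
        errors.append("keys must match the schema exactly (no extra keys)")
    for key, allowed in ENUMS.items():
        if obj.get(key) not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    conf = obj.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence must be in [0,1]")
    if not isinstance(obj.get("reasoning"), str) or len(obj["reasoning"]) > 240:
        errors.append("reasoning must be a string of <= 240 chars")
    if obj.get("is_duplicate") is False and obj.get("duplicate_of") != 0:
        errors.append("duplicate_of must be 0 when is_duplicate is false")
    if obj.get("is_duplicate") is True and obj.get("relation") != "same_instance":
        errors.append("relation must be same_instance when is_duplicate is true")
    return errors
```

In the real pipeline the membership check against ALLOWED_CANDIDATE_NUMBERS happens alongside this, since the validator alone cannot know the candidate set.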

Cost and accuracy stance

The goal is practical operations, not perfect AI: keep cost low enough for continuous runs, and keep precision high enough for controlled close actions.

Current status

Implemented commands:
  • init, sync, refresh, embed, candidates, judge
  • judge-audit, report-audit, detect-new
  • canonicalize, maintainers, plan-close, apply-close

What’s missing next

  • first-class evaluation command + reporting workflow for production gate decisions
  • programmatic orchestration command/workflow for unattended DB freshness updates
  • a richer future action surface (for example, label taxonomy / tree-editing operations)

Stack

  • Python + Typer + Pydantic + Rich
  • Supabase Postgres + pgvector
  • providers: OpenAI, Gemini, OpenRouter, and openai-codex via pi RPC
DB can be moved to self-hosted Postgres with minimal architecture changes.

Internal docs

Deep design and runbook docs are in docs/internal/.