dupcanon is currently human-operated, but it is being shaped for workflow-native automation in GitHub Actions.

What this is

Primary objective

Reduce duplicate noise in high-cadence repos (for example, OpenClaw) by selecting stable canonicals and closing true duplicates safely.

Why it exists

Naive automation creates duplicate chains and unstable targets; this system adds durable state + deterministic gates.

Operating constraints

Cheap enough to run continuously, accurate enough for guarded production use, and fully auditable.

Core strategy

Embeddings for retrieval, LLM for semantic judgment, deterministic policy for acceptance and close safety.

Current vs target mode

  • Human-operated CLI pipeline
  • Online detect-new in shadow/suggest mode
  • Close actions gated through reviewed plan-close + explicit apply-close --yes

Core approach

Two paths share one data model and one safety model.
Online entrypoint: detect-new for newly opened issues/PRs.
Batch pipeline:
  1. sync / refresh: ingest issues + PRs into Postgres
  2. embed: title/body embeddings in pgvector
  3. candidates: persisted nearest-neighbor candidate sets
  4. judge: LLM chooses duplicate target inside the candidate set
  5. deterministic gates veto risky decisions
  6. canonicalize: compute canonical representatives
  7. plan-close -> reviewed apply-close --yes
All key artifacts are persisted for replay, audit, and threshold tuning.
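The batch pipeline above is a fixed command sequence. A minimal sketch of that ordering, assuming the CLI is invoked as `dupcanon` (the command names come from the implemented-commands list below; `batch_commands` and `run_batch` are hypothetical helper names, and any flags are omitted since they are not documented here):

```python
import subprocess

# Pipeline order from the docs. apply-close is deliberately excluded:
# it requires a reviewed plan and the explicit --yes flag, and should
# never run unattended.
BATCH_STAGES = ["sync", "embed", "candidates", "judge", "canonicalize", "plan-close"]

def batch_commands() -> list[list[str]]:
    """Build the batch command sequence in pipeline order."""
    return [["dupcanon", stage] for stage in BATCH_STAGES]

def run_batch(dry_run: bool = True) -> list[list[str]]:
    """Run the batch pipeline, stopping at the first failing stage."""
    cmds = batch_commands()
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # check=True aborts on failure
    return cmds
```

Keeping apply-close out of any unattended sequence mirrors the safety model: everything up to plan-close is reversible bookkeeping, while close actions stay behind human review.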

Deterministic gates (with actual thresholds)

  • strict JSON parse
  • target must be in candidate set
  • confidence threshold: model_confidence >= 0.85 (min_edge default)
  • target must be open
  • candidate gap gate: selected_score - best_alternative_score >= 0.015
  • mismatch vetoes (uncertain/overlap/root-cause/scope class failures)
  • one accepted outgoing edge per source unless explicit rejudge
  • maintainer author protection
  • maintainer assignee protection
  • direct accepted edge to canonical required
  • close threshold: direct edge confidence >= 0.90 (min_close default)
  • default thresholds: maybe=0.85, duplicate=0.92
  • strict duplicate downgrade if structural guardrails fail
  • duplicate class also requires strong retrieval support (current floor: top match score >= 0.90)
  • candidate gap gate also applies on strict duplicate path (>= 0.015)
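The class thresholds and the strict-duplicate downgrade combine into a small decision function. A sketch of the logic implied by the defaults above (maybe=0.85, duplicate=0.92, retrieval floor 0.90, gap 0.015); the function name and the downgrade-to-maybe behavior are illustrative, not the actual implementation:

```python
def classify(confidence: float, top_score: float, gap: float) -> str:
    """Map a judgment to a class, downgrading 'duplicate' when the
    structural guardrails (retrieval floor, candidate gap) fail."""
    if confidence >= 0.92 and top_score >= 0.90 and gap >= 0.015:
        return "duplicate"
    if confidence >= 0.85:
        # High model confidence alone is not enough for the strict
        # duplicate class: without retrieval support it is downgraded.
        return "maybe"
    return "non-duplicate"
```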
Confidence vs score (important):
  • confidence = model’s self-reported duplicate confidence in [0,1]
  • score = retrieval similarity score from vector search
  • gap = selected_candidate_score - best_alternative_score
High confidence ≠ accepted edge. A judgment can have very high model confidence and still be rejected when deterministic gates fail.

Judge gate examples

Case                      | Model confidence | Selected score | Best alternate | Gap   | Outcome
Strong accepted           | 0.91             | 0.93           | 0.89           | 0.04  | accepted
Rejected (below min_edge) | 0.82             | 0.95           | 0.70           | 0.25  | rejected
Rejected (gap too small)  | 0.95             | 0.901          | 0.893          | 0.008 | rejected
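The confidence and gap gates from the table can be sketched directly from the documented defaults (min_edge 0.85, gap 0.015). The function name and signature are hypothetical, not the actual dupcanon API:

```python
MIN_EDGE = 0.85  # min model confidence to accept an edge (min_edge default)
MIN_GAP = 0.015  # min selected_score - best_alternative_score

def accept_edge(confidence: float, selected_score: float,
                best_alternative_score: float) -> bool:
    """Return True only if both the confidence and gap gates pass."""
    if confidence < MIN_EDGE:
        return False  # below min_edge: rejected regardless of retrieval score
    if selected_score - best_alternative_score < MIN_GAP:
        return False  # ambiguous neighborhood: the gap gate vetoes
    return True

# The three table rows above:
accept_edge(0.91, 0.93, 0.89)    # strong accepted -> True
accept_edge(0.82, 0.95, 0.70)    # below min_edge -> False
accept_edge(0.95, 0.901, 0.893)  # gap too small -> False
```

Note the second row: a 0.95 retrieval score and a 0.25 gap cannot rescue a 0.82 confidence, and the third row shows the converse; the gates are independent vetoes, not a weighted sum.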

Actual judge system prompt (current)

You are a conservative duplicate-triage judge for GitHub issues/PRs.

Task:
Given one SOURCE item and a list of CANDIDATES (same repo, same type),
decide whether SOURCE is a duplicate of exactly one candidate.

Core definition (strict):
A duplicate means the SOURCE and chosen candidate describe the same specific
underlying root cause/request, not just the same broad area (e.g. both about
"exec", "auth", "performance", etc.).

Hard duplicate requirements:
- Prefer non-duplicate unless evidence is strong.
- Mark duplicate only when there are at least TWO concrete matching facts, such as:
  1) same/similar error text, error code, or failure signature
  2) same config keys/values (for example ask=off, security=full)
  3) same command/tool/path/component and same behavior
  4) same reproduction conditions / triggering scenario
- If SOURCE is vague/generic (very short title/body, little detail), default to non-duplicate.
- If details conflict on root cause, expected behavior, or subsystem, return non-duplicate.

Critical anti-overlap rule:
- If items are only a subset/superset, follow-up, adjacent hardening, or partial overlap,
  return non-duplicate unless the same underlying defect/request instance is explicit.
- Shared subsystem/component/keywords alone are insufficient.

Decision rules:
1) You may select at most one candidate.
2) You may only select a candidate number from ALLOWED_CANDIDATE_NUMBERS.
3) If none clearly match, return non-duplicate.
4) Ignore comments (title/body only).
5) Do not use retrieval rank as duplicate evidence by itself.
6) If you are not sure, mark certainty="unsure"; prefer non-duplicate unless
   same-instance evidence is explicit.
7) Output JSON only. No markdown. No extra text.

Confidence rubric (self-assessed, not calibrated probability):
- Non-duplicate: typically 0.00-0.80.
- Duplicate 0.85-0.89: moderate evidence (minimum requirements met).
- Duplicate 0.90-0.95: strong evidence (3+ specific aligned facts, no conflicts).
- Duplicate 0.96-1.00: near-exact match in root cause/repro/details.
- Do NOT use high confidence for generic or weakly-supported matches.

Output JSON schema:
{
  "is_duplicate": boolean,
  "duplicate_of": integer,
  "confidence": number,
  "reasoning": string,
  "relation": "same_instance" | "related_followup" | "partial_overlap" | "different",
  "root_cause_match": "same" | "adjacent" | "different",
  "scope_relation":
    "same_scope" | "source_subset" | "source_superset" |
    "partial_overlap" | "different_scope",
  "path_match": "same" | "different" | "unknown",
  "certainty": "sure" | "unsure"
}

Output constraints:
- If is_duplicate is false, duplicate_of must be 0.
- If is_duplicate is true, duplicate_of must be one of the candidate numbers.
- relation must be same_instance when is_duplicate is true.
- If unsure, set certainty="unsure".
- confidence must be in [0,1].
- reasoning must be short (<= 240 chars) and mention concrete matching facts.
- No extra keys.
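The output contract above is deterministic, so the strict-JSON-parse gate can be checked without any model involvement. A minimal stdlib-only sketch (the function name is hypothetical; the keys, enums, and cross-field rules follow the schema and output constraints verbatim):

```python
REQUIRED_KEYS = {
    "is_duplicate", "duplicate_of", "confidence", "reasoning",
    "relation", "root_cause_match", "scope_relation", "path_match", "certainty",
}

ENUMS = {
    "relation": {"same_instance", "related_followup", "partial_overlap", "different"},
    "root_cause_match": {"same", "adjacent", "different"},
    "scope_relation": {"same_scope", "source_subset", "source_superset",
                       "partial_overlap", "different_scope"},
    "path_match": {"same", "different", "unknown"},
    "certainty": {"sure", "unsure"},
}

def validate_judge_output(obj: dict) -> list[str]:
    """Return a list of constraint violations (empty list = valid)."""
    errors = []
    if set(obj) != REQUIRED_KEYS:
        errors.append("keys must match the schema exactly (no extra keys)")
    for key, allowed in ENUMS.items():
        if obj.get(key) not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    conf = obj.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence must be in [0,1]")
    if not isinstance(obj.get("reasoning"), str) or len(obj["reasoning"]) > 240:
        errors.append("reasoning must be a string of <= 240 chars")
    if obj.get("is_duplicate") is False and obj.get("duplicate_of") != 0:
        errors.append("duplicate_of must be 0 when is_duplicate is false")
    if obj.get("is_duplicate") is True and obj.get("relation") != "same_instance":
        errors.append("relation must be same_instance when is_duplicate is true")
    return errors
```

In the real pipeline the membership check against ALLOWED_CANDIDATE_NUMBERS happens alongside this, since the validator alone cannot know the candidate set.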

Cost and accuracy stance

The goal is practical operations, not perfect AI: keep cost low enough for continuous runs, and keep precision high enough for controlled close actions.

Current status

Implemented commands:
  • init, sync, refresh, embed, candidates, judge
  • judge-audit, report-audit, detect-new
  • canonicalize, maintainers, plan-close, apply-close

What’s missing next

  • first-class evaluation command + reporting workflow for production gate decisions
  • programmatic orchestration command/workflow for unattended DB freshness updates
  • a richer future action surface (for example, label taxonomy / tree-editing operations)

Stack

  • Python + Typer + Pydantic + Rich
  • Supabase Postgres + pgvector
  • providers: OpenAI, Gemini, OpenRouter, and openai-codex via pi RPC
DB can be moved to self-hosted Postgres with minimal architecture changes.

Internal docs

Deep design and runbook docs are in docs/internal/.