This guide is written for both first-time operators and experienced engineers. If you just want a fast success path, use the Beginner track below.

What you’ll do

By the end of this guide, you will:
  1. install and configure dupcanon
  2. run the full duplicate pipeline on a real repo
  3. generate a safe close plan in dry-run mode
  4. understand how to move from manual runs to workflow-driven automation

Beginner track (recommended first run)

Follow the exact commands in order to get a working pipeline in ~15–30 minutes.

Advanced track

Use provider/model overrides, tune worker settings, and integrate with GitHub workflows.

Prerequisites

  • uv installed
  • gh CLI installed and authenticated (gh auth status)
  • Docker Desktop / Docker Engine (for local Supabase)
  • Supabase CLI (for migrations and local DB)
  • reachable Postgres DSN (Supabase recommended)
  • API credentials for the providers you plan to use
Default runtime stack:
  • embeddings: openai / text-embedding-3-large
  • judge: openai-codex via pi --mode rpc
Minimum env requirements for that default:
  • SUPABASE_DB_URL
  • OPENAI_API_KEY (for embeddings)
  • pi CLI on PATH (for openai-codex judge path)
Optional but commonly used:
  • GITHUB_TOKEN (if not relying on existing gh auth)
  • LOGFIRE_TOKEN (for remote log sink)
  • If embedding/judge provider is gemini -> GEMINI_API_KEY
  • If judge provider is openai -> OPENAI_API_KEY
  • If judge provider is openrouter -> OPENROUTER_API_KEY
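The provider-to-env mapping above can be checked before a run. The helper below is an illustrative preflight sketch (not part of dupcanon) that fails fast when the required variable is unset:

```shell
# Hypothetical preflight helper: map a provider name to the env var it needs,
# per the list above, and fail if that variable is unset or empty.
require_provider_env() {
  case "$1" in
    gemini)       var=GEMINI_API_KEY ;;
    openai)       var=OPENAI_API_KEY ;;
    openrouter)   var=OPENROUTER_API_KEY ;;
    openai-codex) command -v pi >/dev/null 2>&1 || { echo "pi CLI not on PATH" >&2; return 1; }; return 0 ;;
    *)            echo "unknown provider: $1" >&2; return 1 ;;
  esac
  eval "val=\$$var"
  [ -n "$val" ] || { echo "$var is not set" >&2; return 1; }
}
```

For example, `require_provider_env openai` succeeds only when OPENAI_API_KEY is set.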
1) Set up local Supabase

Install the Supabase CLI (macOS example):
brew install supabase/tap/supabase
Start local Supabase services from repo root:
supabase start
Apply and validate schema locally:
supabase db reset
supabase db lint
Use local Postgres DSN in .env:
SUPABASE_DB_URL=postgresql://postgres:[email protected]:54322/postgres
Do not use the Supabase project HTTPS URL as SUPABASE_DB_URL; use a Postgres DSN (postgresql://... or postgres://...).
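The scheme rule above is easy to check mechanically. This is a minimal illustrative sanity check, not something dupcanon ships:

```shell
# Illustrative check: accept only Postgres DSN schemes, and reject the
# common mistake of pasting the Supabase HTTPS project URL.
check_dsn() {
  case "$1" in
    postgresql://*|postgres://*) echo "DSN scheme OK" ;;
    https://*) echo "ERROR: Supabase HTTPS project URL is not a Postgres DSN" >&2; return 1 ;;
    *)         echo "ERROR: unrecognized DSN scheme" >&2; return 1 ;;
  esac
}
```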

2) Install and configure dupcanon

uv sync
cp .env.example .env
Edit .env and set at minimum:
SUPABASE_DB_URL=postgresql://...
OPENAI_API_KEY=...
SUPABASE_DB_URL must be a Postgres DSN (postgresql:// or postgres://), not a Supabase HTTPS project URL.
Validate setup:
uv run dupcanon init
You should see runtime checks with ✅/⚠️ plus DSN guidance.

3) First successful pipeline run (Beginner track)

Use a repo you can safely test against.
# 1) ingest recent items
uv run dupcanon sync --repo openclaw/openclaw --since 3d

# 2) embed changed/new items
uv run dupcanon embed --repo openclaw/openclaw --type issue --only-changed

# 3) build candidate sets (open-only by default for operations)
uv run dupcanon candidates --repo openclaw/openclaw --type issue --include open

# 4) run duplicate judge
uv run dupcanon judge --repo openclaw/openclaw --type issue --thinking low

# 5) compute canonical stats (optional but recommended)
uv run dupcanon canonicalize --repo openclaw/openclaw --type issue

# 6) generate close plan safely (dry run)
uv run dupcanon plan-close --repo openclaw/openclaw --type issue --dry-run
Every command prints a summary table with counters, making it easy to verify each stage before moving on.
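The six stages above can be wrapped in a single script. This wrapper is an illustrative convenience (not part of dupcanon); overriding DUPCANON, e.g. with `echo`, previews the exact commands without running them:

```shell
# Illustrative wrapper: run the six beginner-track stages in order,
# stopping at the first failure. DUPCANON overrides the runner command.
run_pipeline() {
  repo="$1"
  runner="${DUPCANON:-uv run dupcanon}"
  $runner sync --repo "$repo" --since 3d &&
  $runner embed --repo "$repo" --type issue --only-changed &&
  $runner candidates --repo "$repo" --type issue --include open &&
  $runner judge --repo "$repo" --type issue --thinking low &&
  $runner canonicalize --repo "$repo" --type issue &&
  $runner plan-close --repo "$repo" --type issue --dry-run
}
```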

4) Safe mutation path (only when ready)

Persist a real plan:
uv run dupcanon plan-close --repo openclaw/openclaw --type issue
Then review plan rows in DB and apply explicitly:
uv run dupcanon apply-close --close-run <plan_run_id> --yes
apply-close is intentionally gated: it requires an existing mode=plan run and an explicit --yes.
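If you script the apply step, it is worth mirroring that gating in the script itself. This guard is an illustrative sketch, not part of dupcanon:

```shell
# Illustrative guard mirroring the CLI's gating: refuse to apply without an
# explicit plan run id and a literal --yes argument.
apply_close_guarded() {
  run_id="$1"; confirm="$2"
  if [ -z "$run_id" ] || [ "$confirm" != "--yes" ]; then
    echo "refusing: apply-close needs a plan run id and explicit --yes" >&2
    return 1
  fi
  ${DUPCANON:-uv run dupcanon} apply-close --close-run "$run_id" --yes
}
```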

5) Online path (entryway for automation)

detect-new is the online single-item entrypoint used for workflow-driven automation.
uv run dupcanon detect-new --repo openclaw/openclaw --type issue --number 123 --thinking low
Workflow-friendly output:
uv run dupcanon detect-new \
  --repo openclaw/openclaw \
  --type issue \
  --number 123 \
  --json-out .local/artifacts/detect-new.json
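In a workflow, the JSON artifact is usually post-processed. The snippet below is a dependency-free sketch for pulling a single top-level field out of the file; the field names used in the usage example are assumptions about the output schema, shown for illustration only:

```shell
# Hypothetical post-processing: naive extraction of a top-level scalar field
# from the detect-new JSON artifact, without jq. Field names are assumptions.
json_field() {
  # json_field <file> <key>
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^,\"}]*\)\"\{0,1\}.*/\1/p" "$1" | head -n 1
}
```

Usage (assuming the artifact has a top-level "repo" field): `json_field .local/artifacts/detect-new.json repo`. For anything beyond flat scalar fields, prefer a real JSON parser such as jq.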
Current GitHub Actions shadow workflow:
  • .github/workflows/detect-new-shadow.yml

6) Advanced track (overrides and tuning)

# OpenAI judge
uv run dupcanon judge --repo openclaw/openclaw --type issue --provider openai --model gpt-5-mini

# OpenRouter judge
uv run dupcanon judge --repo openclaw/openclaw --type issue --provider openrouter --model minimax/minimax-m2.5

# Embedding override
uv run dupcanon embed --repo openclaw/openclaw --type issue --only-changed --provider openai --model text-embedding-3-large

Common troubleshooting

  • Ensure Docker is running before supabase start.
  • If local services fail to start, restart Docker and run supabase stop && supabase start.
  • Confirm CLI is installed: supabase --version.
  • Rebuild local schema state with supabase db reset.
  • Use a Postgres DSN (postgresql://...), not the Supabase HTTPS URL.
  • Prefer the Supabase IPv4 pooler DSN if the direct DB is unreachable.
Ensure env matches selected provider:
  • gemini -> GEMINI_API_KEY
  • openai -> OPENAI_API_KEY
  • openrouter -> OPENROUTER_API_KEY
  • openai-codex -> pi CLI on PATH
If the judge reports no duplicates, check that:
  • sync and embed completed
  • candidate retrieval used the expected type/state filters
  • the threshold is not too strict for your corpus
No-match results are also expected when deterministic gates fail (for example: target not open, score gap below 0.015, structural mismatch vetoes).
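For intuition, here is one plausible reading of the 0.015 score-gap veto mentioned above, as a standalone sketch; the real check lives inside dupcanon and may differ:

```shell
# Illustrative sketch only: veto a match when the top two candidate scores
# are closer together than 0.015 (an ambiguous ranking).
gap_gate() {
  # gap_gate <top_score> <runner_up_score>: succeed only if gap >= 0.015
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a - b >= 0.015) }'
}
```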

From manual to automated operations

A practical scheduled sequence is:
  1. refresh --refresh-known
  2. embed --only-changed
  3. candidates --include open
  4. judge
  5. canonicalize
  6. plan-close (and apply only under review policy)
For event-driven online checks, trigger detect-new from issue/pr opened events.
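The scheduled sequence above can be sketched as a single script. This is illustrative, not part of dupcanon; the flag spellings are taken from the list and may need adjusting for your setup:

```shell
# Illustrative nightly sequence. DUPCANON can be overridden (e.g. with echo)
# to preview the commands. apply-close is deliberately left out: apply only
# under your review policy.
nightly_run() {
  repo="$1"
  runner="${DUPCANON:-uv run dupcanon}"
  $runner refresh --repo "$repo" --refresh-known &&
  $runner embed --repo "$repo" --type issue --only-changed &&
  $runner candidates --repo "$repo" --type issue --include open &&
  $runner judge --repo "$repo" --type issue &&
  $runner canonicalize --repo "$repo" --type issue &&
  $runner plan-close --repo "$repo" --type issue
}
```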

Next docs to read

  • Overview: /
  • Internal runbook/specs: docs/internal/