This guide is written for both first-time operators and experienced engineers. If you just want a fast success path, use the Beginner track below.
What you’ll do
By the end of this guide, you will:
- install and configure dupcanon
- run the full duplicate pipeline on a real repo
- generate a safe close plan in dry-run mode
- understand how to move from manual runs to workflow-driven automation
Beginner track (recommended first run)
Follow the exact commands in order and get a working pipeline in ~15–30 minutes.
Advanced track
Use provider/model overrides, tune worker settings, and integrate with GitHub workflows.
Prerequisites
- uv installed
- gh CLI installed and authenticated (gh auth status)
- Docker Desktop / Docker Engine (for local Supabase)
- Supabase CLI (for migrations and local DB)
- reachable Postgres DSN (Supabase recommended)
- API credentials for the providers you plan to use
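You can check these prerequisites in one pass before the first run. The sketch below is a hypothetical helper and is not part of dupcanon. It looks for each tool on PATH. It then runs gh auth status and docker info, both of which exit non-zero on failure, to confirm that gh is logged in and that the Docker daemon is up. It does not validate the DSN or the provider credentials.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Tools named in the prerequisites list; docker and supabase back the local Supabase path.
REQUIRED_TOOLS = ("uv", "gh", "docker", "supabase")


def command_ok(cmd: List[str]) -> bool:
    """Return True if cmd exits with status 0; output is discarded."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:
        return False
    return result.returncode == 0


def preflight(
    which: Callable[[str], Optional[str]] = shutil.which,
    run_ok: Callable[[List[str]], bool] = command_ok,
) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = [f"{tool} not found on PATH" for tool in REQUIRED_TOOLS if which(tool) is None]
    # gh auth status exits non-zero when gh has no authenticated account.
    if which("gh") and not run_ok(["gh", "auth", "status"]):
        problems.append("gh is not authenticated (run: gh auth login)")
    # docker info fails when the Docker daemon is not running or not reachable.
    if which("docker") and not run_ok(["docker", "info"]):
        problems.append("docker daemon is not reachable")
    return problems
```

On a machine that meets every prerequisite, print(preflight()) should print an empty list. The which and run_ok parameters exist only so the checks can be exercised without the real tools.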
Required credentials by default stack
Default runtime stack:
- embeddings: openai/text-embedding-3-large
- judge: openai-codex via pi --mode rpc

Credentials:
- SUPABASE_DB_URL
- OPENAI_API_KEY (for embeddings)
- pi CLI on PATH (for the openai-codex judge path)
- GITHUB_TOKEN (if not relying on existing gh auth)
- LOGFIRE_TOKEN (for remote log sink)
Alternative provider requirements
- If the embedding/judge provider is gemini -> GEMINI_API_KEY
- If the judge provider is openai -> OPENAI_API_KEY
- If the judge provider is openrouter -> OPENROUTER_API_KEY
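The provider-to-credential rules above fit in a small lookup table. This is an illustrative sketch only. The provider strings match the names used in this guide, the helper name missing_credentials is hypothetical, and this guide does not show how dupcanon itself selects providers. The optional tokens (GITHUB_TOKEN, LOGFIRE_TOKEN) are not checked.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Requirements transcribed from this guide: an env var name, or "cli:<name>" for a binary on PATH.
EMBEDDING_REQUIREMENTS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}
JUDGE_REQUIREMENTS = {
    "openai-codex": "cli:pi",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def missing_credentials(
    embedding_provider: str = "openai",
    judge_provider: str = "openai-codex",
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """List unmet requirements for an embedding/judge provider pair (defaults = default stack)."""
    needed = [
        "SUPABASE_DB_URL",
        EMBEDDING_REQUIREMENTS[embedding_provider],
        JUDGE_REQUIREMENTS[judge_provider],
    ]
    missing = []
    for req in dict.fromkeys(needed):  # de-duplicate while preserving order
        if req.startswith("cli:"):
            if which(req[4:]) is None:
                missing.append(f"{req[4:]} CLI not on PATH")
        elif not env.get(req):
            missing.append(req)
    return missing
```

For example, missing_credentials("gemini", "openrouter") reports any of SUPABASE_DB_URL, GEMINI_API_KEY, and OPENROUTER_API_KEY that are unset or empty.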
1) Set up Supabase database first (local Docker recommended)
- Local Supabase (Docker)
- Hosted Supabase
Install Supabase CLI (macOS example):
Start local Supabase services from repo root:
Apply and validate schema locally:
Use the local Postgres DSN in .env:

2) Install and configure dupcanon
In .env, set at minimum:
3) First successful pipeline run (Beginner track)
Use a repo you can safely test against.

Every command prints a summary table with counters, so you can verify each stage before moving on.
4) Safe mutation path (only when ready)
Persist a real plan:

5) Online path (entry point for automation)
detect-new is the online single-item entrypoint used for workflow-driven automation.
.github/workflows/detect-new-shadow.yml
6) Advanced track (overrides and tuning)
- Provider/model overrides
- Concurrency tuning
- Audit and gate simulation
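You can simulate the deterministic gates described under troubleshooting (target must be open, score gap of at least 0.015, no structural-mismatch veto) offline to see why a confident match was still rejected. This sketch is hypothetical. The Decision fields are invented for illustration. It also assumes that "score gap" means the best candidate's score minus the runner-up's, which this guide does not define.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SCORE_GAP = 0.015  # gap threshold quoted in this guide's troubleshooting section


@dataclass
class Decision:
    """Hypothetical, simplified view of one judged duplicate pair."""
    target_is_open: bool
    best_score: float
    runner_up_score: Optional[float]  # None when only one candidate was scored
    structural_mismatch: bool


def gate_failures(d: Decision) -> List[str]:
    """Return the names of gates that block closing; an empty list means all gates pass."""
    failures = []
    if not d.target_is_open:
        failures.append("target not open")
    # Assumption: the score gap is the best score minus the runner-up score.
    if d.runner_up_score is not None and d.best_score - d.runner_up_score < MIN_SCORE_GAP:
        failures.append("score gap below 0.015")
    if d.structural_mismatch:
        failures.append("structural mismatch veto")
    return failures
```

A high-confidence decision can still be blocked. Decision(True, 0.97, 0.96, False) fails the score-gap gate because the two top scores differ by only about 0.01.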
Common troubleshooting
Supabase CLI / Docker setup issues
- Ensure Docker is running before supabase start.
- If local services fail to start, restart Docker and run supabase stop && supabase start.
- Confirm the CLI is installed: supabase --version.
- Rebuild local schema state with supabase db reset.
init fails with DSN message
Use a Postgres DSN (postgresql://...), not the Supabase HTTPS URL. If the direct database host is unreachable, use the Supabase IPv4 pooler DSN instead.
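A quick parse of the value in .env catches this DSN mix-up before you run init. The hypothetical helper below uses only urllib.parse from the standard library. It is deliberately strict and accepts only the postgresql:// form shown in this guide, although libpq also accepts postgres://. It cannot tell a direct host from a pooler host, so you still need to test reachability separately.

```python
from typing import Optional
from urllib.parse import urlparse


def dsn_problem(dsn: str) -> Optional[str]:
    """Return a description of what looks wrong with dsn, or None if it looks usable."""
    parsed = urlparse(dsn.strip())
    if parsed.scheme in ("http", "https"):
        # The Supabase project URL is an HTTPS endpoint, not a database connection string.
        return "this is an HTTPS project URL; use the Postgres connection string (postgresql://...)"
    if parsed.scheme != "postgresql":
        return f"unexpected scheme {parsed.scheme!r}; expected postgresql://"
    if not parsed.hostname:
        return "DSN has no host"
    return None
```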
judge says provider key/CLI missing
Ensure your environment matches the selected provider:
- gemini -> GEMINI_API_KEY
- openai -> OPENAI_API_KEY
- openrouter -> OPENROUTER_API_KEY
- openai-codex -> pi CLI on PATH
No candidates or low yield
Check that:
- sync and embed completed
- candidate retrieval used expected type/state filters
- threshold is not too strict for your corpus
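A toy, in-memory retrieval shows how the type/state filters and the threshold interact. This is a sketch under stated assumptions. Similarity is plain cosine similarity over embedding vectors, and Item, retrieve, and the threshold values are hypothetical. This guide does not describe dupcanon's actual retrieval query or its defaults.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Item:
    """Hypothetical stand-in for a synced, embedded issue or PR."""
    number: int
    kind: str  # "issue" or "pr"
    state: str  # "open" or "closed"
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Item, corpus: Iterable[Item], kinds: Set[str], states: Set[str],
             threshold: float, k: int = 5) -> List[Tuple[float, int]]:
    """Top-k (score, number) pairs that pass the type/state filters and the threshold."""
    scored = []
    for item in corpus:
        # Filters: skip the query itself, disallowed kinds, and disallowed states.
        if item.number == query.number or item.kind not in kinds or item.state not in states:
            continue
        score = cosine(query.embedding, item.embedding)
        if score >= threshold:
            scored.append((score, item.number))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return scored[:k]
```

With these toy vectors, lowering the threshold from 0.9 to 0.5 admits one extra, less similar candidate. Loosening the threshold is the kind of change to try when a corpus yields too few candidates.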
High-confidence duplicate still rejected
This is expected when deterministic gates fail (for example: the target is not open, the score gap is below 0.015, or a structural-mismatch veto applies).

From manual to automated operations
A practical scheduled sequence is:
- refresh --refresh-known
- embed --only-changed
- candidates --include open
- judge
- canonicalize
- plan-close (and apply only under review policy)
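A scheduler (cron, CI, or similar) only needs to run these stages in order and stop at the first failure. This minimal sketch assumes the CLI is invoked as dupcanon on PATH. If you run it through uv instead, pass prefix=("uv", "run", "dupcanon"). The helper names are hypothetical, and the sketch uses only the flags listed above. Supply any repository-selection arguments your setup needs through extra_args, because this guide does not show them.

```python
import subprocess
from typing import Callable, List, Sequence

# Stages and flags as listed in this guide. Repo selection and any other required
# arguments are not shown here; pass them via extra_args.
SCHEDULED_STAGES: List[List[str]] = [
    ["refresh", "--refresh-known"],
    ["embed", "--only-changed"],
    ["candidates", "--include", "open"],
    ["judge"],
    ["canonicalize"],
    ["plan-close"],  # apply is intentionally omitted; run it only under your review policy
]


def run_command(cmd: List[str]) -> int:
    """Run cmd, letting its summary table print to the terminal; return the exit code."""
    return subprocess.run(cmd).returncode


def run_scheduled(
    extra_args: Sequence[str] = (),
    runner: Callable[[List[str]], int] = run_command,
    prefix: Sequence[str] = ("dupcanon",),
) -> List[str]:
    """Run every stage in order and stop at the first non-zero exit code."""
    completed = []
    for stage in SCHEDULED_STAGES:
        cmd = [*prefix, *stage, *extra_args]
        code = runner(cmd)
        if code != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with {code}; later stages were skipped")
        completed.append(stage[0])
    return completed
```

Stopping at the first failure keeps later stages from running on partially refreshed data.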
For event-driven runs, trigger detect-new from issue/PR opened events.
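In a GitHub Actions job triggered by issue or pull request opened events, the runner exposes the event name and payload through the GITHUB_EVENT_NAME and GITHUB_EVENT_PATH environment variables. The sketch below (hypothetical helper names) extracts the item kind and number that a detect-new step would need. This guide does not show how that pair is passed to detect-new, and it does not reproduce the detect-new-shadow.yml workflow.

```python
import json
import os
from typing import Optional, Tuple


def detect_new_target(event_name: str, payload: dict) -> Optional[Tuple[str, int]]:
    """Return ("issue" | "pr", number) for opened events, else None."""
    if payload.get("action") != "opened":
        return None
    if event_name == "issues":
        return ("issue", payload["issue"]["number"])
    if event_name in ("pull_request", "pull_request_target"):
        return ("pr", payload["pull_request"]["number"])
    return None


def target_from_actions_env() -> Optional[Tuple[str, int]]:
    """Read the triggering event from the standard GitHub Actions environment variables."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        return detect_new_target(os.environ["GITHUB_EVENT_NAME"], json.load(fh))
```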
Next docs to read
- Overview: /
- Internal runbook/specs: docs/internal/