6.7. find_dupe — AI judge for detect-dupe clusters
find_dupe (directory: utils/find-dupe/) consumes a
detect-dupe JSON report and asks Claude (via
the das-claude daspkg, module anthropic/anthropic) to
partition each cluster into real-duplicate groups vs false
positives, with a one-line reason. Output is JSON (machine, for CI
gates / future tooling) + Markdown (human, for review).
The tool is an install-then-run script, not a built-in binary
utility — the anthropic/anthropic dependency is fetched at
runtime via daspkg, so it can’t be linked into
daslang.exe’s all_utils_exe bundle at build time.
Warning
find_dupe sends source code to Anthropic’s API. Each
cluster’s full function bodies (verbatim, with file paths and line
numbers) are uploaded as part of every judge call. Do not run this
tool on proprietary, confidential, or otherwise restricted code
unless your project’s data-handling policy permits sending that
source to Anthropic. See Anthropic’s terms. Use --dry-run to preview
cluster size and token estimates without making any API calls.
6.7.1. Install
From the project root:
bin/daslang utils/daspkg/main.das -- install --root utils/find-dupe
This fetches the das-claude package per utils/find-dupe/.das_package
and unpacks it under utils/find-dupe/modules/. If you skip this
step, bin/daslang utils/find-dupe/main.das fails at compile time
with module 'anthropic/anthropic' not found — re-run the install
command above. See daspkg — Package Manager for daspkg semantics.
6.7.2. API key
Set ANTHROPIC_API_KEY so daslang’s non-interactive subshells can
see it. On macOS the easiest path is ~/.zshenv (sourced by every
zsh invocation, interactive or not):
echo 'export ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.zshenv
~/.zshrc works for terminals you opened yourself but not for
GUI-launched processes (VS Code, Claude Code’s MCP server) — those
inherit env from launchctl. ~/.zshenv covers both.
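A quick way to confirm a non-interactive subprocess will actually see the key is to check it from a fresh process yourself. A minimal preflight sketch in Python — the variable name comes from this section; the `sk-ant-` prefix check is illustrative, not a guarantee of key validity:

```python
import os

def api_key_ready(env: dict) -> bool:
    """True if ANTHROPIC_API_KEY looks usable by a judge subprocess.
    The 'sk-ant-' prefix check is an illustrative sanity check only."""
    key = env.get("ANTHROPIC_API_KEY", "")
    return key.startswith("sk-ant-") and len(key) > 10

if __name__ == "__main__":
    if api_key_ready(os.environ):
        print("API key visible to this process")
    else:
        print("ANTHROPIC_API_KEY missing or malformed; add it to ~/.zshenv")
```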
6.7.3. Smoke test
Verify the wiring before running over real clusters:
bin/daslang utils/find-dupe/main.das -- --smoke-test
Expected output:
Running das-claude smoke test (model=claude-haiku-4-5-20251001)...
Reply: OK
Tokens: in=15 out=4
Smoke test PASSED.
6.7.4. Workflow
Run detect-dupe to produce a JSON report:
bin/daslang utils/detect-dupe/main.das -- -p <paths> --json ./dupes.json
Dry-run find_dupe for a cost preview (no API calls):
bin/daslang utils/find-dupe/main.das -- --input ./dupes.json --dry-run -v
Output: cluster counts, estimated input/output tokens, dollar estimate.
Live run to produce verdicts:
bin/daslang utils/find-dupe/main.das -- --input ./dupes.json -v
Writes find_dupe_verdicts.json (machine) and find_dupe_verdicts.md
(human) to ./find-dupe-out/ (override the directory with --out, the
basename with --out-basename).
Run from the project root. detect-dupe records source paths
relative to its cwd; find_dupe extracts each function’s body from
disk using those paths, so its cwd must match.
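A wrong cwd can be caught before spending any tokens by checking that every member path in the report resolves from the current directory. A sketch, assuming the report stores clusters under a `clusters` key whose members carry a `file` field — those field names are assumptions about the report layout, not the documented schema:

```python
import json
from pathlib import Path

def missing_member_files(report_path: str) -> list:
    """Return member file paths from a detect-dupe report that do not
    resolve from the current working directory.  The 'clusters' /
    'members' / 'file' key names are assumptions for illustration."""
    report = json.loads(Path(report_path).read_text())
    missing = []
    for cluster in report.get("clusters", []):
        for member in cluster.get("members", []):
            if not Path(member["file"]).exists():
                missing.append(member["file"])
    return missing
```

An empty result means every recorded path is reachable, so find_dupe can extract the function bodies it needs.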
6.7.5. Flags
| Flag | Default | Meaning |
|---|---|---|
| --input | required | detect-dupe JSON report file |
| --out | ./find-dupe-out/ | output directory |
| --out-basename |  | output file basename |
| --min-lines |  | skip clusters where every member is shorter than this |
| --max-clusters |  | hard cap on clusters analyzed (0 = no cap) |
|  |  |  |
| --parallel |  | concurrent judge calls (1 = sequential) |
|  | off | filter reports to actionable rows only: real + partial |
| --dry-run | off | estimate tokens & cost without making API calls |
| --smoke-test | off | one-shot API call to verify the env wiring |
| -v | off | per-cluster progress to stdout |
|  |  | print help and exit |
The default sort puts fuzzy pairs first (highest AI value), then
larger clusters, then larger total span — useful with
--max-clusters and when interrupting a long run.
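That ordering can be sketched as a sort key — the `kind`, `members`, and `span` field names below are illustrative, not the tool's actual structures:

```python
def cluster_priority(cluster: dict):
    """Sort key matching the documented default order: fuzzy pairs
    first, then larger clusters, then larger total line span.
    Field names here are illustrative stand-ins."""
    return (
        0 if cluster["kind"] == "fuzzy" else 1,  # fuzzy first
        -len(cluster["members"]),                # bigger clusters next
        -cluster["span"],                        # then larger total span
    )

clusters = [
    {"kind": "exact", "members": [1, 2, 3], "span": 120},
    {"kind": "fuzzy", "members": [1, 2],    "span": 40},
    {"kind": "fuzzy", "members": [1, 2, 3], "span": 90},
]
clusters.sort(key=cluster_priority)
# fuzzy clusters now precede the exact one, largest fuzzy cluster first
```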
6.7.6. Output schema
find_dupe_verdicts.json — top-level fields:
| Field | Type | Description |
|---|---|---|
|  | int | bumps on schema changes |
|  | string | resolved model id |
|  | object | totals + token usage |
|  | array | one VerdictRow per cluster |
Each VerdictRow carries cluster_id, kind
(exact/fuzzy), similarity (for fuzzy), members (with
file/line/end_line), verdict
(real/partial/false_positive/skipped/error),
groups (the partition), false_positives (indices), reason,
and per-call token usage.
find_dupe_verdicts.md — human-readable, per-cluster sections
with verdict tag, reason, member list, partition, and the original
source (collapsed under <details>).
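For downstream tooling (e.g. a CI gate), filtering the JSON down to actionable rows is straightforward. A sketch, assuming the VerdictRow array sits under a `rows` key — that key name is an assumption; the per-row `verdict` values come from the schema above:

```python
def actionable_rows(verdicts: dict) -> list:
    """Keep only rows a reviewer should act on: 'real' duplicates and
    'partial' clusters.  The top-level 'rows' key is an assumed name;
    the verdict strings are from the documented VerdictRow fields."""
    return [
        row for row in verdicts.get("rows", [])
        if row["verdict"] in ("real", "partial")
    ]
```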
6.7.7. MCP integration
The MCP Server — AI Tool Integration server exposes two tools that wrap this CLI.
Both shell out to daslang utils/find-dupe/main.das — the
anthropic/anthropic dependency is fetched by daspkg at runtime, so
in-process require would force every MCP user to install daspkg
packages they may never use. When the subprocess fails because the
package isn’t installed, the tool surfaces a structured “anthropic
daspkg not installed” error with the exact install command.
| Tool | Purpose |
|---|---|
|  | Take a detect-dupe JSON report and return the verdict envelope. Parameters: |
|  | Convenience: run detect-dupe against |
The MCP layer hardcodes parallel=8 and min_lines=6 (sensible
defaults for an interactive tool); use the CLI directly if you need
to tune those.
Both tools accept dry_run=true to estimate cluster count and token
cost without making any API calls — the safe default for “is this
worth running?” probes.
6.7.8. Pricing
Approximate, as of 2026-04:
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| Haiku 4.5 | 1.00 | 5.00 |
| Sonnet 4.6 | 3.00 | 15.00 |
A typical cluster (3-4 functions, 50 lines total) consumes ~1 KTok in
+ 0.3 KTok out → ~$0.0025 on Haiku. Use --dry-run first if a run
looks large.
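The per-cluster estimate can be checked directly against the table above (prices from the table; the token counts are the example's round numbers):

```python
PRICES = {  # dollars per million tokens, from the pricing table
    "haiku-4.5":  {"in": 1.00, "out": 5.00},
    "sonnet-4.6": {"in": 3.00, "out": 15.00},
}

def cluster_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one judge call at the listed per-MTok rates."""
    p = PRICES[model]
    return (in_tokens * p["in"] + out_tokens * p["out"]) / 1_000_000

# Typical cluster: ~1 KTok in, ~0.3 KTok out
print(round(cluster_cost("haiku-4.5", 1_000, 300), 4))  # ~0.0025
```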
6.7.9. Implementation
| File | Role |
|---|---|
|  | CLI ( |
|  | detect-dupe JSON shadow structs + |
|  |  |
|  |  |
|  |  |
|  |  |
|  | dastest suite (hermetic; no API calls) |
6.7.10. See also
detect-dupe — Cross-file similar-function detector — the cluster-producing pipeline whose JSON find_dupe consumes.
MCP Server — AI Tool Integration — the MCP server that wraps both tools.
daspkg — Package Manager — the package manager that fetches
das-claude.