Detect copy-pasted code blocks across your entire codebase using suffix-array analysis.
Start with the default mild mode. Use semantic when you want to catch clones with renamed variables.
fallow dupes
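
For intuition, suffix-array clone detection can be sketched in a few lines. This is a toy illustration, not fallow's implementation: the function name `longest_repeats`, the whitespace tokenizer, and the naive sort-based suffix array are all assumptions made for the example. It sorts every suffix of the token stream so that repeated runs end up next to each other, then reports neighbouring suffixes that share at least `min_tokens` leading tokens.

```python
def longest_repeats(tokens, min_tokens):
    """Return (start_a, start_b, length) for repeated token runs.

    Hypothetical helper: builds a naive suffix array and compares
    neighbouring suffixes. Real detectors use faster constructions.
    """
    # Suffix array: suffix start positions, sorted by the suffix they begin.
    order = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    repeats = []
    for a, b in zip(order, order[1:]):
        # Length of the common prefix shared by two neighbouring suffixes.
        n = 0
        while (a + n < len(tokens) and b + n < len(tokens)
               and tokens[a + n] == tokens[b + n]):
            n += 1
        if n >= min_tokens:
            repeats.append((min(a, b), max(a, b), n))
    return sorted(repeats)


# Whitespace splitting stands in for a real tokenizer.
tokens = "a = b + c ; x = 1 ; a = b + c ; y = 2".split()
print(longest_repeats(tokens, 6))
```

With a minimum of 6 tokens, only the repeated statement `a = b + c ;` qualifies. Lowering the minimum also surfaces shorter runs nested inside it, which is one reason a minimum size such as --min-tokens is useful.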

Options

Detection

| Flag | Description |
| --- | --- |
| --mode <MODE> | Detection mode: strict, mild (default), weak, semantic |
| --min-tokens <N> | Minimum tokens per clone (default: 50) |
| --min-lines <N> | Minimum lines per clone (default: 5) |
| --min-occurrences <N> | Minimum number of occurrences before a clone group is reported (default: 2, must be ≥ 2). Raise to skip pair-only clones and focus on widespread copy-paste worth refactoring. |
| --skip-local | Only report cross-directory duplicates |
| --cross-language | Strip TS types for TS to JS matching |
| --ignore-imports | Exclude import declarations from clone detection |

Output

| Flag | Description |
| --- | --- |
| -f, --format <FORMAT> | Output format: human (default), json, sarif, compact, markdown, codeclimate, gitlab-codequality, pr-comment-github, pr-comment-gitlab, review-github, review-gitlab |
| --top <N> | Show only the N most-duplicated clone groups. Groups are sorted by instance count descending (tiebreak: line count descending, then deterministic path/line). Summary stats reflect the full project, not just the shown groups. |
| --quiet | Suppress progress output |
| --threshold <N> | Fail if duplication exceeds N% |
| --group-by <MODE> | Partition the report into per-group sections. MODE: owner (CODEOWNERS), directory (first path component), package (workspace), or section (GitLab [Section] headers). Each clone group is attributed to its largest owner (most instances; alphabetical tiebreak), so a group split 2 src / 1 lib appears under src. JSON adds grouped_by plus a groups array of per-bucket dedup-aware stats, attributed clone_groups (each carrying primary_owner and per-instance owner), and clone_families. SARIF results carry properties.group and CodeClimate issues carry a top-level group field. Compact and markdown fall back to ungrouped output with a stderr note. |
| --explain | Add metric explanations. In human format, explanations are always shown. In JSON format, adds a _meta object with metric descriptions and docs links. |
| --explain-skipped | In human/markdown output, show the per-pattern breakdown for files skipped by default duplicates ignores. Machine formats suppress this note. |
| --summary | Print a one-line summary of duplication counts at the end of the run. In JSON format, adds a summary counts object. |
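
The ordering used by --top and the owner attribution used by --group-by can be sketched as follows. This is a minimal illustration under stated assumptions, not fallow's code: the function names, the dictionary shape of a clone group, and the choice of "first instance by path and line" as the final tiebreak are all hypothetical.

```python
from collections import Counter


def top_groups(groups, n):
    """Return the n most-duplicated groups.

    Each group is a hypothetical dict: {"lines": int,
    "instances": [(path, start_line), ...]}.
    """
    def key(group):
        # Assumed final tiebreak: the smallest (path, line) instance.
        first = min(group["instances"])
        # Instance count descending, then line count descending.
        return (-len(group["instances"]), -group["lines"], first)

    return sorted(groups, key=key)[:n]


def primary_owner(instance_owners):
    """Pick the owner with the most instances; ties go alphabetically."""
    counts = Counter(instance_owners)
    return min(counts, key=lambda owner: (-counts[owner], owner))


# A group split 2 src / 1 lib is attributed to src.
print(primary_owner(["src", "lib", "src"]))
```

Sorting on a tuple of negated counts keeps the comparison in one place, and including a path/line component makes the order stable across runs.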

Incremental

| Flag | Description |
| --- | --- |
| --changed-since <REF> | Only report duplication in files changed since a git ref |
| --baseline <PATH> | Compare against a previously saved baseline file |
| --save-baseline <PATH> | Save current duplication as a baseline file |

Token cache

For projects with at least duplicates.minCorpusSizeForTokenCache source files (default 5000), fallow dupes keeps a per-project tokenized-source cache under <root>/.fallow/cache/dupes-tokens-vN/ so warm runs skip re-tokenizing unchanged files. The cache writes a sibling .gitignore containing * on first save, so its contents stay out of git status. Below the threshold the cache stays off because the load/save overhead exceeds the tokenization savings on small corpora. Pass --no-cache to disable the cache for one run, or raise minCorpusSizeForTokenCache to keep it disabled even on larger projects.

When --changed-since is set on a project with at least duplicates.minCorpusSizeForShingleFilter files (default 1024), the detector additionally drops unchanged files whose k-token shingles do not overlap any changed file before building the suffix array.

Both settings belong to the duplicates section ([duplicates] in TOML) of .fallowrc.json, .fallowrc.jsonc, fallow.toml, or .fallow.toml.
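
The shingle pre-filter used with --changed-since can be pictured with a short sketch. The names `shingles` and `files_to_keep`, the value of k, and the whitespace tokenizer are assumptions for illustration; fallow's actual tokenization and shingle size are not documented here.

```python
def shingles(tokens, k):
    """All contiguous k-token windows of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def files_to_keep(files, changed, k):
    """Keep changed files plus unchanged files that share a shingle with them.

    files: hypothetical mapping of path -> token list.
    changed: set of changed paths.
    """
    changed_shingles = set()
    for path in changed:
        changed_shingles |= shingles(files[path], k)
    return sorted(
        path for path, tokens in files.items()
        if path in changed or shingles(tokens, k) & changed_shingles
    )


files = {
    "a.ts": "x = foo ( 1 )".split(),
    "b.ts": "y = foo ( 2 )".split(),
    "c.ts": "return null ;".split(),
}
print(files_to_keep(files, {"a.ts"}, k=3))
```

Here `c.ts` shares no 3-token window with the changed file, so it cannot contain a clone of the changed code and is dropped before the more expensive analysis.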

Default generated-output ignores

fallow dupes skips common framework output by default: .next, .nuxt, .svelte-kit, .turbo, .parcel-cache, .vite, .cache, out, and storybook-static. These defaults merge with duplicates.ignore; they do not replace your configured ignores. If your duplication percentage drops after upgrading, the likely cause is that fallow now excludes generated framework output from duplicate detection by default. Use --explain-skipped in human or markdown output to see the per-pattern skipped-file counts.
{
  "duplicates": {
    "ignoreDefaults": true,
    "ignore": ["**/__generated__/**"]
  }
}
Set ignoreDefaults to false to opt out and use only your duplicates.ignore list.
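
The merge behavior can be summarized in a small sketch. It assumes ignoreDefaults is treated as true when omitted and lists the defaults by directory name only; the exact glob form fallow uses internally and the function name `effective_ignores` are hypothetical.

```python
# Directory names from the list above; the real glob patterns may differ.
DEFAULT_IGNORES = [
    ".next", ".nuxt", ".svelte-kit", ".turbo", ".parcel-cache",
    ".vite", ".cache", "out", "storybook-static",
]


def effective_ignores(duplicates_config):
    """Combine built-in defaults with the user's duplicates.ignore list."""
    user = list(duplicates_config.get("ignore", []))
    # Assumption: an absent ignoreDefaults key behaves like true.
    if duplicates_config.get("ignoreDefaults", True):
        return DEFAULT_IGNORES + user
    return user


print(effective_ignores({"ignore": ["**/__generated__/**"]}))
print(effective_ignores({"ignoreDefaults": False,
                         "ignore": ["**/__generated__/**"]}))
```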

Monorepo scoping

Both flags retain only clone groups where at least one instance is under a selected workspace root. The full cross-workspace index is still built; reported groups and the duplication_percentage stat are recomputed from the scoped slice.
| Flag | Description |
| --- | --- |
| -w, --workspace <PATTERNS> | Scope output to one or more workspaces. Supports exact package names, globs (apps/*, @scope/*) matched against both the package name and the workspace path, and !-prefixed negation. Comma-separated values or repeated flag. |
| --changed-workspaces <REF> | Git-derived monorepo CI scoping: scope to workspaces containing any file changed since REF (e.g. origin/main). Mutually exclusive with --workspace. Missing ref is a hard error (exit 2), not silent full-scope fallback. |
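
As a rough mental model of --workspace pattern matching, the sketch below selects workspaces from a comma-separated pattern list. It is not fallow's matcher: `select_workspaces` is a hypothetical name, Python's `fnmatchcase` lets `*` match `/` (a real glob matcher may not), and treating a negation-only list as "everything except" is an assumption.

```python
from fnmatch import fnmatchcase


def select_workspaces(workspaces, patterns):
    """Return selected package names.

    workspaces: list of (package_name, workspace_path) pairs.
    patterns: comma-separated string, with "!" marking negation.
    """
    parts = [p.strip() for p in patterns.split(",") if p.strip()]
    include = [p for p in parts if not p.startswith("!")]
    exclude = [p[1:] for p in parts if p.startswith("!")]

    def hits(workspace, pats):
        name, path = workspace
        # A pattern may match either the package name or the path.
        return any(fnmatchcase(name, p) or fnmatchcase(path, p) for p in pats)

    return [
        ws[0] for ws in workspaces
        if (not include or hits(ws, include)) and not hits(ws, exclude)
    ]


workspaces = [
    ("@acme/web", "apps/web"),
    ("@acme/api", "apps/api"),
    ("@acme/ui", "packages/ui"),
]
print(select_workspaces(workspaces, "apps/*,!@acme/api"))
```

In this example `apps/*` matches two workspaces by path, and `!@acme/api` removes one of them by package name.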

Debugging

| Flag | Description |
| --- | --- |
| --trace <FILE:LINE> | Show all clones of code at a specific location |
| --performance | Show pipeline timing breakdown |

Detection modes

Each mode progressively normalizes more syntax before comparing, trading precision for recall.
| Mode | What it normalizes | Best for |
| --- | --- | --- |
| strict | Nothing: exact token match | Finding verbatim copies |
| mild | Currently equivalent to strict (AST tokenization inherently strips whitespace) | General-purpose detection (default) |
| weak | + string literal values | Catching copies with different messages |
| semantic | + variable names and numeric values | Finding structural clones after renaming |
As you move from strict to semantic, you’ll find more clones but also more potential false positives. Start with mild and increase sensitivity as needed.
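
The effect of each mode can be illustrated with a toy normalizer. This is a sketch of the idea only: the token kinds, the `_` placeholder, and the function name `normalize` are assumptions, and fallow's real tokenizer works on the AST.

```python
def normalize(tokens, mode):
    """Blank out token text according to the detection mode.

    tokens: list of (kind, text) pairs, where kind is one of
    "ident", "string", "number", or "punct" (hypothetical kinds).
    """
    blank = set()
    if mode in ("weak", "semantic"):
        blank.add("string")            # weak: ignore string literal values
    if mode == "semantic":
        blank |= {"ident", "number"}   # semantic: also names and numbers
    # strict and mild leave every token untouched.
    return [(kind, "_" if kind in blank else text) for kind, text in tokens]


a = [("ident", "total"), ("punct", "="), ("ident", "price"),
     ("punct", "*"), ("number", "2")]
b = [("ident", "sum"), ("punct", "="), ("ident", "cost"),
     ("punct", "*"), ("number", "3")]

print(normalize(a, "strict") == normalize(b, "strict"))
print(normalize(a, "semantic") == normalize(b, "semantic"))
```

The two statements differ token by token under strict, but become identical once names and numbers are blanked, which is why semantic finds renamed clones and also produces more false positives.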

Examples

# Mild mode (default)
fallow dupes

Example output

$ fallow dupes --top 3
 Duplicates (3 clone groups)

  5,433 lines  2 instances
    deno/lib/types.ts:45-5477
    src/types.ts:45-5477

    914 lines  2 instances
    deno/lib/__tests__/string.test.ts:8-921
    src/__tests__/string.test.ts:7-920

     42 lines  3 instances
    src/features/forecasting/server/procedures/analytics.ts:141-181
    src/features/forecasting/server/procedures/cashflow.ts:153-194
    src/features/forecasting/server/procedures/income.ts:590-631

  Identical code blocks detected via suffix-array analysis https://docs.fallow.tools/explanations/duplication#clone-groups

 27,255 lines (19.4%) duplicated across 398 files (0.23s)
$ fallow dupes --threshold 15
 Duplicates (3 clone groups)

  5,433 lines  2 instances
    deno/lib/types.ts:45-5477
    src/types.ts:45-5477

    914 lines  2 instances
    deno/lib/__tests__/string.test.ts:8-921
    src/__tests__/string.test.ts:7-920

     42 lines  3 instances
    src/features/forecasting/server/procedures/analytics.ts:141-181
    src/features/forecasting/server/procedures/cashflow.ts:153-194
    src/features/forecasting/server/procedures/income.ts:590-631

  Identical code blocks detected via suffix-array analysis https://docs.fallow.tools/explanations/duplication#clone-groups

 27,255 lines (19.4%) duplicated across 398 files (0.23s)
Duplication (19.4%) exceeds threshold (15%)

JSON output

mirrored_directories

When fallow detects directory pairs that are largely duplicated (e.g., a deno/ mirror of src/), the JSON output includes a mirrored_directories array:
{
  "mirrored_directories": [
    {
      "dir_a": "deno/lib",
      "dir_b": "src",
      "shared_files": ["client.ts", "handlers.ts", "types.ts", "utils.ts"],
      "total_lines": 6347
    }
  ]
}
Each entry identifies two directories with significant overlap, the shared filenames, and the total duplicated lines across those files.
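
A script consuming this part of the report might look like the sketch below. It relies only on the fields shown in the sample above; the function name `describe_mirrors` is hypothetical, and the embedded JSON stands in for a report saved from a run with --format json.

```python
import json

# Sample payload copied from the documentation above.
report = json.loads("""
{
  "mirrored_directories": [
    {
      "dir_a": "deno/lib",
      "dir_b": "src",
      "shared_files": ["client.ts", "handlers.ts", "types.ts", "utils.ts"],
      "total_lines": 6347
    }
  ]
}
""")


def describe_mirrors(report):
    """Return one summary line per mirrored directory pair."""
    lines = []
    # The array may be absent when no mirrored pairs were detected.
    for mirror in report.get("mirrored_directories", []):
        lines.append(
            f'{mirror["dir_a"]} <-> {mirror["dir_b"]}: '
            f'{len(mirror["shared_files"])} shared files, '
            f'{mirror["total_lines"]} duplicated lines'
        )
    return lines


for line in describe_mirrors(report):
    print(line)
```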

See also

Duplication analysis: How fallow’s detection engine works.

Configuration: Set default modes and thresholds in your config.

Migrating from jscpd: Coming from jscpd? See the migration guide.