fallow: codebase intelligence for TypeScript and JavaScript

Detect copy-pasted code blocks across your entire codebase using suffix-array analysis.

Start with the default mild mode. Use semantic when you want to catch clones with renamed variables.

fallow dupes

Options

Detection

Flag	Description
`--mode <MODE>`	Detection mode: `strict`, `mild` (default), `weak`, `semantic`
`--min-tokens <N>`	Minimum tokens per clone (default: 50)
`--min-lines <N>`	Minimum lines per clone (default: 5)
`--min-occurrences <N>`	Minimum number of occurrences before a clone group is reported (default: 2, must be ≥ 2). Raise to skip pair-only clones and focus on widespread copy-paste worth refactoring.
`--skip-local`	Only report cross-directory duplicates
`--cross-language`	Strip TypeScript type annotations so TypeScript code can match equivalent JavaScript
`--ignore-imports`	Exclude import declarations from clone detection
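As a rough illustration of how these thresholds might interact, the TypeScript sketch below filters clone groups that have already been detected. The `CloneGroup` shape, its field names, and the assumption that a group must pass every check at once are hypothetical; they are not taken from fallow's implementation. Only the defaults come from the table above.

```typescript
// Hypothetical shapes for illustration; not fallow's real data model.
interface CloneGroup {
  tokenCount: number;
  lineCount: number;
  instancePaths: string[]; // one file path per occurrence
}

interface DetectionOptions {
  minTokens: number; // --min-tokens (default 50)
  minLines: number; // --min-lines (default 5)
  minOccurrences: number; // --min-occurrences (default 2)
  skipLocal: boolean; // --skip-local
}

const DEFAULT_OPTIONS: DetectionOptions = {
  minTokens: 50,
  minLines: 5,
  minOccurrences: 2,
  skipLocal: false,
};

// Directory part of a POSIX-style path ("src/a/b.ts" -> "src/a").
function dirOf(path: string): string {
  const slash = path.lastIndexOf("/");
  return slash === -1 ? "" : path.slice(0, slash);
}

// Assumption: a group is reported only if it passes every enabled check.
function isReported(group: CloneGroup, opts: DetectionOptions): boolean {
  if (group.tokenCount < opts.minTokens) return false;
  if (group.lineCount < opts.minLines) return false;
  if (group.instancePaths.length < opts.minOccurrences) return false;
  if (opts.skipLocal) {
    // Assumption: "cross-directory" means instances span at least two directories.
    const dirs = new Set(group.instancePaths.map(dirOf));
    if (dirs.size < 2) return false;
  }
  return true;
}
```

Under these assumptions, raising `minOccurrences` to 3 hides a clone that appears in only two places, and `skipLocal` hides a clone whose instances all sit in the same directory.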

Output

Flag	Description
`-f, --format <FORMAT>`	Output format: `human` (default), `json`, `sarif`, `compact`, `markdown`, `codeclimate`, `gitlab-codequality`, `pr-comment-github`, `pr-comment-gitlab`, `review-github`, `review-gitlab`
`--top <N>`	Show only the N most-duplicated clone groups. Groups are sorted by instance count descending (tiebreak: line count descending, then deterministic path/line). Summary stats reflect the full project, not just the shown groups.
`--quiet`	Suppress progress output
`--threshold <N>`	Fail if duplication exceeds N%
`--group-by <MODE>`	Partition the report into per-group sections. `MODE`: `owner` (CODEOWNERS), `directory` (first path component), `package` (workspace), or `section` (GitLab `[Section]` headers). Each clone group is attributed to its largest owner (most instances; alphabetical tiebreak), so a group split 2 src / 1 lib appears under `src`. JSON adds `grouped_by` plus a `groups` array of per-bucket dedup-aware `stats`, attributed `clone_groups` (each carrying `primary_owner` and per-instance `owner`), and `clone_families`. SARIF results carry `properties.group` and CodeClimate issues carry a top-level `group` field. Compact and markdown fall back to ungrouped output with a stderr note.
`--explain`	Add metric explanations. In human format, explanations are always shown. In JSON format, adds a `_meta` object with metric descriptions and docs links.
`--explain-skipped`	In human/markdown output, show the per-pattern breakdown of files skipped by the default duplicates ignores. Machine formats suppress this note.
`--summary`	Print a one-line summary of duplication counts at the end of the run. In JSON format, adds a `summary` counts object.
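The ordering described for `--top` and the owner attribution described for `--group-by` can be sketched in TypeScript as follows. This is a minimal illustration, not fallow's code: the `CloneGroup` and `CloneInstance` shapes are hypothetical, and the final path/line tiebreak is assumed to compare each group's first instance.

```typescript
// Hypothetical shapes for illustration; fallow's real JSON schema may differ.
interface CloneInstance {
  path: string;
  startLine: number;
  owner: string; // bucket label, e.g. the first path component for `directory`
}

interface CloneGroup {
  lineCount: number;
  instances: CloneInstance[];
}

// Locale-independent string comparison, so the order is deterministic.
function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Instance count descending, then line count descending, then path and line
// of the first instance (this last step is an assumption).
function rankGroups(groups: CloneGroup[], top: number): CloneGroup[] {
  return [...groups]
    .sort(
      (a, b) =>
        b.instances.length - a.instances.length ||
        b.lineCount - a.lineCount ||
        compareText(a.instances[0]?.path ?? "", b.instances[0]?.path ?? "") ||
        (a.instances[0]?.startLine ?? 0) - (b.instances[0]?.startLine ?? 0),
    )
    .slice(0, top);
}

// The owner with the most instances wins; ties go to the alphabetically first.
function primaryOwner(group: CloneGroup): string {
  const counts = new Map<string, number>();
  for (const inst of group.instances) {
    counts.set(inst.owner, (counts.get(inst.owner) ?? 0) + 1);
  }
  const ranked = Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || compareText(a[0], b[0]),
  );
  return ranked[0]?.[0] ?? "";
}
```

With this sketch, a group split 2 `src` / 1 `lib` is attributed to `src`, matching the example in the `--group-by` description.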

Incremental

Flag	Description
`--changed-since <REF>`	Only report duplication in files changed since a git ref
`--baseline <PATH>`	Compare against a previously saved baseline file
`--save-baseline <PATH>`	Save current duplication as a baseline file

Token cache

For projects with at least duplicates.minCorpusSizeForTokenCache source files (default 5000), fallow dupes keeps a per-project tokenized-source cache under <root>/.fallow/cache/dupes-tokens-vN/ so warm runs skip re-tokenizing unchanged files. On first save, the cache writes a sibling .gitignore containing * so its contents stay out of git status. Below the threshold the cache stays off because the load/save overhead exceeds the tokenize savings on small corpora.

Pass --no-cache to disable the cache for one run, or raise minCorpusSizeForTokenCache to keep it disabled even on larger projects.

When --changed-since is set on a project with at least duplicates.minCorpusSizeForShingleFilter files (default 1024), the detector additionally drops unchanged files whose k-token shingles do not overlap any changed file before building the suffix array.

Both settings live under duplicates ([duplicates] in TOML) in .fallowrc.json / .fallowrc.jsonc / fallow.toml / .fallow.toml.
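The two size-based decisions above can be sketched in TypeScript. This is an illustrative reading of the description, not fallow's implementation: the function names, the token representation, and the shingle width `k` (which the text does not specify) are all assumptions.

```typescript
// Assumption: the cache is used only when it is not disabled for this run
// and the project has at least `minCorpusSize` source files.
function tokenCacheEnabled(
  fileCount: number,
  noCache: boolean,
  minCorpusSize = 5000,
): boolean {
  return !noCache && fileCount >= minCorpusSize;
}

// Every run of k consecutive tokens ("shingle") in a token stream.
function shingles(tokens: string[], k: number): Set<string> {
  const out = new Set<string>();
  for (let i = 0; i + k <= tokens.length; i++) {
    out.add(tokens.slice(i, i + k).join("\u0000"));
  }
  return out;
}

// Keep changed files, plus unchanged files that share at least one shingle
// with some changed file. Everything else cannot contain a clone of changed
// code that is k tokens or longer, so it is dropped before indexing.
function filterByShingles(
  files: Map<string, string[]>, // path -> token stream
  changed: Set<string>,
  k: number,
): string[] {
  const changedShingles = new Set<string>();
  files.forEach((tokens, path) => {
    if (changed.has(path)) {
      shingles(tokens, k).forEach((s) => changedShingles.add(s));
    }
  });

  const kept: string[] = [];
  files.forEach((tokens, path) => {
    const overlaps = Array.from(shingles(tokens, k)).some((s) =>
      changedShingles.has(s),
    );
    if (changed.has(path) || overlaps) kept.push(path);
  });
  return kept;
}
```

The point of the filter is that an unchanged file with no shingle in common with any changed file cannot take part in a reportable clone involving changed code, so leaving it out shrinks the suffix array without affecting the result for `--changed-since`.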

Default generated-output ignores

fallow dupes skips common framework output by default: .next, .nuxt, .svelte-kit, .turbo, .parcel-cache, .vite, .cache, out, and storybook-static. These defaults merge with duplicates.ignore; they do not replace your configured ignores. If your duplication percentage drops after upgrading, the likely cause is that fallow now excludes generated framework output from duplicate detection by default. Use --explain-skipped in human or markdown output to see the per-pattern skipped-file counts.

{
  "duplicates": {
    "ignoreDefaults": true,
    "ignore": ["**/__generated__/**"]
  }
}

Set ignoreDefaults to false to opt out and use only your duplicates.ignore list.
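The merge behaviour can be pictured with a short TypeScript sketch. The directory names come from the list above; how fallow spells them internally as glob patterns is not stated, so plain names are used here, and the `effectiveIgnores` helper is hypothetical.

```typescript
// Names listed in this section. The exact patterns fallow uses are an unknown;
// plain directory names stand in for them.
const DEFAULT_IGNORES: string[] = [
  ".next",
  ".nuxt",
  ".svelte-kit",
  ".turbo",
  ".parcel-cache",
  ".vite",
  ".cache",
  "out",
  "storybook-static",
];

interface DuplicatesConfig {
  ignoreDefaults?: boolean;
  ignore?: string[];
}

// Defaults are merged with the configured list unless ignoreDefaults is
// explicitly false, in which case only the configured list applies.
function effectiveIgnores(config: DuplicatesConfig): string[] {
  const own = config.ignore ?? [];
  return config.ignoreDefaults === false
    ? [...own]
    : [...DEFAULT_IGNORES, ...own];
}
```

For the configuration shown above, this yields the nine defaults plus `**/__generated__/**`; with `ignoreDefaults` set to false it yields only `**/__generated__/**`.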

Monorepo scoping

Both flags retain only clone groups where at least one instance is under a selected workspace root. The full cross-workspace index is still built; reported groups and the duplication_percentage stat are recomputed from the scoped slice.

Flag	Description
`-w, --workspace <PATTERNS>`	Scope output to one or more workspaces. Supports exact package names, globs (`apps/`, `@scope/`) matched against both the package name and the workspace path, and `!`-prefixed negation. Comma-separated values or repeated flag.
`--changed-workspaces <REF>`	Git-derived monorepo CI scoping: scope to workspaces containing any file changed since `REF` (e.g. `origin/main`). Mutually exclusive with `--workspace`. A missing ref is a hard error (exit 2), not a silent full-scope fallback.
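The retention rule (keep a clone group when at least one instance is under a selected workspace root) can be sketched in TypeScript. The shapes and the `scopeGroups` helper are hypothetical, and workspace roots are assumed to be already resolved to POSIX-style paths; pattern matching and negation are left out.

```typescript
// Hypothetical shapes for illustration only.
interface CloneInstance {
  path: string;
}

interface CloneGroup {
  instances: CloneInstance[];
}

// True when `path` is inside `root` (compared on whole path segments).
function isUnder(path: string, root: string): boolean {
  const prefix = root.endsWith("/") ? root : root + "/";
  return path.startsWith(prefix);
}

// Keep groups with at least one instance under any selected workspace root.
// Instances outside the selection stay attached to a retained group.
function scopeGroups(groups: CloneGroup[], roots: string[]): CloneGroup[] {
  return groups.filter((group) =>
    group.instances.some((inst) =>
      roots.some((root) => isUnder(inst.path, root)),
    ),
  );
}
```

Because the full cross-workspace index is still built, a clone shared between a selected workspace and an unselected one is still retained under this rule; only groups with no instance in the selection are dropped.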

Debugging

Flag	Description
`--trace <FILE:LINE>`	Show all clones of code at a specific location
`--performance`	Show pipeline timing breakdown

Detection modes

Each mode progressively normalizes more syntax before comparing, trading precision for recall.

Mode	What it normalizes	Best for
strict	Nothing: exact token match	Finding verbatim copies
mild	Currently equivalent to strict (AST tokenization inherently strips whitespace)	General-purpose detection (default)
weak	+ string literal values	Catching copies with different messages
semantic	+ variable names and numeric values	Finding structural clones after renaming

As you move from strict to semantic, you’ll find more clones but also more potential false positives. Start with mild and increase sensitivity as needed.
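To make the table concrete, here is a minimal TypeScript sketch of per-mode token normalization. The `Token` shape and the placeholder strings (`$STR`, `$ID`, `$NUM`) are hypothetical; fallow's tokenizer and its internal representation are not described in this page.

```typescript
type Mode = "strict" | "mild" | "weak" | "semantic";
type TokenKind = "keyword" | "identifier" | "string" | "number" | "punctuation";

// Hypothetical token shape for illustration.
interface Token {
  kind: TokenKind;
  text: string;
}

// strict and mild compare tokens as-is; weak also blanks string literal
// values; semantic additionally blanks identifiers and numeric values.
function normalize(tokens: Token[], mode: Mode): string[] {
  const blankStrings = mode === "weak" || mode === "semantic";
  const blankNames = mode === "semantic";
  return tokens.map((t) => {
    if (blankStrings && t.kind === "string") return "$STR";
    if (blankNames && t.kind === "identifier") return "$ID";
    if (blankNames && t.kind === "number") return "$NUM";
    return t.text;
  });
}

// Two fragments are clone candidates in a mode when their normalized
// token sequences are identical.
function sameUnder(a: Token[], b: Token[], mode: Mode): boolean {
  return normalize(a, mode).join(" ") === normalize(b, mode).join(" ");
}
```

In this sketch, `log("saved")` and `log("deleted")` match from `weak` upward, while `retry(count, 3)` and `retry(attempts, 5)` match only in `semantic`.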

Examples

# Mild mode (default)
fallow dupes

Example output

$ fallow dupes --top 3

● Duplicates (3 clone groups)

  5,433 lines  2 instances
    deno/lib/types.ts:45-5477
    src/types.ts:45-5477

    914 lines  2 instances
    deno/lib/__tests__/string.test.ts:8-921
    src/__tests__/string.test.ts:7-920

     42 lines  3 instances
    src/features/forecasting/server/procedures/analytics.ts:141-181
    src/features/forecasting/server/procedures/cashflow.ts:153-194
    src/features/forecasting/server/procedures/income.ts:590-631

  Identical code blocks detected via suffix-array analysis — https://docs.fallow.tools/explanations/duplication#clone-groups

✓ 27,255 lines (19.4%) duplicated across 398 files (0.23s)

$ fallow dupes --threshold 15

● Duplicates (3 clone groups)

  5,433 lines  2 instances
    deno/lib/types.ts:45-5477
    src/types.ts:45-5477

    914 lines  2 instances
    deno/lib/__tests__/string.test.ts:8-921
    src/__tests__/string.test.ts:7-920

     42 lines  3 instances
    src/features/forecasting/server/procedures/analytics.ts:141-181
    src/features/forecasting/server/procedures/cashflow.ts:153-194
    src/features/forecasting/server/procedures/income.ts:590-631

  Identical code blocks detected via suffix-array analysis — https://docs.fallow.tools/explanations/duplication#clone-groups

✗ 27,255 lines (19.4%) duplicated across 398 files (0.23s)
Duplication (19.4%) exceeds threshold (15%)
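The second run fails because 19.4% is above the 15% limit. A minimal TypeScript sketch of that comparison follows; the `checkThreshold` helper is hypothetical, and it assumes "exceeds" means strictly greater than, so a value equal to the threshold passes.

```typescript
// Returns a failure message when duplication exceeds the threshold,
// or null when the run passes. Assumption: equality does not fail.
function checkThreshold(
  duplicationPercent: number,
  thresholdPercent: number,
): string | null {
  return duplicationPercent > thresholdPercent
    ? `Duplication (${duplicationPercent}%) exceeds threshold (${thresholdPercent}%)`
    : null;
}
```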

JSON output

`mirrored_directories`

When fallow detects directory pairs that are largely duplicated (e.g., a deno/ mirror of src/), the JSON output includes a mirrored_directories array:

{
  "mirrored_directories": [
    {
      "dir_a": "deno/lib",
      "dir_b": "src",
      "shared_files": ["client.ts", "handlers.ts", "types.ts", "utils.ts"],
      "total_lines": 6347
    }
  ]
}

Each entry identifies two directories with significant overlap, the shared filenames, and the total duplicated lines across those files.
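A consumer of the JSON output might read this array as in the TypeScript sketch below. The field names are taken from the example above; the `DupesReport` type, the optional marker on the array, and the `describeMirrors` helper are assumptions made for illustration.

```typescript
// Field names follow the example above.
interface MirroredDirectory {
  dir_a: string;
  dir_b: string;
  shared_files: string[];
  total_lines: number;
}

// Only the part of the report used here; treated as optional on the
// assumption that it may be absent when no mirrors are detected.
interface DupesReport {
  mirrored_directories?: MirroredDirectory[];
}

const report = JSON.parse(`{
  "mirrored_directories": [
    {
      "dir_a": "deno/lib",
      "dir_b": "src",
      "shared_files": ["client.ts", "handlers.ts", "types.ts", "utils.ts"],
      "total_lines": 6347
    }
  ]
}`) as DupesReport;

// One human-readable line per mirrored directory pair.
function describeMirrors(r: DupesReport): string[] {
  return (r.mirrored_directories ?? []).map(
    (m) =>
      `${m.dir_a} <-> ${m.dir_b}: ${m.shared_files.length} shared files, ` +
      `${m.total_lines} duplicated lines`,
  );
}

console.log(describeMirrors(report).join("\n"));
```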

See also

Duplication analysis

How fallow’s detection engine works.

Configuration

Set default modes and thresholds in your config.

Migrating from jscpd

Coming from jscpd? See the migration guide.
