# fallow dupes

> CLI reference for fallow dupes. Detect duplicated code blocks across your codebase using suffix-array analysis with configurable thresholds.

`fallow dupes` tokenizes your source files and reports groups of duplicated code blocks (clone groups) together with the overall duplication percentage.

<Tip>
  Start with the default `mild` mode. Use `semantic` when you want to catch clones with renamed variables.
</Tip>

```bash theme={null}
fallow dupes
```

## Options

### Detection

| Flag                    | Description                                                                                                                                                                    |
| :---------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--mode <MODE>`         | Detection mode: `strict`, `mild` (default), `weak`, `semantic`                                                                                                                 |
| `--min-tokens <N>`      | Minimum tokens per clone (default: 50)                                                                                                                                         |
| `--min-lines <N>`       | Minimum lines per clone (default: 5)                                                                                                                                           |
| `--min-occurrences <N>` | Minimum number of occurrences before a clone group is reported (default: 2, must be ≥ 2). Raise to skip pair-only clones and focus on widespread copy-paste worth refactoring. |
| `--skip-local`          | Only report cross-directory duplicates                                                                                                                                         |
| `--cross-language`      | Strip TypeScript types so TypeScript code can match equivalent JavaScript                                                                                                     |
| `--ignore-imports`      | Exclude import declarations from clone detection                                                                                                                               |

### Output

| Flag                    | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| :---------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `-f, --format <FORMAT>` | Output format: `human` (default), `json`, `sarif`, `compact`, `markdown`, `codeclimate`, `gitlab-codequality`, `pr-comment-github`, `pr-comment-gitlab`, `review-github`, `review-gitlab`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `--top <N>`             | Show only the N most-duplicated clone groups. Groups are sorted by instance count descending (tiebreak: line count descending, then deterministic path/line). Summary stats reflect the full project, not just the shown groups.                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `--quiet`               | Suppress progress output                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `--threshold <N>`       | Fail if duplication exceeds N%                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `--group-by <MODE>`     | Partition the report into per-group sections. `MODE`: `owner` (CODEOWNERS), `directory` (first path component), `package` (workspace), or `section` (GitLab `[Section]` headers). Each clone group is attributed to its largest owner (most instances; alphabetical tiebreak), so a group split 2 src / 1 lib appears under `src`. JSON adds `grouped_by` plus a `groups` array of per-bucket dedup-aware `stats`, attributed `clone_groups` (each carrying `primary_owner` and per-instance `owner`), and `clone_families`. SARIF results carry `properties.group` and CodeClimate issues carry a top-level `group` field. Compact and markdown fall back to ungrouped output with a stderr note. |
| `--explain`             | Add metric explanations. In human format, explanations are always shown. In JSON format, adds a `_meta` object with metric descriptions and docs links.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `--explain-skipped`     | In human/markdown output, show the per-pattern breakdown of files skipped by the default duplicate-detection ignores. Machine formats suppress this note. |
| `--summary`             | Print a one-line summary of duplication counts at the end of the run. In JSON format, adds a `summary` counts object.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |

### Incremental

| Flag                     | Description                                              |
| :----------------------- | :------------------------------------------------------- |
| `--changed-since <REF>`  | Only report duplication in files changed since a git ref |
| `--baseline <PATH>`      | Compare against a previously saved baseline file         |
| `--save-baseline <PATH>` | Save current duplication as a baseline file              |
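
The two baseline flags are meant to be used as a pair: one run records the current state and later runs compare against it. A minimal sketch, assuming a hypothetical baseline path `dupes-baseline.json` and wrapper function names that are purely illustrative (neither is part of fallow):

```bash
# Hypothetical location for the baseline file; any writable path works.
BASELINE="${BASELINE:-dupes-baseline.json}"

# Record the current duplication state, e.g. once on the default branch.
save_dupes_baseline() {
  fallow dupes --save-baseline "$BASELINE"
}

# Compare the current tree against the saved baseline, e.g. on each PR.
check_dupes_against_baseline() {
  fallow dupes --baseline "$BASELINE"
}
```

Run `save_dupes_baseline` once and keep the resulting file, then call `check_dupes_against_baseline` in later runs. How the comparison affects the exit status is not covered on this page.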

#### Token cache

For projects with at least `duplicates.minCorpusSizeForTokenCache` source files (default `5000`), `fallow dupes` keeps a per-project tokenized-source cache under `<root>/.fallow/cache/dupes-tokens-vN/` so warm runs skip re-tokenizing unchanged files. The cache writes a sibling `.gitignore` containing `*` on first save, so its contents stay out of `git status`. Pass `--no-cache` to disable the cache for one run, or set a higher `minCorpusSizeForTokenCache` to keep it disabled even on larger projects. Below the threshold the cache stays off because the load/save overhead exceeds the tokenize savings on small corpora.

When `--changed-since` is set on a project with at least `duplicates.minCorpusSizeForShingleFilter` source files (default `1024`), the detector applies an extra filter before building the suffix array: it drops unchanged files whose k-token shingles do not overlap with any changed file. Both settings live under `duplicates` in `.fallowrc.json` / `.fallowrc.jsonc` / `fallow.toml` / `.fallow.toml`.

#### Default generated-output ignores

`fallow dupes` skips common framework output by default: `.next`, `.nuxt`, `.svelte-kit`, `.turbo`, `.parcel-cache`, `.vite`, `.cache`, `out`, and `storybook-static`. These defaults merge with `duplicates.ignore`; they do not replace your configured ignores.

If your duplication percentage dropped after an upgrade, the likely cause is that fallow now excludes generated framework output from duplicate detection by default. Use `--explain-skipped` in human or markdown output to see the per-pattern skipped-file counts.

```jsonc theme={null}
{
  "duplicates": {
    "ignoreDefaults": true,
    "ignore": ["**/__generated__/**"]
  }
}
```

Set `ignoreDefaults` to `false` to opt out and use only your `duplicates.ignore` list.

### Monorepo scoping

Both flags retain only clone groups where at least one instance is under a selected workspace root. The full cross-workspace index is still built; reported groups and the `duplication_percentage` stat are recomputed from the scoped slice.

| Flag                         | Description                                                                                                                                                                                                                          |
| :--------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `-w, --workspace <PATTERNS>` | Scope output to one or more workspaces. Supports exact package names, globs (`apps/*`, `@scope/*`) matched against both the package name and the workspace path, and `!`-prefixed negation. Accepts comma-separated values or a repeated flag. |
| `--changed-workspaces <REF>` | Git-derived monorepo CI scoping: scope to workspaces containing any file changed since `REF` (e.g. `origin/main`). Mutually exclusive with `--workspace`. A missing ref is a hard error (exit code 2), not a silent fallback to full scope.      |
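
Because a missing ref exits with code 2 instead of widening the scope, a CI step can surface that case explicitly. The sketch below is one way to do it; the function name `dupes_for_changed_workspaces` and the `origin/main` default are assumptions for illustration, and exit code 2 may also be used for other errors not listed here.

```bash
# Hypothetical CI helper: scope duplication analysis to workspaces that
# contain files changed since the given ref (defaults to origin/main).
dupes_for_changed_workspaces() {
  local base_ref="${1:-origin/main}"
  local rc=0

  fallow dupes --changed-workspaces "$base_ref" || rc=$?

  # Exit code 2 is documented for a missing ref; print a hint in that case.
  if [ "$rc" -eq 2 ]; then
    echo "fallow exited with 2: is '$base_ref' available in this checkout?" >&2
  fi
  return "$rc"
}
```

For manual scoping, `fallow dupes --workspace 'apps/*'` selects workspaces by glob instead (quote the pattern so the shell does not expand it); the two flags cannot be combined.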

### Debugging

| Flag                  | Description                                    |
| :-------------------- | :--------------------------------------------- |
| `--trace <FILE:LINE>` | Show all clones of code at a specific location |
| `--performance`       | Show pipeline timing breakdown                 |

## Detection modes

<Accordion title="Detection mode details">
  Each mode normalizes at least as much syntax as the previous one before comparing, trading precision for recall.

  | Mode         | What it normalizes                                                             | Best for                                 |
  | :----------- | :----------------------------------------------------------------------------- | :--------------------------------------- |
  | **strict**   | Nothing: exact token match                                                     | Finding verbatim copies                  |
  | **mild**     | Currently equivalent to strict (AST tokenization inherently strips whitespace) | General-purpose detection (default)      |
  | **weak**     | + string literal values                                                        | Catching copies with different messages  |
  | **semantic** | + variable names and numeric values                                            | Finding structural clones after renaming |

  As you move from `strict` to `semantic`, you'll find more clones but also more potential false positives. Start with `mild` and increase sensitivity as needed.
</Accordion>

## Examples

<CodeGroup>
  ```bash Default analysis theme={null}
  # Mild mode (default)
  fallow dupes
  ```

  ```bash Filtering theme={null}
  # Show only the 10 most-duplicated clone groups (sorted by instance count)
  fallow dupes --top 10

  # Top 5 semantic clones, cross-directory only
  fallow dupes --top 5 --mode semantic --skip-local
  ```

  ```bash Advanced detection theme={null}
  # Catch renamed-variable clones
  fallow dupes --mode semantic

  # Only cross-directory duplicates
  fallow dupes --skip-local

  # Match TypeScript against JavaScript by stripping TS types
  fallow dupes --cross-language

  # Exclude import declarations from clone detection
  fallow dupes --ignore-imports
  ```

  ```bash CI integration theme={null}
  # Only check duplication in changed files
  fallow dupes --changed-since main

  # Fail CI if duplication exceeds 5%
  fallow dupes --threshold 5

  # SARIF output, e.g. for upload to GitHub code scanning
  fallow dupes --format sarif > dupes.sarif
  ```

  ```bash Debugging theme={null}
  # Trace clones at a specific line
  fallow dupes --trace src/utils.ts:42

  # Performance breakdown
  fallow dupes --performance
  ```
</CodeGroup>

## Example output

```bash title="$ fallow dupes --top 3" theme={null}
● Duplicates (3 clone groups)

  5,433 lines  2 instances
    deno/lib/types.ts:45-5477
    src/types.ts:45-5477

    914 lines  2 instances
    deno/lib/__tests__/string.test.ts:8-921
    src/__tests__/string.test.ts:7-920

     42 lines  3 instances
    src/features/forecasting/server/procedures/analytics.ts:141-181
    src/features/forecasting/server/procedures/cashflow.ts:153-194
    src/features/forecasting/server/procedures/income.ts:590-631

  Identical code blocks detected via suffix-array analysis — https://docs.fallow.tools/explanations/duplication#clone-groups

✓ 27,255 lines (19.4%) duplicated across 398 files (0.23s)
```

```bash title="$ fallow dupes --threshold 15" theme={null}
● Duplicates (3 clone groups)

  5,433 lines  2 instances
    deno/lib/types.ts:45-5477
    src/types.ts:45-5477

    914 lines  2 instances
    deno/lib/__tests__/string.test.ts:8-921
    src/__tests__/string.test.ts:7-920

     42 lines  3 instances
    src/features/forecasting/server/procedures/analytics.ts:141-181
    src/features/forecasting/server/procedures/cashflow.ts:153-194
    src/features/forecasting/server/procedures/income.ts:590-631

  Identical code blocks detected via suffix-array analysis — https://docs.fallow.tools/explanations/duplication#clone-groups

✗ 27,255 lines (19.4%) duplicated across 398 files (0.23s)
Duplication (19.4%) exceeds threshold (15%)
```

## JSON output

### `mirrored_directories`

When fallow detects directory pairs that are largely duplicated (e.g., a `deno/` mirror of `src/`), the JSON output includes a `mirrored_directories` array:

```json theme={null}
{
  "mirrored_directories": [
    {
      "dir_a": "deno/lib",
      "dir_b": "src",
      "shared_files": ["client.ts", "handlers.ts", "types.ts", "utils.ts"],
      "total_lines": 6347
    }
  ]
}
```

Each entry identifies two directories with significant overlap, the shared filenames, and the total duplicated lines across those files.
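
In a script, the array can be reduced to one line per directory pair. A sketch assuming `jq` is installed; the function name is illustrative, and the field names are taken from the sample above:

```bash
# Print one summary line per mirrored directory pair.
# `[]?` yields nothing (instead of failing) when the array is absent.
list_mirrored_directories() {
  fallow dupes --format json --quiet |
    jq -r '.mirrored_directories[]? | "\(.dir_a) <-> \(.dir_b): \(.total_lines) lines in \(.shared_files | length) shared files"'
}
```

For the sample above this prints `deno/lib <-> src: 6347 lines in 4 shared files`.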

## See also

<CardGroup cols={3}>
  <Card title="Duplication analysis" icon="clone" href="/analysis/duplication">
    How fallow's detection engine works.
  </Card>

  <Card title="Configuration" icon="gear" href="/configuration/overview">
    Set default modes and thresholds in your config.
  </Card>

  <Card title="Migrating from jscpd" icon="right-left" href="/migration/from-jscpd">
    Coming from jscpd? See the migration guide.
  </Card>
</CardGroup>
