Detect copy-pasted code blocks across your entire codebase using suffix-array analysis.
Options
Detection
| Flag | Description |
|---|---|
| --mode <MODE> | Detection mode: strict, mild (default), weak, semantic |
| --min-tokens <N> | Minimum tokens per clone (default: 50) |
| --min-lines <N> | Minimum lines per clone (default: 5) |
| --min-occurrences <N> | Minimum number of occurrences before a clone group is reported (default: 2, must be ≥ 2). Raise to skip pair-only clones and focus on widespread copy-paste worth refactoring. |
| --skip-local | Only report cross-directory duplicates |
| --cross-language | Strip TS types for TS to JS matching |
| --ignore-imports | Exclude import declarations from clone detection |
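As a rough illustration of how the three minimum thresholds in the table interact, the sketch below applies them to a single clone group. The `CloneGroup` record and `is_reported` helper are hypothetical names, and the assumption that all three minimums must hold at once is not confirmed by the flag descriptions; only the default values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneGroup:
    """Hypothetical record for one clone group; not fallow's real data model."""
    tokens: int       # tokens per clone instance
    lines: int        # lines per clone instance
    occurrences: int  # number of instances in the group


def is_reported(group, min_tokens=50, min_lines=5, min_occurrences=2):
    """Return True if the group passes all three minimums (assumed to combine with AND)."""
    if min_occurrences < 2:
        # The flag table states that --min-occurrences must be >= 2.
        raise ValueError("--min-occurrences must be >= 2")
    return (
        group.tokens >= min_tokens
        and group.lines >= min_lines
        and group.occurrences >= min_occurrences
    )
```

Under this reading, raising `min_occurrences` to 3 hides groups that consist of a single pair while leaving wider copy-paste visible.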
Output
| Flag | Description |
|---|---|
| -f, --format <FORMAT> | Output format: human (default), json, sarif, compact, markdown, codeclimate, gitlab-codequality, pr-comment-github, pr-comment-gitlab, review-github, review-gitlab |
| --top <N> | Show only the N most-duplicated clone groups. Groups are sorted by instance count descending (tiebreak: line count descending, then deterministic path/line). Summary stats reflect the full project, not just the shown groups. |
| --quiet | Suppress progress output |
| --threshold <N> | Fail if duplication exceeds N% |
| --group-by <MODE> | Partition the report into per-group sections. MODE: owner (CODEOWNERS), directory (first path component), package (workspace), or section (GitLab [Section] headers). Each clone group is attributed to its largest owner (most instances; alphabetical tiebreak), so a group split 2 src / 1 lib appears under src. JSON adds grouped_by plus a groups array of per-bucket dedup-aware stats, attributed clone_groups (each carrying primary_owner and per-instance owner), and clone_families. SARIF results carry properties.group and CodeClimate issues carry a top-level group field. Compact and markdown fall back to ungrouped output with a stderr note. |
| --explain | Add metric explanations. In human format, explanations are always shown. In JSON format, adds a _meta object with metric descriptions and docs links. |
| --explain-skipped | In human/markdown output, show the per-pattern breakdown for files skipped by default duplicates ignores. Machine formats suppress this note. |
| --summary | Print a one-line summary of duplication counts at the end of the run. In JSON format, adds a summary counts object. |
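The ordering rule for --top and the attribution rule for --group-by can be sketched in a few lines. This is an illustration of the rules as described in the table, not fallow's implementation: the function names, the dictionary keys `instances` and `lines`, and the choice of the smallest (path, line) pair as the final tiebreak are all assumptions.

```python
from collections import Counter


def primary_owner(instance_owners):
    """Pick the owner with the most instances; ties are broken alphabetically."""
    counts = Counter(instance_owners)
    return min(counts, key=lambda owner: (-counts[owner], owner))


def top_groups(groups, n):
    """Return the n most-duplicated groups.

    Each group is a dict with hypothetical keys:
      "instances": list of (path, line) tuples
      "lines":     line count of the clone
    Sort order: instance count descending, then line count descending,
    then the smallest (path, line) pair as an assumed deterministic tiebreak.
    """
    ordered = sorted(
        groups,
        key=lambda g: (-len(g["instances"]), -g["lines"], min(g["instances"])),
    )
    return ordered[:n]
```

With these rules, a group whose instances belong to `src`, `src`, and `lib` is attributed to `src`, matching the 2 src / 1 lib case in the table.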
Incremental
| Flag | Description |
|---|---|
| --changed-since <REF> | Only report duplication in files changed since a git ref |
| --baseline <PATH> | Compare against a previously saved baseline file |
| --save-baseline <PATH> | Save current duplication as a baseline file |
Token cache
For projects with at least duplicates.minCorpusSizeForTokenCache source files (default 5000), fallow dupes keeps a per-project tokenized-source cache under <root>/.fallow/cache/dupes-tokens-vN/ so warm runs skip re-tokenizing unchanged files. The cache writes a sibling .gitignore containing * on first save, so its contents stay out of git status. Pass --no-cache to disable the cache for one run, or set a higher minCorpusSizeForTokenCache to keep it disabled even on larger projects. Below the threshold the cache stays off because the load/save overhead exceeds the tokenize savings on small corpora.
When --changed-since is set on a project at or above duplicates.minCorpusSizeForShingleFilter files (default 1024), the detector additionally drops unchanged files whose k-token shingles do not overlap any changed file before building the suffix array. Both knobs are scoped to [duplicates] in .fallowrc.json / .fallowrc.jsonc / fallow.toml / .fallow.toml.
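The shingle pre-filter described above can be pictured as a set-overlap test. The sketch below is a simplified model under stated assumptions: the function names are hypothetical, tokens are plain strings, and `k=3` is chosen only to keep the example small; the source does not state the value of k that fallow uses or how it stores shingles.

```python
def shingles(tokens, k):
    """Return the set of all contiguous k-token windows in a token sequence."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def files_to_index(tokens_by_file, changed, k=3):
    """Keep every changed file, plus unchanged files sharing a shingle with one.

    tokens_by_file: dict mapping path -> list of token strings
    changed:        set of paths changed since the git ref
    """
    changed_shingles = set()
    for path in changed:
        changed_shingles |= shingles(tokens_by_file[path], k)

    kept = []
    for path, tokens in tokens_by_file.items():
        if path in changed or not shingles(tokens, k).isdisjoint(changed_shingles):
            kept.append(path)
    return kept
```

An unchanged file with no shingle in common with any changed file cannot contain a clone of changed code at least k tokens long, so leaving it out of the suffix array does not change what is reported for the changed files.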
Default generated-output ignores
fallow dupes skips common framework output by default: .next, .nuxt, .svelte-kit, .turbo, .parcel-cache, .vite, .cache, out, and storybook-static. These defaults merge with duplicates.ignore; they do not replace your configured ignores.
If your duplication percentage drops after upgrading, fallow is now excluding generated framework output from duplicate detection by default. Use --explain-skipped in human or markdown output to see the per-pattern skipped-file counts.
Set ignoreDefaults to false to opt out and use only your duplicates.ignore list.
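The merge behavior can be modeled as a small function. This is a sketch, not fallow's code: `effective_ignores` is a hypothetical helper, and the default list reuses the directory names given above rather than whatever glob patterns fallow matches internally.

```python
# Directory names listed in the documentation; the real patterns may be globs.
DEFAULT_IGNORES = [
    ".next", ".nuxt", ".svelte-kit", ".turbo", ".parcel-cache",
    ".vite", ".cache", "out", "storybook-static",
]


def effective_ignores(configured, ignore_defaults=True):
    """Merge the built-in ignores with the configured duplicates.ignore list.

    With ignore_defaults=False only the configured entries are used.
    """
    if not ignore_defaults:
        return list(configured)
    merged = list(DEFAULT_IGNORES)
    for pattern in configured:
        if pattern not in merged:  # avoid listing the same entry twice
            merged.append(pattern)
    return merged
```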
Monorepo scoping
Both flags retain only clone groups where at least one instance is under a selected workspace root. The full cross-workspace index is still built; reported groups and the duplication_percentage stat are recomputed from the scoped slice.
| Flag | Description |
|---|---|
| -w, --workspace <PATTERNS> | Scope output to one or more workspaces. Supports exact package names, globs (apps/*, @scope/*) matched against both the package name and the workspace path, and !-prefixed negation. Comma-separated values or repeated flag. |
| --changed-workspaces <REF> | Git-derived monorepo CI scoping: scope to workspaces containing any file changed since REF (e.g. origin/main). Mutually exclusive with --workspace. Missing ref is a hard error (exit 2), not silent full-scope fallback. |
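The retention rule for scoped output, keeping a clone group when at least one instance sits under a selected workspace root, might look like the sketch below. It assumes workspace roots have already been resolved to directory paths; pattern matching, negation, and the recomputed duplication_percentage are left out, and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def under_root(path, root):
    """True if path is inside root, compared by whole path components."""
    path_parts = PurePosixPath(path).parts
    root_parts = PurePosixPath(root).parts
    return path_parts[:len(root_parts)] == root_parts


def scope_groups(groups, roots):
    """Keep groups with at least one instance under any selected root.

    groups: list of clone groups, each a list of instance file paths
    roots:  list of selected workspace root directories
    """
    return [
        group
        for group in groups
        if any(under_root(path, root) for path in group for root in roots)
    ]
```

Comparing whole path components rather than string prefixes keeps `apps/website` from being treated as part of `apps/web`.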
Debugging
| Flag | Description |
|---|---|
| --trace <FILE:LINE> | Show all clones of code at a specific location |
| --performance | Show pipeline timing breakdown |
Detection modes
Detection mode details
Each mode progressively normalizes more syntax before comparing, trading precision for recall.
| Mode | What it normalizes | Best for |
|---|---|---|
| strict | Nothing: exact token match | Finding verbatim copies |
| mild | Currently equivalent to strict (AST tokenization inherently strips whitespace) | General-purpose detection (default) |
| weak | + string literal values | Catching copies with different messages |
| semantic | + variable names and numeric values | Finding structural clones after renaming |
As you move from strict to semantic, you’ll find more clones but also more potential false positives. Start with mild and increase sensitivity as needed.
Examples
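To make the differences between modes concrete, the following sketch normalizes a token stream the way the mode table describes. It is a toy model under stated assumptions: tokens are hypothetical (kind, text) pairs, and the placeholder strings `$STR`, `$ID`, and `$NUM` are invented for illustration; fallow's actual tokenizer and token kinds are not documented here.

```python
# Which token kinds each mode replaces with a placeholder.
# strict and mild replace nothing, matching the table's note that
# mild is currently equivalent to strict.
PLACEHOLDERS = {
    "strict": {},
    "mild": {},
    "weak": {"string": "$STR"},
    "semantic": {"string": "$STR", "identifier": "$ID", "number": "$NUM"},
}


def normalize(tokens, mode):
    """Replace the text of normalized token kinds; keep all other tokens as-is.

    tokens: list of (kind, text) pairs, e.g. ("identifier", "total")
    """
    placeholders = PLACEHOLDERS[mode]
    return [placeholders.get(kind, text) for kind, text in tokens]
```

Two snippets count as clones in a given mode when their normalized sequences match, which is why `return total * 2` and `return count * 3` would only pair up in semantic mode in this model.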
Example output
$ fallow dupes --top 3
$ fallow dupes --threshold 15
JSON output
mirrored_directories
When fallow detects directory pairs that are largely duplicated (e.g., a deno/ mirror of src/), the JSON output includes a mirrored_directories array:
See also
Duplication analysis
How fallow’s detection engine works.
Configuration
Set default modes and thresholds in your config.
Migrating from jscpd
Coming from jscpd? See the migration guide.