What fallow dupes reports, how to read the output, and when to act.
Clone types
Fallow classifies clones using the standard taxonomy from clone detection research.

| Clone type | What matches | Detection mode |
|---|---|---|
| Type-1 | Exact token sequences (whitespace/comments already stripped) | strict, mild |
| Type-2 | Same structure, but identifiers and/or literals differ | weak, semantic |
Moving from strict to semantic, recall increases (more clones found) but precision decreases (more potential false positives).
Research: clone taxonomy
The Type-1 through Type-4 clone taxonomy was surveyed by Roy, Cordy, and Koschke (2009). Fallow detects Type-1 and Type-2 clones. Type-3 (gapped clones with inserted/deleted statements) and Type-4 (semantically equivalent but syntactically different) require analysis beyond token comparison and are not currently supported.
Detection modes in detail
| Mode | What is normalized | Trade-off |
|---|---|---|
| strict | Nothing (exact token match) | Highest precision, lowest recall |
| mild | Equivalent to strict (AST tokenization already strips whitespace/comments) | Default, good starting point |
| weak | + string literal values abstracted | Catches copies with different messages/URLs |
| semantic | + identifiers + numeric literals + type annotations stripped | Catches renamed-variable clones, most false positives |
fallow dupes --mode semantic
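
To make the normalization levels concrete, here is a minimal Python sketch of how each mode could rewrite a token stream before matching. It is an illustration under assumptions, not fallow's implementation: the `(kind, text)` token shape, the kind names, and the `<STR>`/`<ID>`/`<NUM>` placeholders are all hypothetical, and the sketch assumes identifiers and numbers are replaced by placeholders while type annotations are dropped.

```python
from typing import List, Tuple

# Hypothetical token shape: (kind, text). The kind names are illustrative only.
Token = Tuple[str, str]


def normalize(tokens: List[Token], mode: str) -> List[str]:
    """Return the comparable form of a token stream for a detection mode."""
    out: List[str] = []
    for kind, text in tokens:
        if kind == "string" and mode in ("weak", "semantic"):
            out.append("<STR>")  # weak and above: string values abstracted
        elif kind == "identifier" and mode == "semantic":
            out.append("<ID>")  # semantic: identifier names abstracted
        elif kind == "number" and mode == "semantic":
            out.append("<NUM>")  # semantic: numeric values abstracted
        elif kind == "type_annotation" and mode == "semantic":
            continue  # semantic: type annotations removed (assumed behavior)
        else:
            out.append(text)  # strict and mild: tokens compared as-is
    return out


def throw_stmt(error_name: str, message: str) -> List[Token]:
    """Token stream for: throw new <error_name>(<message>)"""
    return [
        ("keyword", "throw"),
        ("keyword", "new"),
        ("identifier", error_name),
        ("punct", "("),
        ("string", message),
        ("punct", ")"),
    ]


a = throw_stmt("Error", '"not found"')
b = throw_stmt("Error", '"forbidden"')
c = throw_stmt("HttpError", '"forbidden"')
```

In this sketch, `a` and `b` differ only in a string literal, so they compare equal from weak mode upward; `a` and `c` also differ in an identifier, so they only compare equal in semantic mode.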
Key metrics
Duplication percentage
Fraction of total source tokens that appear in at least one clone group. Computed over the full analyzed file set, not just the groups shown by --top.
| Range | Interpretation | Action |
|---|---|---|
| 0–5% | Low duplication | No action needed |
| 5–15% | Moderate | Review the largest clone families |
| 15–30% | High | Prioritize extraction of shared modules |
| 30%+ | Very high | Likely structural issue — look for copy-pasted modules |
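
The definition above (tokens that appear in at least one clone group, divided by all source tokens) can be sketched as follows. This is a hedged illustration, not fallow's code: the `(file, start, end)` span shape and the function name are assumptions. The key point it shows is that overlapping instances in the same file must be merged so no token is counted twice.

```python
from typing import Dict, List, Tuple

# Hypothetical span shape: (file, start_token, end_token_exclusive).
Span = Tuple[str, int, int]


def duplication_percentage(total_tokens: int, instances: List[Span]) -> float:
    """Percentage of tokens covered by at least one clone instance."""
    by_file: Dict[str, List[Tuple[int, int]]] = {}
    for file, start, end in instances:
        by_file.setdefault(file, []).append((start, end))

    covered = 0
    for spans in by_file.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or touching span: extend the current run.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start

    return 100.0 * covered / total_tokens if total_tokens else 0.0


instances = [
    ("a.ts", 0, 60),
    ("a.ts", 40, 90),   # overlaps the first a.ts span; merged to 0..90
    ("b.ts", 100, 160),
    ("c.ts", 10, 60),
]
pct = duplication_percentage(1000, instances)  # 200 of 1000 tokens covered
```

With these made-up numbers, 200 of 1000 tokens are covered, which falls in the 15–30% row of the table.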
Duplication percentage is mode-dependent. Running with semantic mode will report a percentage at least as high as strict, and usually higher, because more normalization means more matches. Compare percentages only across runs using the same mode.

Token count and line count
Each clone group reports both token count and line count.

- Tokens are language-aware units (keywords, identifiers, operators, literals). This is what the detection engine matches on.
- Lines are the source lines spanned by the clone. Useful for estimating refactoring effort.
Instance count
The number of locations where the same code appears. A clone group with 5 instances means the same block was copied to 5 places, so fixing a bug in the logic requires updating all 5.

| Instances | Risk | Action |
|---|---|---|
| 2 | Normal, often acceptable | Extract if the code is complex or likely to change |
| 3–5 | Bug risk increases | Extract into a shared function or module |
| 5+ | High maintenance burden | Extract urgently — divergence between copies is likely |
Clone groups and families
Clone groups
A clone group is a single duplicated code block found at 2+ locations. Each location is an instance.

Clone families
A clone family is a set of clone groups that involve the same files. It means those files weren’t just sharing one snippet: they were likely copy-pasted from each other entirely.

| Pattern | What it tells you | Refactoring strategy |
|---|---|---|
| Single clone group | One block was copied | Extract that block into a shared function |
| Family within one file | Multiple self-clones | Extract the repeated pattern into a helper |
| Family across 2 files | Files were copy-pasted | Merge into a shared module with configuration |
| Family across a directory | Template-based duplication | Replace the template with a parameterized generator |
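
As a rough illustration of how groups roll up into families, the Python sketch below buckets clone groups by the set of files their instances touch. It assumes "the same files" means an identical file set, and the input shape (group id mapped to instance file paths) is hypothetical; fallow's actual grouping rule may differ.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def group_into_families(groups: Dict[str, List[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Map each shared file set to the clone groups found in exactly those files.

    `groups` maps a clone group id to the file path of each of its instances.
    Only file sets shared by two or more groups are reported as families.
    """
    by_files: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for group_id, files in groups.items():
        by_files[frozenset(files)].append(group_id)
    return {files: ids for files, ids in by_files.items() if len(ids) > 1}


groups = {
    "g1": ["a.ts", "b.ts"],
    "g2": ["a.ts", "b.ts"],  # same two files as g1: family across 2 files
    "g3": ["c.ts", "d.ts"],  # only one group for this pair: not a family
    "g4": ["e.ts", "e.ts"],
    "g5": ["e.ts", "e.ts"],  # repeated self-clones: family within one file
}
families = group_into_families(groups)
```

Here `g1` and `g2` form a family across two files, `g4` and `g5` form a family within one file, and `g3` stays a single clone group.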
When duplication is acceptable
Not all duplication should be eliminated. Context matters.

- Test files: Test cases often repeat setup code intentionally. Abstracting test setup can make tests harder to understand and debug. Use --production to exclude test/story/dev files entirely, or use dupes.ignore patterns for more granular control.
- Generated code: Codegen output (GraphQL, Prisma, OpenAPI) is inherently duplicative. Exclude with dupes.ignore.
- Small clones (< 10 lines): Very small clones are often idiomatic patterns (error handling, guard clauses) rather than meaningful duplication.
- Cross-language ports: If you maintain both .ts and .js versions intentionally, use --skip-local to focus on cross-directory duplicates instead.
Interpreting --threshold results
The --threshold flag sets a maximum allowed duplication percentage. If the project exceeds the threshold, fallow exits with code 1.
fallow dupes --threshold 15
Combine with --baseline for even more gradual adoption.
How suffix-array detection works
Fallow concatenates all token streams into one sequence, builds a suffix array with an LCP (longest common prefix) array, and scans for repeated subsequences above the minimum length. This runs in O(n log n) time, with no pairwise file comparison needed.
Research: suffix-array clone detection
Token-based clone detection was pioneered by Baker (1995). Suffix-array approaches for scalable clone detection were developed by Li et al. (2006). Fallow’s approach is closest to the suffix-array method, with normalization levels corresponding to clone types.
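
The suffix-array idea can be sketched in a few lines of Python. This is a teaching sketch under stated assumptions, not fallow's implementation: the suffix array is built by naive sorting (simple, but slower than the O(n log n) construction a real tool would use), the longest-common-prefix (LCP) array is computed with Kasai's algorithm, and the per-file `<EOF:…>` sentinel tokens are a hypothetical device to stop matches from running across file boundaries. The function names are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    # Naive construction for readability: sort suffix start positions by the
    # suffix itself. Production tools use faster construction algorithms.
    return sorted(range(len(tokens)), key=lambda i: list(tokens[i:]))


def build_lcp(tokens: Sequence[str], sa: List[int]) -> List[int]:
    # Kasai's algorithm: lcp[r] is the length of the common prefix of the
    # suffixes at sa[r - 1] and sa[r]; lcp[0] is 0 by convention.
    n = len(tokens)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and tokens[i + h] == tokens[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def longest_repeat(tokens: Sequence[str], min_tokens: int) -> Optional[Tuple[int, int, int]]:
    """Return (length, first_pos, second_pos) of the longest repeated run of
    tokens, or None if no repeat reaches min_tokens."""
    sa = build_suffix_array(tokens)
    lcp = build_lcp(tokens, sa)
    if not lcp:
        return None
    r = max(range(len(lcp)), key=lcp.__getitem__)
    if r == 0 or lcp[r] < min_tokens:
        return None
    first, second = sorted((sa[r - 1], sa[r]))
    return lcp[r], first, second


# Two tiny "files", concatenated into one stream with unique sentinels.
file_a = "const x = load ( ) ; return x ;".split()
file_b = "log ( ) ; const x = load ( ) ; done".split()
stream = file_a + ["<EOF:a>"] + file_b + ["<EOF:b>"]
result = longest_repeat(stream, 5)
```

In this example the shared run `const x = load ( ) ;` is found once in each file without ever comparing the two files directly: adjacent entries in the sorted suffix array already sit next to their closest match. A real detector would report every run above the minimum length and group them into clone groups, rather than returning only the longest one.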
Limitations
- Type-3 clones (gapped clones with inserted/deleted lines) are not detected. If someone copied a function and added a few lines in the middle, fallow will report the matching portions as separate smaller clones rather than one large near-miss.
- Type-4 clones (semantically equivalent but syntactically different) are not detected. Two functions that do the same thing but are written differently will not be flagged.
- Cross-file-type detection only works between .ts and .js files (via --cross-language). Other language pairs are not supported.
- Minimum size thresholds mean very small clones (below --min-tokens/--min-lines) are invisible. This is intentional: small clones are usually idiomatic patterns.
- Semantic mode false positives: Normalizing identifiers and literals can match code that is structurally similar but semantically unrelated. Review semantic-mode results before acting.
JSON _meta object
When --explain is passed (or via MCP), the JSON output includes a _meta object: