This page covers what fallow dupes reports, how to read the output, and when to act.
Pass --explain to any command with --format json to include metric definitions directly in the JSON output as a _meta object. The MCP server always includes _meta automatically.

Clone types

Fallow classifies clones using the standard taxonomy from clone detection research.
| Clone type | What matches | Detection mode |
| --- | --- | --- |
| Type-1 | Exact token sequences (whitespace/comments already stripped) | strict, mild |
| Type-2 | Same structure, but identifiers and/or literals differ | weak, semantic |
As you move from strict to semantic, recall increases (more clones found) but precision decreases (more potential false positives).
The Type-1 through Type-4 clone taxonomy was surveyed by Roy, Cordy, and Koschke (2009). Fallow detects Type-1 and Type-2 clones. Type-3 (gapped clones with inserted/deleted statements) and Type-4 (semantically equivalent but syntactically different) require analysis beyond token comparison and are not currently supported.

Detection modes in detail

| Mode | What is normalized | Trade-off |
| --- | --- | --- |
| strict | Nothing (exact token match) | Highest precision, lowest recall |
| mild | Equivalent to strict (AST tokenization already strips whitespace/comments) | Default, good starting point |
| weak | + string literal values abstracted | Catches copies with different messages/URLs |
| semantic | + identifiers + numeric literals + type annotations stripped | Catches renamed-variable clones, most false positives |
fallow dupes --mode semantic
● Clone group (196 lines, 2 instances) [semantic match]
  src/lib/dutch-holidays.ts:193-388
  src/lib/dutch-holidays.ts:389-584
  Renamed: holidays2024→holidays2025, year2024→year2025
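The normalization levels can be illustrated with a minimal sketch. The `Token` shape, the `normalize` function, and the `$STR` / `$ID` / `$NUM` placeholders are assumptions made for this example; fallow's actual tokenizer is AST-based and is not shown here, and type-annotation stripping is left out for brevity.

```typescript
// Hypothetical token model, for illustration only.
type TokenKind = "keyword" | "identifier" | "string" | "number" | "operator";
interface Token { kind: TokenKind; text: string }
type Mode = "strict" | "mild" | "weak" | "semantic";

// Map each token to the text that is compared in the given mode.
function normalize(tokens: Token[], mode: Mode): string[] {
  return tokens.map((t) => {
    // weak and semantic abstract string literal values
    if (t.kind === "string" && (mode === "weak" || mode === "semantic")) return "$STR";
    // semantic additionally abstracts identifiers and numeric literals
    if (mode === "semantic" && t.kind === "identifier") return "$ID";
    if (mode === "semantic" && t.kind === "number") return "$NUM";
    // strict and mild compare the token text as-is
    return t.text;
  });
}

// const year2024 = 2024
const a: Token[] = [
  { kind: "keyword", text: "const" },
  { kind: "identifier", text: "year2024" },
  { kind: "operator", text: "=" },
  { kind: "number", text: "2024" },
];
// const year2025 = 2025
const b: Token[] = [
  { kind: "keyword", text: "const" },
  { kind: "identifier", text: "year2025" },
  { kind: "operator", text: "=" },
  { kind: "number", text: "2025" },
];

const same = (mode: Mode): boolean =>
  normalize(a, mode).join(" ") === normalize(b, mode).join(" ");

console.log(same("strict"), same("weak"), same("semantic")); // false false true
```

Under this sketch the two statements only match in semantic mode, which is the renamed-variable case shown in the output above.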

Key metrics

Duplication percentage

Fraction of total source tokens that appear in at least one clone group. Computed over the full analyzed file set, not just the groups shown by --top.
| Range | Interpretation | Action |
| --- | --- | --- |
| 0–5% | Low duplication | No action needed |
| 5–15% | Moderate | Review the largest clone families |
| 15–30% | High | Prioritize extraction of shared modules |
| 30%+ | Very high | Likely structural issue; look for copy-pasted modules |
Duplication percentage is mode-dependent. Because each mode normalizes more than the one before it, semantic mode reports a percentage at least as high as strict, and usually higher. Compare percentages only across runs using the same mode.
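As a rough illustration of "tokens that appear in at least one clone group", the sketch below merges overlapping token ranges per file so that no token is counted twice. The `TokenRange` shape and the `duplicationPercentage` function are hypothetical; they are not fallow's internal data model.

```typescript
// Hypothetical shape: one clone instance as a half-open token range [start, end).
interface TokenRange { file: string; start: number; end: number }

function duplicationPercentage(ranges: TokenRange[], totalTokens: number): number {
  if (totalTokens === 0) return 0;

  // Bucket ranges by file so overlaps are only merged within the same file.
  const byFile = new Map<string, TokenRange[]>();
  for (const r of ranges) {
    const list = byFile.get(r.file) ?? [];
    list.push(r);
    byFile.set(r.file, list);
  }

  let covered = 0;
  byFile.forEach((list) => {
    list.sort((x, y) => x.start - y.start);
    let curStart = list[0].start;
    let curEnd = list[0].end;
    for (const r of list.slice(1)) {
      if (r.start <= curEnd) {
        // Overlapping or touching range: extend the current run.
        curEnd = Math.max(curEnd, r.end);
      } else {
        covered += curEnd - curStart;
        curStart = r.start;
        curEnd = r.end;
      }
    }
    covered += curEnd - curStart;
  });

  return (covered * 100) / totalTokens;
}

const pct = duplicationPercentage(
  [
    { file: "a.ts", start: 0, end: 50 },
    { file: "a.ts", start: 40, end: 80 }, // overlaps the first range
    { file: "b.ts", start: 10, end: 30 },
  ],
  1000,
);
console.log(pct); // 10
```

The two overlapping ranges in `a.ts` contribute 80 tokens rather than 90, which is why the denominator must be the full analyzed file set and not only the groups being displayed.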

Token count and line count

Each clone group reports both token count and line count.
  • Tokens are language-aware units (keywords, identifiers, operators, literals). This is what the detection engine matches on.
  • Lines are the source lines spanned by the clone. Useful for estimating refactoring effort.
Larger clones have higher refactoring value. A 200-line clone group is worth extracting; a 6-line one probably isn’t.

Instance count

The number of locations where the same code appears. A clone group with 5 instances means the same block was copied to 5 places — fixing a bug in the logic requires updating all 5.
| Instances | Risk | Action |
| --- | --- | --- |
| 2 | Normal, often acceptable | Extract if the code is complex or likely to change |
| 3–5 | Bug risk increases | Extract into a shared function or module |
| 5+ | High maintenance burden | Extract urgently; divergence between copies is likely |

Clone groups and families

Clone groups

A clone group is a single duplicated code block found at 2+ locations. Each location is an instance.
● Clone group 1 (8 lines, 2 instances)
  src/validators/userValidator.ts:12-19
  src/validators/orderValidator.ts:8-15
This means one block of 8 lines appears in both files. Here’s what it looks like:
const errors: string[] = [];
for (const [key, rule] of Object.entries(rules)) {
  const value = data[key];
  if (rule.required && (value === undefined || value === null)) {
    errors.push(`${key} is required`);
  }
}
return errors;
One clone group = one duplicated block. But the same two files may share more than one block:
● Clone group 2 (6 lines, 2 instances)
  src/validators/userValidator.ts:24-29
  src/validators/orderValidator.ts:20-25

● Clone group 3 (5 lines, 2 instances)
  src/validators/userValidator.ts:35-39
  src/validators/orderValidator.ts:31-35
Three separate clone groups, all between the same two files. That pattern is a clone family.

Clone families

A clone family is a set of clone groups that all involve the same files. It means those files weren’t just sharing one snippet; they were likely copy-pasted from each other entirely.
Family: src/validators/userValidator.ts ↔ src/validators/orderValidator.ts
  3 clone groups, 19 duplicated lines
  → Extract shared validation module
This changes the refactoring approach. Individual clone groups suggest extracting a function. A clone family suggests the files themselves need to be merged or restructured.
| Pattern | What it tells you | Refactoring strategy |
| --- | --- | --- |
| Single clone group | One block was copied | Extract that block into a shared function |
| Family within one file | Multiple self-clones | Extract the repeated pattern into a helper |
| Family across 2 files | Files were copy-pasted | Merge into a shared module with configuration |
| Family across a directory | Template-based duplication | Replace the template with a parameterized generator |
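The grouping of clone groups into families can be sketched as follows. The `CloneGroup` and `CloneFamily` shapes and the `groupIntoFamilies` function are assumptions for illustration; the sketch simply keys each group by its sorted set of files and keeps file sets that occur in more than one group.

```typescript
// Hypothetical shapes, not fallow's actual output schema.
interface CloneGroup { lines: number; instances: { file: string }[] }
interface CloneFamily { files: string[]; groups: number; duplicatedLines: number }

function groupIntoFamilies(groups: CloneGroup[]): CloneFamily[] {
  const families = new Map<string, CloneFamily>();
  for (const g of groups) {
    // The family key is the sorted, de-duplicated set of files in the group.
    const files = Array.from(new Set(g.instances.map((i) => i.file))).sort();
    const key = files.join("\u0000");
    const family = families.get(key) ?? { files, groups: 0, duplicatedLines: 0 };
    family.groups += 1;
    family.duplicatedLines += g.lines;
    families.set(key, family);
  }
  // A single group on its own is not a family.
  return Array.from(families.values()).filter((f) => f.groups > 1);
}

const user = "src/validators/userValidator.ts";
const order = "src/validators/orderValidator.ts";
const result = groupIntoFamilies([
  { lines: 8, instances: [{ file: user }, { file: order }] },
  { lines: 6, instances: [{ file: user }, { file: order }] },
  { lines: 5, instances: [{ file: user }, { file: order }] },
]);
console.log(result[0].groups, result[0].duplicatedLines); // 3 19
```

With the three validator groups from the example, the sketch yields one family with 3 clone groups and 19 duplicated lines, matching the family summary shown earlier.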

When duplication is acceptable

Not all duplication should be eliminated. Context matters.
  • Test files: Test cases often repeat setup code intentionally. Abstracting test setup can make tests harder to understand and debug. Use --production to exclude test/story/dev files entirely, or use dupes.ignore patterns for more granular control.
  • Generated code: Codegen output (GraphQL, Prisma, OpenAPI) is inherently duplicative. Exclude with dupes.ignore.
  • Small clones (< 10 lines): Very small clones are often idiomatic patterns (error handling, guard clauses) rather than meaningful duplication.
  • Cross-language ports: If you maintain both .ts and .js versions intentionally, use --skip-local to focus on cross-directory duplicates instead.
Premature abstraction is worse than duplication. If the duplicated code isn’t changing or isn’t causing bugs, it may not be worth extracting. Focus on clones that are both large and in actively changing files — use fallow health --hotspots to cross-reference.

Interpreting --threshold results

The --threshold flag sets a maximum allowed duplication percentage. If the project exceeds the threshold, fallow exits with code 1.
fallow dupes --threshold 15
Duplication: 19.4% (27,255 duplicated lines across 398 files, exceeds 15% threshold)
Found 1,184 clone groups, 2,959 instances (0.23s) ✗
Start with a threshold above your current level, then ratchet it down over time. Combine with --baseline for even more gradual adoption.
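One way to picture the ratchet is a small CI helper that proposes the next threshold from the current duplication level. Both functions below are hypothetical and not part of fallow; the one-point `headroom` default is an arbitrary assumption.

```typescript
// Hypothetical CI helpers; not part of fallow itself.

// Mirrors the documented rule: exceeding the threshold means exit code 1.
function exitCodeFor(duplicationPct: number, threshold: number): number {
  return duplicationPct > threshold ? 1 : 0;
}

// Propose the next threshold: a little headroom above the current level,
// but never looser than the threshold already in place.
function nextThreshold(currentPct: number, previousThreshold: number, headroom = 1): number {
  const candidate = Math.ceil(currentPct + headroom);
  return Math.min(previousThreshold, candidate);
}

console.log(exitCodeFor(19.4, 15));   // 1
console.log(nextThreshold(19.4, 25)); // 21
console.log(nextThreshold(12.2, 21)); // 14
console.log(nextThreshold(16.0, 14)); // 14 (the threshold never loosens)
```

The resulting number would then be passed to `--threshold` on the next run.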

How suffix-array detection works

Fallow concatenates all token streams into one sequence, builds a suffix array with a longest-common-prefix (LCP) array, and scans for repeated subsequences above the minimum length. This runs in O(n log n) time, with no pairwise file comparison needed.
Token-based clone detection was pioneered by Baker (1995). Suffix-array approaches for scalable clone detection were developed by Li et al. (2006). Fallow’s approach is closest to the suffix-array method, with normalization levels corresponding to clone types.
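The general technique can be sketched on a single sequence of numeric token IDs. This is not fallow's implementation: the suffix array here is built with a naive comparison sort (slower than O(n log n) in the worst case), the LCP array uses Kasai's algorithm, and the function names and the left-maximality filter are choices made for this example. File boundaries and separators between concatenated streams are ignored.

```typescript
// Naive suffix array: sort suffix start positions by comparing token sequences.
function buildSuffixArray(tokens: number[]): number[] {
  const n = tokens.length;
  const sa = Array.from({ length: n }, (_, i) => i);
  sa.sort((a, b) => {
    while (a < n && b < n) {
      if (tokens[a] !== tokens[b]) return tokens[a] - tokens[b];
      a++;
      b++;
    }
    return b - a; // the suffix that ran out first is shorter and sorts first
  });
  return sa;
}

// Kasai's algorithm: lcp[i] is the common prefix length of suffixes sa[i-1] and sa[i].
function buildLcp(tokens: number[], sa: number[]): number[] {
  const n = tokens.length;
  const rank = new Array<number>(n);
  sa.forEach((start, i) => { rank[start] = i; });
  const lcp = new Array<number>(n).fill(0);
  let h = 0;
  for (let i = 0; i < n; i++) {
    if (rank[i] === 0) { h = 0; continue; }
    const j = sa[rank[i] - 1];
    while (i + h < n && j + h < n && tokens[i + h] === tokens[j + h]) h++;
    lcp[rank[i]] = h;
    if (h > 0) h--;
  }
  return lcp;
}

interface Repeat { first: number; second: number; length: number }

// Report adjacent suffix pairs sharing at least minTokens tokens.
function findRepeats(tokens: number[], minTokens: number): Repeat[] {
  const sa = buildSuffixArray(tokens);
  const lcp = buildLcp(tokens, sa);
  const repeats: Repeat[] = [];
  for (let i = 1; i < sa.length; i++) {
    if (lcp[i] < minTokens) continue;
    const first = Math.min(sa[i - 1], sa[i]);
    const second = Math.max(sa[i - 1], sa[i]);
    // Skip matches that are just a longer match shifted right by one token.
    if (first > 0 && tokens[first - 1] === tokens[second - 1]) continue;
    repeats.push({ first, second, length: lcp[i] });
  }
  return repeats;
}

const repeats = findRepeats([1, 2, 3, 4, 9, 1, 2, 3, 4, 8], 3);
console.log(repeats); // [ { first: 0, second: 5, length: 4 } ]
```

Sorting the suffixes places repeated subsequences next to each other, so a single linear scan over the LCP array finds them without comparing files pairwise.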

Limitations

  • Type-3 clones (gapped clones with inserted/deleted lines) are not detected. If someone copied a function and added a few lines in the middle, fallow will report the matching portions as separate smaller clones rather than one large near-miss.
  • Type-4 clones (semantically equivalent but syntactically different) are not detected. Two functions that do the same thing but are written differently will not be flagged.
  • Cross-file-type detection only works between .ts and .js files (via --cross-language). Other language pairs are not supported.
  • Minimum size thresholds mean very small clones (below --min-tokens / --min-lines) are invisible. This is intentional — small clones are usually idiomatic patterns.
  • Semantic mode false positives: Normalizing identifiers and literals can match code that is structurally similar but semantically unrelated. Review semantic-mode results before acting.

JSON _meta object

When --explain is passed (or via MCP), the JSON output includes a _meta object:
{
  "schema_version": 3,
  "_meta": {
    "docs": "https://docs.fallow.tools/explanations/duplication",
    "metrics": {
      "duplication_percentage": {
        "name": "Duplication percentage",
        "description": "Fraction of total source tokens in clone groups",
        "range": "[0, 100]",
        "interpretation": "lower is better; <5% low, 5–15% moderate, >15% high"
      },
      "clone_group": {
        "name": "Clone group",
        "description": "Set of 2+ code fragments with identical normalized token sequences"
      },
      "clone_family": {
        "name": "Clone family",
        "description": "Multiple clone groups sharing the same file set, indicating systematic duplication"
      }
    }
  }
}
AI agents and CI systems can use this to interpret results without consulting external documentation.
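A consumer might read the definitions like this. The field names come from the JSON shown above; the `MetricMeta` and `FallowMeta` interfaces and the `describeMetric` helper are hypothetical names introduced for this sketch.

```typescript
// Types inferred from the sample _meta object; hypothetical, not an official schema.
interface MetricMeta {
  name: string;
  description: string;
  range?: string;
  interpretation?: string;
}
interface FallowMeta { docs: string; metrics: Record<string, MetricMeta> }

// Build a one-line, human-readable definition for a metric key.
function describeMetric(output: { _meta?: FallowMeta }, key: string): string {
  const metric = output._meta?.metrics[key];
  if (!metric) return `${key}: no definition available (was --explain passed?)`;
  const hint = metric.interpretation ? ` (${metric.interpretation})` : "";
  return `${metric.name}: ${metric.description}${hint}`;
}

const sample = {
  schema_version: 3,
  _meta: {
    docs: "https://docs.fallow.tools/explanations/duplication",
    metrics: {
      clone_group: {
        name: "Clone group",
        description: "Set of 2+ code fragments with identical normalized token sequences",
      },
    },
  },
};

console.log(describeMetric(sample, "clone_group"));
// Clone group: Set of 2+ code fragments with identical normalized token sequences
console.log(describeMetric({ }, "clone_group"));
// clone_group: no definition available (was --explain passed?)
```

Treating `_meta` as optional keeps the consumer working for runs where `--explain` was not passed.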