
Codebase health analysis goes beyond finding dead code. While unused exports and unreachable files tell you what to delete, health metrics tell you what to fix: which functions are too complex, which files change too often, and where untested complexity creates the most risk. This page covers what fallow health reports, when to act on the numbers, and where the research comes from.
Pass --explain to any command with --format json to include metric definitions directly in the JSON output as a _meta object. The MCP server always includes _meta automatically.

Complexity metrics

These metrics appear in fallow health output, both per-function (findings) and per-file (file scores).

Cyclomatic complexity

The number of linearly independent paths through a function’s control flow graph. For structured programs (like JS/TS), this simplifies to 1 + the number of decision points (if/else, switch cases, loops, ternary ?:, logical &&/||, catch).
| Range | Interpretation | Action |
| --- | --- | --- |
| 1–10 | Simple, easy to test exhaustively | No action needed |
| 11–20 | Moderate, most functions should stay here | Review if growing |
| 21–50 | High, hard to test all paths | Split into smaller functions |
| 50+ | Very high, testing all paths is impractical | Refactor urgently |
Default threshold: 20
Introduced by Thomas McCabe in “A Complexity Measure” (1976). The practical upper bound of 20 comes from NIST SP 500-235 (Watson & McCabe, 1996).
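For example, counting decision points in a small function (an illustrative sketch, not fallow output):
// 1 (base) + 4 decision points = cyclomatic complexity 5
function shippingCost(weight: number, express: boolean, country: string): number {
  if (weight <= 0) throw new Error("invalid weight"); // +1 (if)
  const cost = express ? 20 : 10;                     // +1 (ternary)
  if (country !== "US" && country !== "CA") {         // +1 (if), +1 (&&)
    return cost + 15;
  }
  return cost;
}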

Cognitive complexity

How hard a function is to understand when reading top-to-bottom. Unlike cyclomatic complexity (which counts paths for testing), cognitive complexity measures comprehension difficulty. It penalizes nesting depth, break/continue to labels, recursion, and sequences of logical operators.
| Range | Interpretation | Action |
| --- | --- | --- |
| 0–7 | Easy to understand at a glance | No action needed |
| 8–15 | Moderate, may benefit from extraction | Extract helpers if growing |
| 15+ | Hard to follow, likely mixing concerns | Refactor: extract, simplify, or decompose |
Default threshold: 15 (matches SonarSource’s default rule). The 0–7 “easy” range is fallow’s guideline based on common industry practice.
Developed by G. Ann Campbell at SonarSource. Full specification: SonarSource white paper.

Complexity density

Total cyclomatic complexity divided by lines of code. Normalizing by file size lets you compare fairly: a 500-line file with complexity 50 (density 0.1) vs. a 10-line file with complexity 10 (density 1.0).
| Value | Interpretation | Action |
| --- | --- | --- |
| < 0.3 | Low, data files, configs, simple modules | No action needed |
| 0.3–1.0 | Moderate, normal application code | Review the densest functions |
| > 1.0 | Very dense, nearly every line is a decision point | Refactor: extract logic, simplify conditions |

Cyclomatic vs cognitive: a code example

These two functions illustrate why both metrics matter:
// Cyclomatic: 6 (one per case)
// Cognitive: 1 (flat switch is easy to read)
function getStatusLabel(status: string): string {
  switch (status) {
    case "pending": return "Pending";
    case "active": return "Active";
    case "paused": return "Paused";
    case "cancelled": return "Cancelled";
    case "completed": return "Done";
    default: return "Unknown";
  }
}
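And a nested if chain doing comparable work (an illustrative sketch; counts follow the definitions above):
// Cyclomatic: 4 (three ifs)
// Cognitive: 6 (+1, +2, +3: each level of nesting raises the increment)
function getDiscount(isActive: boolean, years: number, isPremium: boolean): number {
  if (isActive) {          // +1
    if (years > 5) {       // +2 (nested once)
      if (isPremium) {     // +3 (nested twice)
        return 0.2;
      }
      return 0.1;
    }
  }
  return 0;
}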
The switch has higher cyclomatic complexity (more paths to test) but is trivially readable. The nested if chain has lower cyclomatic complexity but is harder to follow. Cognitive complexity captures that difference.

File health scores

Per-file scores from fallow health --file-scores. Zero-function files (barrel files, re-export files) are excluded.

Maintainability Index (MI)

Fallow uses a simplified Maintainability Index adapted for JavaScript/TypeScript module graphs.
fan_out_penalty = min(ln(fan_out + 1) × 4, 15)
MI = 100 - (complexity_density × 30) - (dead_code_ratio × 20) - fan_out_penalty
Clamped to [0, 100]. Higher is better.
| Range | Interpretation | Action |
| --- | --- | --- |
| 70–100 | Good maintainability | Monitor |
| 40–70 | Moderate | Review periodically, address if declining |
| 0–40 | Poor | Prioritize for refactoring |
fallow health --file-scores --top 5
● File health scores (3 files)

   57.5    src/helpers/errorUtil.ts
             48 LOC    1 fan-in    0 fan-out  100% dead  0.75 density

   76.2    src/helpers/util.ts
            320 LOC   35 fan-in    0 fan-out   86% dead  0.22 density

   90.5    src/types.ts
            154 LOC   42 fan-in    3 fan-out    0% dead  0.20 density

  Composite file quality scores based on complexity, coupling, and dead code — https://docs.fallow.tools/explanations/health#file-health-scores
The original MI from Oman and Hagemeister (1992) combines Halstead Volume, cyclomatic complexity, lines of code, and comment percentage. Microsoft adapted it to a 0–100 scale for Visual Studio.

Fallow’s variant makes three changes for the JS/TS ecosystem:
  1. Replaces Halstead Volume with complexity density. Halstead metrics require full type resolution, which fallow doesn’t have (syntactic analysis only). Complexity density captures the same “how complex per unit of code” signal.
  2. Adds dead code ratio. Unused exports are a JS/TS-specific maintenance burden that the classic MI doesn’t account for.
  3. Adds fan-out coupling penalty. High import counts are a strong signal of maintenance difficulty in module-based codebases. The penalty uses logarithmic scaling (ln(fan_out + 1) × 4, capped at 15 points), reflecting diminishing marginal risk per additional import.
Because the formula differs, fallow’s MI scores are not directly comparable to Visual Studio or SonarQube MI scores.
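A minimal sketch of the formula above in TypeScript (helper name and signature are illustrative, not fallow’s API):
function maintainabilityIndex(density: number, deadRatio: number, fanOut: number): number {
  const fanOutPenalty = Math.min(Math.log(fanOut + 1) * 4, 15); // ln-scaled, capped at 15
  const mi = 100 - density * 30 - deadRatio * 20 - fanOutPenalty;
  return Math.min(100, Math.max(0, mi)); // clamped to [0, 100]
}

maintainabilityIndex(0.75, 1.0, 0); // 57.5, matching errorUtil.ts in the output above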

Fan-in, fan-out, and dead code ratio

  • Fan-in: number of files that import this file. High fan-in = high blast radius when changed.
  • Fan-out: number of files this file imports. High fan-out = high coupling and change propagation risk.
  • Dead code ratio: fraction of value exports with zero references (0–1). Type-only exports (interfaces, type aliases) are excluded.
| Metric | Signal | Action |
| --- | --- | --- |
| Fan-in > 20 | Many dependents, critical file | Extra review before changes |
| Fan-out > 15 | Many imports, high coupling | Consider splitting into smaller modules |
| Dead code = 0 | All exports used | No action needed |
| Dead code > 0.3 | Significant unused code | Run fallow fix to auto-remove |
Fan-in/fan-out coupling metrics originate from Henry and Kafura (1981). Later refined in the Chidamber & Kemerer OO metrics suite (1994), where Coupling Between Objects (CBO) captures the same idea for class-level dependencies.

Coupling concentration

Two vital signs metrics summarize coupling at the project level:
| Metric | Description |
| --- | --- |
| p95_fan_in | 95th percentile fan-in across all files |
| coupling_high_pct | Percentage of files with fan-in above the effective threshold |
The effective threshold is max(p95_fan_in, 10). The floor of 10 prevents false positives in small projects where even moderate fan-in is normal.
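As a one-line sketch (name is illustrative):
const effectiveThreshold = (p95FanIn: number): number => Math.max(p95FanIn, 10);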

Risk profiles

Risk profiles show what percentage of functions fall into each risk bin. Two profiles are reported as vital signs.

Unit size (function length in lines of code):

| Bin | LOC range |
| --- | --- |
| Low risk | 1–15 |
| Medium risk | 16–30 |
| High risk | 31–60 |
| Very high risk | > 60 |

Unit interfacing (function parameter count):

| Bin | Parameters |
| --- | --- |
| Low risk | 0–2 |
| Medium risk | 3–4 |
| High risk | 5–6 |
| Very high risk | 7+ |
Both profiles appear as unit_size_profile and unit_interfacing_profile in the vital signs JSON, each containing low_risk, medium_risk, high_risk, and very_high_risk percentage values. When the unit size profile shows >= 3% of functions in the very-high-risk bin (> 60 LOC), the human output includes a Large functions drill-down listing the specific oversized functions with file path, function name, line number, and LOC count. In JSON output, these appear as a large_functions array on the health report.
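A sketch of the unit-size binning (bin edges from the table above; names are illustrative):
type RiskBin = "low_risk" | "medium_risk" | "high_risk" | "very_high_risk";
function unitSizeBin(loc: number): RiskBin {
  if (loc <= 15) return "low_risk";
  if (loc <= 30) return "medium_risk";
  if (loc <= 60) return "high_risk";
  return "very_high_risk";
}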
Risk profiles follow the SIG/TÜViT quality model, which benchmarks codebases by the distribution of code across risk categories rather than by averages. A codebase where 95% of functions are low-risk is healthier than one with the same average complexity but a fat tail of very-high-risk functions.

Untested complexity risk

Complex code without test coverage is the most likely source of production bugs. Fallow quantifies this risk per function using a score based on the CRAP metric (Change Risk Anti-Patterns), originally developed by Alberto Savoia and Bob Evans at Agitar Labs in 2007. The score appears in fallow health --file-scores output and JSON.

How the score is computed

Fallow uses the canonical CRAP formula:
score = CC^2 × (1 - cov/100)^3 + CC
Where cov is per-function coverage (0–100). Fallow obtains that value in one of three ways:
| Model (coverage_model) | When | How cov is determined |
| --- | --- | --- |
| static_estimated | Default | Graph-based per-function estimate: directly test-referenced = 85%, transitively test-reachable = 40%, not test-reachable = 0%. |
| istanbul | --coverage <coverage-final.json> (auto-detected from coverage/ and .nyc_output/, or via FALLOW_COVERAGE) | Real per-function statement coverage produced by Jest, Vitest, c8, or nyc. |
| static_binary | Legacy, snapshot/JSON compatibility only | Whole-file binary: test-reachable file = 100% (score = CC), untested file = 0% (score = CC^2 + CC). Superseded by static_estimated. |
For untested code the quadratic term makes complexity hurt fast. Under the default static_estimated model, a not-test-reachable function with cyclomatic complexity 5 scores 5^2 × 1 + 5 = 30, which is exactly the default flag threshold.

For execution evidence from real traffic (hot/cold paths, runtime-weighted dead-code verdicts), see Runtime coverage. Runtime coverage is a paid Fallow Runtime feature and accepts V8 or Istanbul runtime data via --runtime-coverage; it merges on top of whichever static coverage_model is active.

Default threshold: 30 (functions scoring at or above this value are flagged)
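The formula in TypeScript (a sketch; the function name is illustrative):
function crapScore(cc: number, cov: number): number {
  return cc ** 2 * (1 - cov / 100) ** 3 + cc;
}
crapScore(5, 0);   // 30: untested, flagged at the default threshold
crapScore(5, 40);  // 10.4: transitively test-reachable estimate
crapScore(5, 100); // 5: fully covered, the score collapses to CC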

Interpreting the numbers

| Score | Interpretation | Action |
| --- | --- | --- |
| < 30 | Acceptable risk | No action needed |
| 30–99 | Moderate risk, complex untested code | Add test coverage or reduce complexity |
| 100–999 | High risk | Prioritize: add tests, then simplify |
| > 999 | Extreme risk (shown as “>999” in terminal) | Refactor urgently, this code is both complex and invisible to tests |
In human output, scores above 999 are capped to >999 for readability. JSON output always contains the exact value.

File-level aggregation

Each file reports two aggregated values:
| Field | Meaning |
| --- | --- |
| crap_max | Highest per-function score in the file |
| crap_above_threshold | Count of functions scoring >= 30 |
The active model is reported as coverage_model on the health summary (static_estimated, istanbul, or the legacy static_binary).

Refactoring targets

The AddTestCoverage refactoring target fires when both conditions are true:
  • crap_above_threshold >= 2 (multiple risky functions, not just one outlier)
  • complexity_density > 0.3 (the file is genuinely complex, not just large)
This combination identifies files where adding test coverage will reduce the most risk per effort spent.
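As a predicate (a sketch of the two conditions; names are illustrative):
const isAddTestCoverageTarget = (crapAboveThreshold: number, complexityDensity: number): boolean =>
  crapAboveThreshold >= 2 && complexityDensity > 0.3;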

Suppression

To exclude a file from untested complexity risk reporting:
// fallow-ignore-file coverage-gaps
This suppresses all coverage-related findings for the file, including the CRAP-based scores. To suppress individual functions from complexity findings (cyclomatic and cognitive thresholds):
// fallow-ignore-next-line complexity
function* parseCsv(text) { /* ... */ }
See Inline suppression for all suppressible issue types.
The CRAP formula needs a per-function coverage percentage:
CRAP(m) = CC^2 × (1 - cov/100)^3 + CC
A fully tested function (cov = 100) always scores exactly CC, regardless of complexity. A completely untested function (cov = 0) scores CC^2 + CC.

Without coverage input, the default static_estimated model derives cov from the dependency graph: 85% for directly test-referenced exports, 40% for transitively test-reachable code, 0% for code with no path from any test file. This is an approximation of test coverage:
  • False negatives: A file imported by tests but with functions that are never actually exercised will score lower (look healthier) than it should.
  • False positives: A file referenced only from non-test code can score worse than it deserves if it is in fact unit-tested through indirect paths.
For exact per-function scores, pass --coverage <coverage-final.json> (Jest, Vitest, c8, or nyc). Fallow auto-detects coverage/coverage-final.json and .nyc_output/coverage-final.json, or set FALLOW_COVERAGE to point at a custom path. The coverage_model field flips from static_estimated to istanbul when real coverage is in use.

The legacy static_binary model (whole-file 0%/100% based on test reachability) is superseded by static_estimated and is retained only for snapshot/JSON backwards compatibility.
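A sketch of that graph-derived estimate (labels and helper name are illustrative):
type TestReachability = "direct" | "transitive" | "none";
function estimatedCoverage(reach: TestReachability): number {
  if (reach === "direct") return 85;     // directly test-referenced
  if (reach === "transitive") return 40; // reachable through other modules
  return 0;                              // no path from any test file
}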
The CRAP metric was introduced by Alberto Savoia and Bob Evans at Agitar Labs. Original presentation: “CRAP4J” (2007). The core insight is that complexity alone is not dangerous if the code is well-tested, and simple code is low-risk even without tests. Risk concentrates where complexity and missing coverage overlap.

Hotspot metrics

Hotspots are files that are both complex and frequently changing. Bugs concentrate at this intersection, and refactoring here yields the highest return. Available via fallow health --hotspots.
The core insight: Complex code that never changes is stable. Simple code that changes often is manageable. Complex code that changes often is where defects concentrate.

Hotspot score

normalized_churn = weighted_commits / max_weighted_commits   (0..1)
normalized_complexity = complexity_density / max_density      (0..1)
score = normalized_churn × normalized_complexity × 100       (0..100)
The multiplicative relationship means a file needs both high churn AND high complexity to score highly. Either dimension alone produces a low score.
| Score | Interpretation | Action |
| --- | --- | --- |
| 70–100 | Critical hotspot | Prioritize refactoring in next sprint |
| 40–70 | Moderate hotspot | Schedule review, reduce complexity |
| 0–40 | Low risk | Monitor |
fallow health --hotspots --top 3
● Hotspots (3 files, since 6 months)

   72.1 ▲  src/types.ts
           47 commits   2739 churn  0.20 density  42 fan-in  ▲ accelerating

   38.4 ─  src/helpers/util.ts
           23 commits    610 churn  0.22 density  35 fan-in  ─ stable

   12.7 ▼  src/locales/en.ts
            8 commits    177 churn  0.35 density   2 fan-in  ▼ cooling

  Files with high churn and high complexity — https://docs.fallow.tools/explanations/health#hotspot-metrics

Weighted commits

Recency-weighted commit count using exponential decay with a 90-day half-life:
  • A commit yesterday contributes ~1.0
  • A commit 90 days ago contributes ~0.5
  • A commit 180 days ago contributes ~0.25
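The decay curve in code (a sketch; constant and helper names are illustrative):
const HALF_LIFE_DAYS = 90;
const commitWeight = (ageInDays: number): number =>
  Math.pow(0.5, ageInDays / HALF_LIFE_DAYS);

commitWeight(1);   // ≈ 0.99
commitWeight(90);  // 0.5
commitWeight(180); // 0.25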
Recency-weighted change metrics were introduced by Graves et al. (2000). The empirical link between churn and defect density was established by Nagappan and Ball (2005) at Microsoft.

Churn trend

Compares commit frequency in the first half vs second half of the analysis window:
| Trend | Meaning | Action |
| --- | --- | --- |
| accelerating | Recent > 1.5× older half | High priority if complexity is also high |
| stable | Balanced frequency | Normal, monitor score |
| cooling | Recent < 0.67× older half | Lower priority, stabilizing |
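A sketch of the classification (thresholds from the table above; names are illustrative):
type Trend = "accelerating" | "stable" | "cooling";
function churnTrend(olderHalf: number, recentHalf: number): Trend {
  if (recentHalf > 1.5 * olderHalf) return "accelerating";
  if (recentHalf < 0.67 * olderHalf) return "cooling";
  return "stable";
}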
Pioneered by Michael Feathers and systematized by Adam Tornhill in Your Code as a Crime Scene (2015) and Software Design X-Rays (2018).

Refactoring targets

Refactoring targets are ranked, actionable recommendations produced by fallow health --targets. They rank files by refactoring priority using the metrics above.
Quick start: Run fallow health --targets for a prioritized refactoring list. Filter by effort: low for the fastest improvements. Save a baseline with --save-baseline and pass --baseline in CI to block new regressions. Update the baseline as you fix targets.
Targets are synthesis, not data. The other sections show measurements. Targets tell you what to do about them.

Priority score

priority = min(complexity_density, 1) × 30
         + hotspot_boost × 25            (hotspot_score / 100 if in hotspots, else 0)
         + dead_code_ratio × 20
         + min(fan_in / 20, 1) × 15
         + min(fan_out / 30, 1) × 10
All inputs are normalized to [0, 1] so each weight is a true percentage share. The formula intentionally avoids the Maintainability Index to prevent double-counting (MI already incorporates density and dead code).
| Score | Interpretation | Action |
| --- | --- | --- |
| 70–100 | High priority | Address in the current sprint |
| 40–70 | Moderate | Schedule for upcoming work |
| 0–40 | Lower priority | Monitor, address opportunistically |

Effort estimation

Every target includes an effort estimate (low, medium, or high) based on three file-level signals:
| Effort | Criteria |
| --- | --- |
| low | < 100 lines AND ≤ 3 functions AND < 5 importers |
| high | ≥ 500 lines, OR ≥ 20 importers, OR (≥ 15 functions AND density > 0.5) |
| medium | Everything else |
Effort uses OR-semantics for “high” (any single criterion triggers it) and AND-semantics for “low” (all three must be true). A 50-line file with 25 importers is still “high” because changes ripple widely. In human output, effort appears after the category label, separated by “·”:
dead code · low    Remove 12 unused exports...
complexity · high  Extract 3 functions with cognitive complexity > 30...
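The same rules as a function (a sketch; “high” is checked first so any single criterion wins):
type Effort = "low" | "medium" | "high";
function estimateEffort(loc: number, functions: number, importers: number, density: number): Effort {
  if (loc >= 500 || importers >= 20 || (functions >= 15 && density > 0.5)) return "high";
  if (loc < 100 && functions <= 3 && importers < 5) return "low";
  return "medium";
}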

Evidence

For three recommendation categories, fallow provides structured evidence linking the target back to specific analysis data. This lets agents and scripts act on recommendations without a second tool call.
| Category | Evidence fields | Example |
| --- | --- | --- |
| remove_dead_code | unused_exports: list of export names | ["assertEqual", "assertIs", "assertNever"] |
| extract_complex_functions | complex_functions: list of {name, line, cognitive} | [{"name": "processData", "line": 45, "cognitive": 113}] |
| break_circular_dependency | cycle_path: list of file paths forming the cycle | ["src/a.ts", "src/b.ts", "src/c.ts"] |
| All other categories | evidence omitted from JSON | — |
Evidence is designed for machine consumption. In human output, the recommendation text already incorporates the key evidence (e.g., “Remove 12 unused exports” or “Extract processData (cognitive: 113)”).

Baselines

Use --save-baseline and --baseline with the health command to track refactoring progress over time. When a baseline is active, only new targets (not present in the baseline) are reported.
# Save current targets as the baseline
fallow health --targets --save-baseline fallow-baselines/health.json

# On subsequent runs, only show new targets
fallow health --targets --baseline fallow-baselines/health.json
Baseline keys use the format relative_path:category_label (e.g., src/utils.ts:dead code). A target is considered “known” if the same file + category combination exists in the baseline, regardless of whether its priority score or effort level has changed. This is useful in CI to prevent new tech debt from landing while allowing existing targets to be addressed incrementally.

Recommendation categories

Each target is assigned a category based on which rule matches first (evaluated in priority order):
| Category | Display label | Trigger | What it means |
| --- | --- | --- | --- |
| Urgent churn + complexity | churn+complexity | Hotspot score ≥ 50, accelerating trend, density > 0.5 | Actively-changing file with growing complexity, highest urgency |
| Break circular dependency | circular dep | File in an import cycle with fan-in ≥ 5 | Changes cascade through the cycle to many dependents |
| Split high-impact file | high impact | Fan-in ≥ 20 (or ≥ 10 with ≥ 5 functions) and density > 0.3 | High blast radius amplifies every bug introduced |
| Remove dead code | dead code | Dead code ratio ≥ 0.5 with ≥ 3 value exports | Majority of exports are unused; reduce surface area |
| Extract complex functions | complexity | Top cognitive complexity ≥ 30 | Contains functions that are very hard to understand |
| Reduce coupling | coupling | Fan-out ≥ 15 and MI < 60 (excluding entry points) | Too many imports reduce testability |
A file can trigger multiple rules, but only the highest-priority match determines the category. All contributing factors are still collected and shown in JSON output.
fallow health --targets --top 5
● Refactoring targets (5)

  92.3    src/core/processor.ts
          churn+complexity · high  Actively-changing file with growing complexity — stabilize before adding features

  71.2    src/utils/helpers.ts
          high impact · high  Split high-impact file — 15 dependents amplify every change

  55.8    src/legacy/old-api.ts
          dead code · low  Remove 8 unused exports to reduce surface area (73% dead)

  52.0    src/graph/resolver.ts
          circular dep · medium  Break import cycle — 8 files depend on this, changes cascade through the cycle

  41.8    src/components/Dashboard.tsx
          complexity · high  Extract renderMetrics (cognitive: 45) and renderFilters (cognitive: 32) into smaller functions

  Prioritized refactoring recommendations based on complexity, churn, and coupling signals — https://docs.fallow.tools/explanations/health#refactoring-targets

JSON structure

With --format json, each target includes structured data for programmatic consumption:
{
  "targets": [
    {
      "path": "src/core/processor.ts",
      "priority": 92.3,
      "recommendation": "Actively-changing file with growing complexity — stabilize before adding features",
      "category": "urgent_churn_complexity",
      "effort": "high",
      "factors": [
        {
          "metric": "complexity_density",
          "value": 0.67,
          "threshold": 0.3,
          "detail": "density 0.67 exceeds 0.3"
        },
        {
          "metric": "fan_in",
          "value": 12,
          "threshold": 10,
          "detail": "12 files depend on this"
        }
      ]
    },
    {
      "path": "src/helpers/util.ts",
      "priority": 38.8,
      "recommendation": "Remove 12 unused exports to reduce surface area (86% dead)",
      "category": "remove_dead_code",
      "effort": "low",
      "factors": [
        {
          "metric": "dead_code_ratio",
          "value": 0.86,
          "threshold": 0.5,
          "detail": "12 unused of 14 value exports (86%)"
        }
      ],
      "evidence": {
        "unused_exports": ["assertEqual", "assertIs", "assertNever", "oldConfig", "legacyHelper"]
      }
    }
  ]
}
Each factor includes raw value and threshold for programmatic use. The evidence field provides actionable detail for categories that support it. AI agents can use evidence.unused_exports to generate removal PRs without a second tool call. Categories without evidence omit the field entirely.
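For example, a script consuming the JSON above might act on unused_exports like this (a sketch; the input file name is illustrative):
import { readFile } from "node:fs/promises";

type Target = {
  path: string;
  category: string;
  evidence?: { unused_exports?: string[] };
};

const raw = await readFile("fallow-targets.json", "utf8");
const { targets } = JSON.parse(raw) as { targets: Target[] };
for (const t of targets) {
  if (t.category === "remove_dead_code" && t.evidence?.unused_exports) {
    console.log(`${t.path}: remove ${t.evidence.unused_exports.join(", ")}`);
  }
}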

Health score

The health score is a single 0–100 number summarizing overall codebase quality. Available via fallow health --score.
score = 100 - sum(penalties)
Clamped to [0, 100]. Each penalty targets a different quality dimension:
| Penalty | Formula | Max |
| --- | --- | --- |
| Dead files | min(dead_file_pct × 0.2, 15) | 15 |
| Dead exports | min(dead_export_pct × 0.2, 15) | 15 |
| Complexity | min(critical_complexity_pct × 4, 20) | 20 |
| P90 complexity | 0 for formula v2 runs; older snapshots fall back to min(max(0, p90_cyclomatic − 10), 10) | 10 |
| Maintainability | min(maintainability_low_pct × 1.5, 15) | 15 |
| Hotspots | min(hotspot_top_pct_count / ceil(total_files × 0.01) × 10, 10) | 10 |
| Unused deps | min(unused_deps_per_k_files × 0.5, 25) | 25 |
| Circular deps | min(circular_deps_per_k_files × 0.5, 25) | 25 |
| Unit size | min(functions_over_60_loc_per_k × 0.5, 10) | 10 |
| Coupling | min(coupling_high_pct × 0.5, 5) | 5 |
| Duplication | min(max(0, duplication_pct − 5) × 1.0, 10) | 10 |
health_score.formula_version identifies the formula used. Version 2 uses scale-invariant density and tail metrics so large monorepos are not diluted by many healthy files. Older snapshots that lack the v2 fields fall back to the previous average, p90, and raw-count aggregators.

The unit size penalty tracks functions above 60 LOC per 1,000 functions. The coupling penalty uses the percentage of high fan-in files. The duplication penalty kicks in when more than 5% of lines are duplicated; --score automatically runs duplication analysis to populate this metric. Add --hotspots (or combine --score --targets) when the score should include the churn-backed hotspot penalty.

Letter grades: A (score >= 85), B (70–84), C (55–69), D (40–54), F (below 40). Penalty fields are null in JSON when the corresponding analysis didn’t run.
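Summing, clamping, and grading as a sketch (grade bands from this page; helper name is illustrative):
function healthScore(penalties: number[]): { score: number; grade: string } {
  const total = penalties.reduce((sum, p) => sum + p, 0);
  const score = Math.max(0, Math.min(100, 100 - total)); // clamped to [0, 100]
  const grade = score >= 85 ? "A" : score >= 70 ? "B" : score >= 55 ? "C" : score >= 40 ? "D" : "F";
  return { score, grade };
}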

Combining health with dead code analysis

Health metrics are most powerful when combined with fallow’s other analyses. Running fallow (all analyses) or fallow health alongside fallow dead-code reveals patterns that neither analysis shows alone:
  • Dead code in complex files: A file with high cyclomatic complexity and 70% unused exports is a strong candidate for deletion rather than refactoring
  • Hotspots with circular dependencies: A frequently-changed file that sits in an import cycle amplifies every change through the cycle to its dependents
  • Duplicated complex functions: Functions flagged as complex that also appear as clone instances in fallow dupes are refactoring candidates where extraction yields the most benefit
Use fallow health --targets for prioritized recommendations that synthesize all these signals into a ranked action list.

Limitations

Metrics are signals, not verdicts. Be aware of these edge cases.
  • Generated files (GraphQL codegen, Prisma client, OpenAPI types) may have high complexity density but shouldn’t be refactored. Use health.ignore patterns to exclude them.
  • Test files with many it()/test() blocks can have high cyclomatic complexity. This is usually fine. Test suites should cover many paths.
  • Config files (Webpack, ESLint) may appear as hotspots due to frequent changes during setup. Their churn typically cools after initial configuration.
  • MI scores are not portable. Fallow’s formula differs from Visual Studio and SonarQube, so scores are not directly comparable across tools.
  • Hotspot scores are project-relative. A score of 80 means “worst in your project,” not “objectively bad.” A well-maintained project may have a top score of 30.
  • Fan-in/out counts modules, not imports. import { a, b, c } from './utils' counts as 1 edge, not 3.

JSON _meta object

When --explain is passed (or via MCP), each command’s JSON output includes a _meta object:
{
  "schema_version": 3,
  "_meta": {
    "docs": "https://docs.fallow.tools/explanations/health",
    "metrics": {
      "maintainability_index": {
        "name": "Maintainability Index",
        "description": "Composite score: 100 - (complexity_density × 30) - ...",
        "range": "[0, 100]",
        "interpretation": "higher is better; <40 poor, 40–70 moderate, >70 good"
      }
    }
  }
}
AI agents and CI systems can use this to interpret metric values without consulting external documentation.
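A sketch of a CI script reading _meta (field shapes follow the example above; the input file name is illustrative):
import { readFile } from "node:fs/promises";

interface MetricMeta {
  name: string;
  description: string;
  range: string;
  interpretation: string;
}

const report = JSON.parse(await readFile("health.json", "utf8"));
const mi: MetricMeta | undefined = report._meta?.metrics?.maintainability_index;
if (mi) {
  console.log(`${mi.name} (${mi.range}): ${mi.interpretation}`);
}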