# Health explained

> What each metric in fallow's health output means, how it's computed, and how to interpret the values. Complexity analysis, hotspot detection, and refactoring targets integrated with dead-code and duplication findings.

Codebase health analysis goes beyond finding dead code. While unused exports and unreachable files tell you what to delete, health metrics tell you what to fix: which functions are too complex, which files change too often, and where untested complexity creates the most risk.

This page covers what `fallow health` reports, when to act on the numbers, and where the research comes from.

<Tip>
  Pass `--explain` to any command with `--format json` to include metric definitions directly in the JSON output as a `_meta` object. The MCP server always includes `_meta` automatically.
</Tip>

## Complexity metrics

These metrics appear in [`fallow health`](/cli/health) output, both per-function (findings) and per-file (file scores).

### Cyclomatic complexity

The number of linearly independent paths through a function's control flow graph. For structured programs (like JS/TS), this simplifies to 1 + the number of decision points (`if`, `switch` cases, loops, ternary `?:`, logical `&&`/`||`, `catch`). A plain `else` adds no decision point of its own.

| Range | Interpretation                              | Action                       |
| :---- | :------------------------------------------ | :--------------------------- |
| 1–10  | Simple, easy to test exhaustively           | No action needed             |
| 11–20 | Moderate, most functions should stay here   | Review if growing            |
| 21–50 | High, hard to test all paths                | Split into smaller functions |
| > 50  | Very high, testing all paths is impractical | Refactor urgently            |

**Default threshold:** 20

<Accordion title="Research">
  Introduced by Thomas McCabe in ["A Complexity Measure" (1976)](https://doi.org/10.1109/TSE.1976.233837). The practical upper bound of 20 comes from [NIST SP 500-235](https://www.nist.gov/publications/structured-testing-testing-methodology-using-cyclomatic-complexity-metric) (Watson & McCabe, 1996).
</Accordion>

### Cognitive complexity

How hard a function is to understand when reading top-to-bottom. Unlike cyclomatic complexity (which counts paths for *testing*), cognitive complexity measures *comprehension difficulty*. It penalizes nesting depth, `break`/`continue` to labels, recursion, and sequences of logical operators.

| Range | Interpretation                         | Action                                    |
| :---- | :------------------------------------- | :---------------------------------------- |
| 0–7   | Easy to understand at a glance         | No action needed                          |
| 8–15  | Moderate, may benefit from extraction  | Extract helpers if growing                |
| > 15  | Hard to follow, likely mixing concerns | Refactor: extract, simplify, or decompose |

**Default threshold:** 15 (matches SonarSource's default rule). The 0–7 "easy" range is fallow's guideline based on common industry practice.

<Accordion title="Research">
  Developed by G. Ann Campbell at SonarSource. Full specification: [SonarSource white paper](https://www.sonarsource.com/resources/cognitive-complexity/).
</Accordion>

### Complexity density

Total cyclomatic complexity divided by lines of code. Normalizing by file size lets you compare fairly: a 500-line file with complexity 50 (density 0.1) vs. a 10-line file with complexity 10 (density 1.0).

| Value   | Interpretation                                    | Action                                       |
| :------ | :------------------------------------------------ | :------------------------------------------- |
| \< 0.3  | Low, data files, configs, simple modules          | No action needed                             |
| 0.3–1.0 | Moderate, normal application code                 | Review the densest functions                 |
| > 1.0   | Very dense, nearly every line is a decision point | Refactor: extract logic, simplify conditions |

### Cyclomatic vs cognitive: a code example

These two functions illustrate why both metrics matter:

<CodeGroup>
  ```typescript High cyclomatic, low cognitive theme={null}
  // Cyclomatic: 6 (one per case)
  // Cognitive: 1 (flat switch is easy to read)
  function getStatusLabel(status: string): string {
    switch (status) {
      case "pending": return "Pending";
      case "active": return "Active";
      case "paused": return "Paused";
      case "cancelled": return "Cancelled";
      case "completed": return "Done";
      default: return "Unknown";
    }
  }
  ```

  ```typescript Low cyclomatic, high cognitive theme={null}
  // Cyclomatic: 4 (1 + one per if)
  // Cognitive: 7 (+1, +2, +3 for the nested ifs, +1 for the else)
  function processOrder(order: Order): string {
    if (order.items.length > 0) {            // +1
      if (order.customer.verified) {          // +2 (nesting)
        if (order.total > order.limit) {      // +3 (nesting)
          return "requires_approval";
        }
      } else {                                // +1
        return "unverified";
      }
    }
    return "empty";
  }
  ```
</CodeGroup>

The `switch` has higher cyclomatic complexity (more paths to test) but is trivially readable. The nested `if` chain has lower cyclomatic complexity but is harder to follow. Cognitive complexity captures that difference.

## File health scores

Per-file scores from [`fallow health --file-scores`](/cli/health). Zero-function files (barrel files, re-export files) are excluded.

### Maintainability Index (MI)

Fallow uses a **simplified Maintainability Index** adapted for JavaScript/TypeScript module graphs.

```
fan_out_penalty = min(ln(fan_out + 1) × 4, 15)
MI = 100 - (complexity_density × 30) - (dead_code_ratio × 20) - fan_out_penalty
```

Clamped to \[0, 100]. Higher is better.

| Range  | Interpretation       | Action                                    |
| :----- | :------------------- | :---------------------------------------- |
| 70–100 | Good maintainability | Monitor                                   |
| 40–70  | Moderate             | Review periodically, address if declining |
| 0–40   | Poor                 | Prioritize for refactoring                |
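
As a rough illustration of the formula above, the following sketch recomputes the score from the three inputs. The function and parameter names are hypothetical and are not part of fallow's API.

```typescript
// Sketch of the simplified MI formula; names are illustrative only.
function maintainabilityIndex(
  complexityDensity: number, // total cyclomatic complexity / lines of code
  deadCodeRatio: number, // fraction of unused value exports, 0..1
  fanOut: number, // number of files this file imports
): number {
  // Logarithmic coupling penalty, capped at 15 points.
  const fanOutPenalty = Math.min(Math.log(fanOut + 1) * 4, 15);
  const raw = 100 - complexityDensity * 30 - deadCodeRatio * 20 - fanOutPenalty;
  // Clamp to [0, 100].
  return Math.min(100, Math.max(0, raw));
}

// A file with 0.75 density, 100% dead exports, and no imports:
console.log(maintainabilityIndex(0.75, 1.0, 0).toFixed(1)); // "57.5"
```

Because the fan-out term is logarithmic, going from 0 to 3 imports costs about 5.5 points, while the cap is only reached at roughly 42 imports.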

```text fallow health --file-scores --top 5 theme={null}
● File health scores (3 files)

   57.5    src/helpers/errorUtil.ts
             48 LOC    1 fan-in    0 fan-out  100% dead  0.75 density

   76.2    src/helpers/util.ts
            320 LOC   35 fan-in    0 fan-out   86% dead  0.22 density

   90.5    src/types.ts
            154 LOC   42 fan-in    3 fan-out    0% dead  0.20 density

  Composite file quality scores based on complexity, coupling, and dead code — https://docs.fallow.tools/explanations/health#file-health-scores
```

<Accordion title="How fallow's MI relates to the classic Maintainability Index">
  The original MI from [Oman and Hagemeister (1992)](https://doi.org/10.1109/ICSM.1992.242525) combines Halstead Volume, cyclomatic complexity, lines of code, and comment percentage. [Microsoft adapted it](https://learn.microsoft.com/en-us/visualstudio/code-quality/code-metrics-maintainability-index-range-and-meaning) to a 0–100 scale for Visual Studio.

  Fallow's variant makes three changes for the JS/TS ecosystem:

  1. **Replaces Halstead Volume with complexity density.** Halstead metrics require full type resolution, which fallow doesn't have (syntactic analysis only). Complexity density captures the same "how complex per unit of code" signal.
  2. **Adds dead code ratio.** Unused exports are a JS/TS-specific maintenance burden that the classic MI doesn't account for.
  3. **Adds fan-out coupling penalty.** High import counts are a strong signal of maintenance difficulty in module-based codebases. The penalty uses logarithmic scaling (`ln(fan_out + 1) × 4`, capped at 15 points), reflecting diminishing marginal risk per additional import.

  Because the formula differs, fallow's MI scores are **not directly comparable** to Visual Studio or SonarQube MI scores.
</Accordion>

### Fan-in, fan-out, and dead code ratio

**Fan-in**: number of files that import this file. High fan-in = high blast radius when changed.

**Fan-out**: number of files this file imports. High fan-out = high coupling and change propagation risk.

**Dead code ratio**: fraction of value exports with zero references (0–1). Type-only exports (interfaces, type aliases) are excluded.

| Metric          | Signal                         | Action                                  |
| :-------------- | :----------------------------- | :-------------------------------------- |
| Fan-in > 20     | Many dependents, critical file | Extra review before changes             |
| Fan-out > 15    | Many imports, high coupling    | Consider splitting into smaller modules |
| Dead code = 0   | All exports used               | No action needed                        |
| Dead code > 0.3 | Significant unused code        | Run `fallow fix` to auto-remove         |

<Accordion title="Research: fan-in/fan-out">
  Fan-in/fan-out coupling metrics originate from [Henry and Kafura (1981)](https://doi.org/10.1109/TSE.1981.231113). Later refined in the [Chidamber & Kemerer OO metrics suite (1994)](https://doi.org/10.1109/32.295895), where Coupling Between Objects (CBO) captures the same idea for class-level dependencies.
</Accordion>

### Coupling concentration

Two vital signs metrics summarize coupling at the project level:

| Metric              | Description                                                   |
| :------------------ | :------------------------------------------------------------ |
| `p95_fan_in`        | 95th percentile fan-in across all files                       |
| `coupling_high_pct` | Percentage of files with fan-in above the effective threshold |

The effective threshold is `max(p95_fan_in, 10)`. The floor of 10 prevents false positives in small projects where even moderate fan-in is normal.
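
The two values can be sketched as follows. This page does not state which percentile method fallow uses, so the nearest-rank method here is an assumption, and all names are hypothetical.

```typescript
// Nearest-rank percentile over an ascending array (assumed method).
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return 0;
  const rank = Math.ceil((p * sortedAsc.length) / 100); // 1-based rank
  return sortedAsc[Math.max(rank, 1) - 1];
}

function couplingVitals(fanIns: number[]) {
  const sorted = [...fanIns].sort((a, b) => a - b);
  const p95FanIn = percentile(sorted, 95);
  // The floor of 10 keeps small projects from flagging moderate fan-in.
  const threshold = Math.max(p95FanIn, 10);
  const high = fanIns.filter((f) => f > threshold).length;
  const couplingHighPct = fanIns.length === 0 ? 0 : (high * 100) / fanIns.length;
  return { p95FanIn, threshold, couplingHighPct };
}

// 19 files with one importer each, plus one widely imported file.
const sampleFanIns: number[] = [...Array<number>(19).fill(1), 40];
console.log(couplingVitals(sampleFanIns));
// { p95FanIn: 1, threshold: 10, couplingHighPct: 5 }
```

In this sample the p95 value is only 1, so the floor of 10 becomes the effective threshold and only the single widely imported file counts as highly coupled.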

### Risk profiles

Risk profiles show what percentage of functions fall into each risk bin. Two profiles are reported as vital signs:

**Unit size** (function length in lines of code):

| Bin            | LOC range |
| :------------- | :-------- |
| Low risk       | 1–15      |
| Medium risk    | 16–30     |
| High risk      | 31–60     |
| Very high risk | > 60      |

**Unit interfacing** (function parameter count):

| Bin            | Parameters |
| :------------- | :--------- |
| Low risk       | 0–2        |
| Medium risk    | 3–4        |
| High risk      | 5–6        |
| Very high risk | 7+         |

Both profiles appear as `unit_size_profile` and `unit_interfacing_profile` in the vital signs JSON, each containing `low_risk`, `medium_risk`, `high_risk`, and `very_high_risk` percentage values.

When the unit size profile shows >= 3% of functions in the very-high-risk bin (> 60 LOC), the human output includes a **Large functions** drill-down listing the specific oversized functions with file path, function name, line number, and LOC count. In JSON output, these appear as a `large_functions` array on the health report.
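
The binning can be sketched like this. The field names mirror the JSON keys listed above; the helper names and the sample data are hypothetical.

```typescript
type RiskBin = "low_risk" | "medium_risk" | "high_risk" | "very_high_risk";

// Unit size: function length in lines of code.
function unitSizeBin(loc: number): RiskBin {
  if (loc <= 15) return "low_risk";
  if (loc <= 30) return "medium_risk";
  if (loc <= 60) return "high_risk";
  return "very_high_risk";
}

// Unit interfacing: function parameter count.
function unitInterfacingBin(params: number): RiskBin {
  if (params <= 2) return "low_risk";
  if (params <= 4) return "medium_risk";
  if (params <= 6) return "high_risk";
  return "very_high_risk";
}

// Percentage of functions falling into each bin.
function riskProfile(values: number[], bin: (v: number) => RiskBin): Record<RiskBin, number> {
  const counts: Record<RiskBin, number> = {
    low_risk: 0,
    medium_risk: 0,
    high_risk: 0,
    very_high_risk: 0,
  };
  for (const v of values) counts[bin(v)] += 1;
  const total = values.length || 1; // avoid dividing by zero for empty input
  return {
    low_risk: (counts.low_risk * 100) / total,
    medium_risk: (counts.medium_risk * 100) / total,
    high_risk: (counts.high_risk * 100) / total,
    very_high_risk: (counts.very_high_risk * 100) / total,
  };
}

const functionLengths = [4, 12, 22, 45, 80, 9, 14, 3];
const sizeProfile = riskProfile(functionLengths, unitSizeBin);
console.log(sizeProfile);
// { low_risk: 62.5, medium_risk: 12.5, high_risk: 12.5, very_high_risk: 12.5 }

// The "Large functions" drill-down applies at >= 3% in the very-high-risk bin.
console.log(sizeProfile.very_high_risk >= 3); // true
```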

<Accordion title="Research: SIG quality model">
  Risk profiles follow the [SIG/TÜViT quality model](https://www.softwareimprovementgroup.com/wp-content/uploads/SIG-TUViT-Evaluation-Criteria-Trusted-Product-Maintainability-Guidance-for-producers.pdf), which benchmarks codebases by the distribution of code across risk categories rather than by averages. A codebase where 95% of functions are low-risk is healthier than one with the same average complexity but a fat tail of very-high-risk functions.
</Accordion>

## Untested complexity risk

Complex code without test coverage is the most likely source of production bugs. Fallow quantifies this risk per function using a score based on the CRAP metric (Change Risk Anti-Patterns), originally developed by Alberto Savoia and Bob Evans at Agitar Labs in 2007.

The score appears in [`fallow health --file-scores`](/cli/health) output and JSON.

### How the score is computed

Fallow uses the canonical CRAP formula:

```
score = CC^2 × (1 - cov/100)^3 + CC
```

Where `cov` is per-function coverage (0–100). Fallow obtains that value in one of three ways:

| Model (`coverage_model`) | When                                                                                                             | How `cov` is determined                                                                                                               |
| :----------------------- | :--------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------ |
| `static_estimated`       | Default                                                                                                          | Graph-based per-function estimate: directly test-referenced = 85%, transitively test-reachable = 40%, not test-reachable = 0%.        |
| `istanbul`               | `--coverage <coverage-final.json>` (auto-detected from `coverage/` and `.nyc_output/`, or via `FALLOW_COVERAGE`) | Real per-function statement coverage produced by Jest, Vitest, c8, or nyc.                                                            |
| `static_binary`          | Legacy, snapshot/JSON compatibility only                                                                         | Whole-file binary: test-reachable file = 100% (score = CC), untested file = 0% (score = CC^2 + CC). Superseded by `static_estimated`. |

For untested code the quadratic term makes complexity hurt fast. Under the default `static_estimated` model, a not-test-reachable function with cyclomatic complexity 5 scores 5^2 × 1 + 5 = 30, which is exactly the default flag threshold.
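
A minimal sketch of the formula, together with the `static_estimated` coverage values from the table above. The function names and the `Reachability` labels are hypothetical.

```typescript
// CRAP score: CC^2 × (1 - cov/100)^3 + CC, with cov in 0..100.
function crapScore(cc: number, coveragePct: number): number {
  const uncovered = 1 - coveragePct / 100;
  return cc ** 2 * uncovered ** 3 + cc;
}

// Coverage estimates used by the static_estimated model (labels are illustrative).
type Reachability = "direct" | "transitive" | "none";
const ESTIMATED_COVERAGE: Record<Reachability, number> = {
  direct: 85, // directly test-referenced
  transitive: 40, // transitively test-reachable
  none: 0, // not test-reachable
};

function estimatedCrap(cc: number, reach: Reachability): number {
  return crapScore(cc, ESTIMATED_COVERAGE[reach]);
}

console.log(estimatedCrap(5, "none")); // 30, exactly the default threshold
console.log(crapScore(5, 100)); // 5, full coverage leaves only CC
console.log(estimatedCrap(10, "transitive").toFixed(1)); // "31.6"
```

The last line shows why partial coverage matters for complex code: at cyclomatic complexity 10, even the 40% estimate is not enough to stay under 30.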

For execution evidence from real traffic (hot/cold paths, runtime-weighted dead-code verdicts), see [Runtime coverage](/analysis/runtime-coverage). Runtime coverage is a paid Fallow Runtime feature and accepts V8 or Istanbul runtime data via `--runtime-coverage`; it merges on top of whichever static `coverage_model` is active.

**Default threshold:** 30 (functions scoring at or above this value are flagged)

### Interpreting the numbers

| Score   | Interpretation                             | Action                                                              |
| :------ | :----------------------------------------- | :------------------------------------------------------------------ |
| \< 30   | Acceptable risk                            | No action needed                                                    |
| 30–99   | Moderate risk, complex untested code       | Add test coverage or reduce complexity                              |
| 100–999 | High risk                                  | Prioritize: add tests, then simplify                                |
| > 999   | Extreme risk (shown as ">999" in terminal) | Refactor urgently, this code is both complex and invisible to tests |

In human output, scores above 999 are capped to `>999` for readability. JSON output always contains the exact value.

### File-level aggregation

Each file reports two aggregated values:

| Field                  | Meaning                                |
| :--------------------- | :------------------------------------- |
| `crap_max`             | Highest per-function score in the file |
| `crap_above_threshold` | Count of functions scoring >= 30       |

The active model is reported as `coverage_model` on the health summary (`static_estimated`, `istanbul`, or the legacy `static_binary`).

### Refactoring targets

The `AddTestCoverage` refactoring target fires when both conditions are true:

* `crap_above_threshold` >= 2 (multiple risky functions, not just one outlier)
* `complexity_density` > 0.3 (the file is genuinely complex, not just large)

This combination identifies files where adding test coverage will reduce the most risk per effort spent.
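
The aggregation and the trigger can be sketched together. The field names follow the table above; the function names are hypothetical, and the threshold of 30 is the documented default.

```typescript
interface FileCrapSummary {
  crap_max: number;
  crap_above_threshold: number;
}

// Aggregate per-function scores into the two file-level fields.
function summarizeFile(functionScores: number[], threshold = 30): FileCrapSummary {
  return {
    crap_max: functionScores.reduce((max, s) => Math.max(max, s), 0),
    crap_above_threshold: functionScores.filter((s) => s >= threshold).length,
  };
}

// Both conditions must hold for the AddTestCoverage target.
function firesAddTestCoverage(summary: FileCrapSummary, complexityDensity: number): boolean {
  return summary.crap_above_threshold >= 2 && complexityDensity > 0.3;
}

const summary = summarizeFile([12, 30, 110, 6]);
console.log(summary); // { crap_max: 110, crap_above_threshold: 2 }
console.log(firesAddTestCoverage(summary, 0.45)); // true
console.log(firesAddTestCoverage(summary, 0.2)); // false, the file is large but not dense
```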

### Suppression

To exclude a file from untested complexity risk reporting:

```typescript theme={null}
// fallow-ignore-file coverage-gaps
```

This suppresses all coverage-related findings for the file, including the CRAP-based scores.

To suppress individual functions from complexity findings (cyclomatic and cognitive thresholds):

```typescript theme={null}
// fallow-ignore-next-line complexity
function* parseCsv(text) { /* ... */ }
```

See [Inline suppression](/configuration/suppression) for all suppressible issue types.

<Accordion title="Static estimation vs. real coverage">
  The CRAP formula needs a per-function coverage percentage:

  ```
  CRAP(m) = CC^2 × (1 - cov/100)^3 + CC
  ```

  A fully tested function (cov = 100) always scores exactly CC, regardless of complexity. A completely untested function (cov = 0) scores CC^2 + CC.

  Without coverage input, the default `static_estimated` model derives `cov` from the dependency graph: 85% for directly test-referenced exports, 40% for transitively test-reachable code, 0% for code with no path from any test file. This is an approximation of test coverage:

  * **False negatives**: A file imported by tests but with functions that are never actually exercised will score lower (look healthier) than it should.
  * **False positives**: A file referenced only from non-test code can score worse than it deserves if it is in fact unit-tested through indirect paths.

  For exact per-function scores, pass `--coverage <coverage-final.json>` (Jest, Vitest, c8, or nyc). Fallow auto-detects `coverage/coverage-final.json` and `.nyc_output/coverage-final.json`, or set `FALLOW_COVERAGE` to point at a custom path. The `coverage_model` field flips from `static_estimated` to `istanbul` when real coverage is in use.

  The legacy `static_binary` model (whole-file 0%/100% based on test reachability) is superseded by `static_estimated` and is retained only for snapshot/JSON backwards compatibility.
</Accordion>

<Accordion title="Research">
  The CRAP metric was introduced by Alberto Savoia and Bob Evans at Agitar Labs. Original presentation: ["CRAP4J" (2007)](http://www.artima.com/weblogs/viewpost.jsp?thread=215899). The core insight is that complexity alone is not dangerous if the code is well-tested, and simple code is low-risk even without tests. Risk concentrates where complexity and missing coverage overlap.
</Accordion>

## Hotspot metrics

Hotspots are files that are both **complex** and **frequently changing**. Bugs concentrate at this intersection, and refactoring here yields the highest return. Available via [`fallow health --hotspots`](/cli/health).

<Info>
  **The core insight:** Complex code that never changes is stable. Simple code that changes often is manageable. Complex code that changes often is where defects concentrate.
</Info>

### Hotspot score

```
normalized_churn = weighted_commits / max_weighted_commits   (0..1)
normalized_complexity = complexity_density / max_density      (0..1)
score = normalized_churn × normalized_complexity × 100       (0..100)
```

The multiplicative relationship means a file needs **both** high churn AND high complexity to score highly. Either dimension alone produces a low score.
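
A small sketch of the normalization, assuming both maxima are taken over the set of files being analyzed. The interface and function names are hypothetical.

```typescript
interface FileChurn {
  path: string;
  weightedCommits: number;
  complexityDensity: number;
}

// Score each file by normalized churn × normalized complexity × 100.
function hotspotScores(files: FileChurn[]): Map<string, number> {
  const maxCommits = Math.max(0, ...files.map((f) => f.weightedCommits));
  const maxDensity = Math.max(0, ...files.map((f) => f.complexityDensity));
  const scores = new Map<string, number>();
  for (const f of files) {
    const churn = maxCommits > 0 ? f.weightedCommits / maxCommits : 0;
    const complexity = maxDensity > 0 ? f.complexityDensity / maxDensity : 0;
    scores.set(f.path, churn * complexity * 100);
  }
  return scores;
}

const scores = hotspotScores([
  { path: "busy-and-complex.ts", weightedCommits: 10, complexityDensity: 0.5 },
  { path: "busy-but-simple.ts", weightedCommits: 10, complexityDensity: 0.05 },
  { path: "complex-but-quiet.ts", weightedCommits: 1, complexityDensity: 0.5 },
]);
console.log(scores.get("busy-and-complex.ts")); // 100
```

The other two files each land near 10: one high dimension cannot compensate for a low one.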

| Score  | Interpretation   | Action                                |
| :----- | :--------------- | :------------------------------------ |
| 70–100 | Critical hotspot | Prioritize refactoring in next sprint |
| 40–70  | Moderate hotspot | Schedule review, reduce complexity    |
| 0–40   | Low risk         | Monitor                               |

```text fallow health --hotspots --top 3 theme={null}
● Hotspots (3 files, since 6 months)

   72.1 ▲  src/types.ts
           47 commits   2739 churn  0.20 density  42 fan-in  ▲ accelerating

   38.4 ─  src/helpers/util.ts
           23 commits    610 churn  0.22 density  35 fan-in  ─ stable

   12.7 ▼  src/locales/en.ts
            8 commits    177 churn  0.35 density   2 fan-in  ▼ cooling

  Files with high churn and high complexity — https://docs.fallow.tools/explanations/health#hotspot-metrics
```

### Weighted commits

Recency-weighted commit count using exponential decay with a 90-day half-life:

* A commit **yesterday** contributes \~1.0
* A commit **90 days ago** contributes \~0.5
* A commit **180 days ago** contributes \~0.25
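
The decay can be written as `0.5^(age / 90)`. A minimal sketch, with hypothetical function names:

```typescript
const HALF_LIFE_DAYS = 90;

// Weight of a single commit, halving every 90 days.
function commitWeight(ageDays: number): number {
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

// Recency-weighted commit count for a file.
function weightedCommits(commitAgesInDays: number[]): number {
  return commitAgesInDays.reduce((sum, age) => sum + commitWeight(age), 0);
}

console.log(commitWeight(90)); // 0.5
console.log(commitWeight(180)); // 0.25
console.log(weightedCommits([0, 90, 180])); // 1.75
```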

<Accordion title="Research">
  Recency-weighted change metrics were introduced by [Graves et al. (2000)](https://doi.org/10.1109/32.859533). The empirical link between churn and defect density was established by [Nagappan and Ball (2005)](https://doi.org/10.1145/1062455.1062514) at Microsoft.
</Accordion>

### Churn trend

Compares commit frequency in the first half vs second half of the analysis window:

| Trend          | Meaning                    | Action                                   |
| :------------- | :------------------------- | :--------------------------------------- |
| `accelerating` | Recent > 1.5× older half   | High priority if complexity is also high |
| `stable`       | Balanced frequency         | Normal, monitor score                    |
| `cooling`      | Recent \< 0.67× older half | Lower priority, stabilizing              |
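
The classification can be sketched as a comparison of the two half-window commit counts. How fallow handles a half with zero commits is not described here; this sketch compares by multiplication to avoid dividing by zero, which is an assumption, as are the names.

```typescript
type Trend = "accelerating" | "stable" | "cooling";

function churnTrend(olderHalfCommits: number, recentHalfCommits: number): Trend {
  // Multiplying instead of dividing keeps the check safe when the older half is empty.
  if (recentHalfCommits > 1.5 * olderHalfCommits) return "accelerating";
  if (recentHalfCommits < 0.67 * olderHalfCommits) return "cooling";
  return "stable";
}

console.log(churnTrend(10, 16)); // "accelerating"
console.log(churnTrend(10, 12)); // "stable"
console.log(churnTrend(10, 6)); // "cooling"
```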

<Accordion title="Research: churn × complexity methodology">
  Pioneered by Michael Feathers and systematized by Adam Tornhill in [*Your Code as a Crime Scene*](https://pragprog.com/titles/atcrime/your-code-as-a-crime-scene/) (2015) and [*Software Design X-Rays*](https://pragprog.com/titles/atevol/software-design-x-rays/) (2018).
</Accordion>

## Refactoring targets

Refactoring targets are ranked, actionable recommendations produced by [`fallow health --targets`](/cli/health). They rank files by refactoring priority using the metrics above.

<Tip>
  **Quick start**: Run `fallow health --targets` for a prioritized refactoring list. Filter by `effort: low` for the fastest improvements. Save a baseline with `--save-baseline` and pass `--baseline` in CI to block new regressions. Update the baseline as you fix targets.
</Tip>

<Info>
  **Targets are synthesis, not data.** The other sections show measurements. Targets tell you what to do about them.
</Info>

### Priority score

```
priority = min(complexity_density, 1) × 30
         + hotspot_boost × 25            (hotspot_score / 100 if in hotspots, else 0)
         + dead_code_ratio × 20
         + min(fan_in / 20, 1) × 15
         + min(fan_out / 30, 1) × 10
```

All inputs are normalized to \[0, 1] so each weight is a true percentage share. The formula intentionally avoids the Maintainability Index to prevent double-counting (MI already incorporates density and dead code).
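
The weighting can be sketched as below. The interface and function names are hypothetical; `hotspotScore` is `null` when the file is not in the hotspot list.

```typescript
interface PriorityInputs {
  complexityDensity: number;
  hotspotScore: number | null; // 0..100, or null if not a hotspot
  deadCodeRatio: number; // 0..1
  fanIn: number;
  fanOut: number;
}

function priorityScore(i: PriorityInputs): number {
  const hotspotBoost = i.hotspotScore === null ? 0 : i.hotspotScore / 100;
  return (
    Math.min(i.complexityDensity, 1) * 30 +
    hotspotBoost * 25 +
    i.deadCodeRatio * 20 +
    Math.min(i.fanIn / 20, 1) * 15 +
    Math.min(i.fanOut / 30, 1) * 10
  );
}

// A widely imported file with mostly dead exports and no hotspot entry.
const example = priorityScore({
  complexityDensity: 0.22,
  hotspotScore: null,
  deadCodeRatio: 0.86,
  fanIn: 35,
  fanOut: 0,
});
console.log(example.toFixed(1)); // "38.8"
```

Here the density contributes 6.6 points, dead code 17.2, and the saturated fan-in term the full 15.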

| Score  | Interpretation | Action                             |
| :----- | :------------- | :--------------------------------- |
| 70–100 | High priority  | Address in the current sprint      |
| 40–70  | Moderate       | Schedule for upcoming work         |
| 0–40   | Lower priority | Monitor, address opportunistically |

### Effort estimation

Every target includes an effort estimate (`low`, `medium`, or `high`) based on three file-level signals:

| Effort     | Criteria                                                              |
| :--------- | :-------------------------------------------------------------------- |
| **low**    | \< 100 lines AND ≤ 3 functions AND \< 5 importers                     |
| **high**   | ≥ 500 lines, OR ≥ 20 importers, OR (≥ 15 functions AND density > 0.5) |
| **medium** | Everything else                                                       |

Effort uses OR-semantics for "high" (any single criterion triggers it) and AND-semantics for "low" (all three must be true). A 50-line file with 25 importers is still "high" because changes ripple widely.
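
A sketch of the rule order, checking "high" first because any single criterion is enough. The type and function names are hypothetical.

```typescript
type Effort = "low" | "medium" | "high";

interface FileShape {
  lines: number;
  functions: number;
  importers: number;
  complexityDensity: number;
}

function estimateEffort(f: FileShape): Effort {
  // OR-semantics: any one criterion makes the target "high".
  const isHigh =
    f.lines >= 500 ||
    f.importers >= 20 ||
    (f.functions >= 15 && f.complexityDensity > 0.5);
  if (isHigh) return "high";

  // AND-semantics: all three must hold for "low".
  const isLow = f.lines < 100 && f.functions <= 3 && f.importers < 5;
  return isLow ? "low" : "medium";
}

// Small file, but changes ripple to 25 importers.
console.log(estimateEffort({ lines: 50, functions: 2, importers: 25, complexityDensity: 0.1 })); // "high"
```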

In human output, effort appears after the category label separated by `·`:

```
dead code · low    Remove 12 unused exports...
complexity · high  Extract 3 functions with cognitive complexity > 30...
```

### Evidence

For three recommendation categories, fallow provides structured evidence linking the target back to specific analysis data. This lets agents and scripts act on recommendations without a second tool call.

| Category                    | Evidence fields                                        | Example                                                   |
| :-------------------------- | :----------------------------------------------------- | :-------------------------------------------------------- |
| `remove_dead_code`          | `unused_exports`: list of export names                 | `["assertEqual", "assertIs", "assertNever"]`              |
| `extract_complex_functions` | `complex_functions`: list of `{name, line, cognitive}` | `[{"name": "processData", "line": 45, "cognitive": 113}]` |
| `break_circular_dependency` | `cycle_path`: list of file paths forming the cycle     | `["src/a.ts", "src/b.ts", "src/c.ts"]`                    |
| All other categories        | `evidence` omitted from JSON                           | —                                                         |

Evidence is designed for machine consumption. In human output, the recommendation text already incorporates the key evidence (e.g., "Remove 12 unused exports" or "Extract processData (cognitive: 113)").

### Baselines

Use `--save-baseline` and `--baseline` with the health command to track refactoring progress over time. When a baseline is active, only *new* targets (not present in the baseline) are reported.

```bash theme={null}
# Save current targets as the baseline
fallow health --targets --save-baseline fallow-baselines/health.json

# On subsequent runs, only show new targets
fallow health --targets --baseline fallow-baselines/health.json
```

Baseline keys use the format `relative_path:category_label` (e.g., `src/utils.ts:dead code`). A target is considered "known" if the same file + category combination exists in the baseline, regardless of whether its priority score or effort level has changed.

This is useful in CI to prevent new tech debt from landing while allowing existing targets to be addressed incrementally.
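
The matching rule can be sketched as a key lookup. The on-disk layout of the baseline file is not described on this page, so this sketch models the baseline as a plain set of keys; that choice and all names are assumptions.

```typescript
interface TargetRef {
  path: string; // relative path
  categoryLabel: string; // display label, e.g. "dead code"
}

// Key format: relative_path:category_label
function baselineKey(t: TargetRef): string {
  return `${t.path}:${t.categoryLabel}`;
}

// Keep only targets whose file + category pair is absent from the baseline.
function newTargets(current: TargetRef[], baseline: Set<string>): TargetRef[] {
  return current.filter((t) => !baseline.has(baselineKey(t)));
}

const baseline = new Set(["src/utils.ts:dead code"]);
const current: TargetRef[] = [
  { path: "src/utils.ts", categoryLabel: "dead code" }, // known, suppressed
  { path: "src/utils.ts", categoryLabel: "complexity" }, // same file, new category
  { path: "src/api.ts", categoryLabel: "coupling" }, // new file
];
console.log(newTargets(current, baseline).map(baselineKey));
// [ "src/utils.ts:complexity", "src/api.ts:coupling" ]
```

Priority and effort are not part of the key, so a known target that gets worse stays suppressed until its category changes.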

### Recommendation categories

Each target is assigned a category based on which rule matches first (evaluated in priority order):

| Category                  | Display label      | Trigger                                                    | What it means                                                   |
| :------------------------ | :----------------- | :--------------------------------------------------------- | :-------------------------------------------------------------- |
| Urgent churn + complexity | `churn+complexity` | Hotspot score ≥ 50, accelerating trend, density > 0.5      | Actively-changing file with growing complexity, highest urgency |
| Break circular dependency | `circular dep`     | File in an import cycle with fan-in ≥ 5                    | Changes cascade through the cycle to many dependents            |
| Split high-impact file    | `high impact`      | Fan-in ≥ 20 (or ≥ 10 with ≥ 5 functions) and density > 0.3 | High blast radius amplifies every bug introduced                |
| Remove dead code          | `dead code`        | Dead code ratio ≥ 0.5 with ≥ 3 value exports               | Majority of exports are unused; reduce surface area             |
| Extract complex functions | `complexity`       | Top cognitive complexity ≥ 30                              | Contains functions that are very hard to understand             |
| Reduce coupling           | `coupling`         | Fan-out ≥ 15 and MI \< 60 (excluding entry points)         | Too many imports reduce testability                             |

A file can trigger multiple rules, but only the highest-priority match determines the category. All contributing factors are still collected and shown in JSON output.
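
The first-match evaluation can be sketched as a chain of checks in table order. The metric names are hypothetical, the function returns display labels rather than JSON category names, and the "high impact" trigger is read here as "(fan-in ≥ 20, or fan-in ≥ 10 with ≥ 5 functions) and density > 0.3", which is an interpretation of the table.

```typescript
type CategoryLabel =
  | "churn+complexity"
  | "circular dep"
  | "high impact"
  | "dead code"
  | "complexity"
  | "coupling";

interface FileMetrics {
  hotspotScore: number;
  trend: "accelerating" | "stable" | "cooling";
  density: number;
  inCycle: boolean;
  fanIn: number;
  fanOut: number;
  functions: number;
  deadCodeRatio: number;
  valueExports: number;
  maxCognitive: number;
  mi: number;
  isEntryPoint: boolean;
}

// Rules are checked top to bottom; the first match wins.
function categorize(m: FileMetrics): CategoryLabel | null {
  if (m.hotspotScore >= 50 && m.trend === "accelerating" && m.density > 0.5) return "churn+complexity";
  if (m.inCycle && m.fanIn >= 5) return "circular dep";
  if ((m.fanIn >= 20 || (m.fanIn >= 10 && m.functions >= 5)) && m.density > 0.3) return "high impact";
  if (m.deadCodeRatio >= 0.5 && m.valueExports >= 3) return "dead code";
  if (m.maxCognitive >= 30) return "complexity";
  if (m.fanOut >= 15 && m.mi < 60 && !m.isEntryPoint) return "coupling";
  return null; // no rule matched
}

// A file that is both mostly dead and hard to read is labeled by the earlier rule.
const label = categorize({
  hotspotScore: 0, trend: "stable", density: 0.2, inCycle: false,
  fanIn: 2, fanOut: 3, functions: 8, deadCodeRatio: 0.7, valueExports: 10,
  maxCognitive: 45, mi: 72, isEntryPoint: false,
});
console.log(label); // "dead code"
```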

```text fallow health --targets --top 5 theme={null}
● Refactoring targets (5)

  92.3    src/core/processor.ts
          churn+complexity · high  Actively-changing file with growing complexity — stabilize before adding features

  71.2    src/utils/helpers.ts
          high impact · high  Split high-impact file — 15 dependents amplify every change

  55.8    src/legacy/old-api.ts
          dead code · low  Remove 8 unused exports to reduce surface area (73% dead)

  52.0    src/graph/resolver.ts
          circular dep · medium  Break import cycle — 8 files depend on this, changes cascade through the cycle

  41.8    src/components/Dashboard.tsx
          complexity · high  Extract renderMetrics (cognitive: 45) and renderFilters (cognitive: 32) into smaller functions

  Prioritized refactoring recommendations based on complexity, churn, and coupling signals — https://docs.fallow.tools/explanations/health#refactoring-targets
```

### JSON structure

With `--format json`, each target includes structured data for programmatic consumption:

```json theme={null}
{
  "targets": [
    {
      "path": "src/core/processor.ts",
      "priority": 92.3,
      "recommendation": "Actively-changing file with growing complexity — stabilize before adding features",
      "category": "urgent_churn_complexity",
      "effort": "high",
      "factors": [
        {
          "metric": "complexity_density",
          "value": 0.67,
          "threshold": 0.3,
          "detail": "density 0.67 exceeds 0.3"
        },
        {
          "metric": "fan_in",
          "value": 12,
          "threshold": 10,
          "detail": "12 files depend on this"
        }
      ]
    },
    {
      "path": "src/helpers/util.ts",
      "priority": 38.8,
      "recommendation": "Remove 12 unused exports to reduce surface area (86% dead)",
      "category": "remove_dead_code",
      "effort": "low",
      "factors": [
        {
          "metric": "dead_code_ratio",
          "value": 0.86,
          "threshold": 0.5,
          "detail": "12 unused of 14 value exports (86%)"
        }
      ],
      "evidence": {
        "unused_exports": ["assertEqual", "assertIs", "assertNever", "oldConfig", "legacyHelper"]
      }
    }
  ]
}
```

Each factor includes raw `value` and `threshold` for programmatic use. The `evidence` field provides actionable detail for categories that support it. AI agents can use `evidence.unused_exports` to generate removal PRs without a second tool call. Categories without evidence omit the field entirely.
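
For scripts that consume this output, the shape above can be typed roughly as follows. The interfaces are inferred from the sample on this page rather than taken from a published schema, and `quickDeadCodeWins` is a hypothetical helper.

```typescript
interface Factor {
  metric: string;
  value: number;
  threshold: number;
  detail: string;
}

interface Evidence {
  unused_exports?: string[];
  complex_functions?: { name: string; line: number; cognitive: number }[];
  cycle_path?: string[];
}

interface Target {
  path: string;
  priority: number;
  recommendation: string;
  category: string;
  effort: "low" | "medium" | "high";
  factors: Factor[];
  evidence?: Evidence; // omitted for categories without evidence
}

interface TargetsReport {
  targets: Target[];
}

// Collect removable export names per file from low-effort dead-code targets.
function quickDeadCodeWins(report: TargetsReport): Map<string, string[]> {
  const wins = new Map<string, string[]>();
  for (const t of report.targets) {
    const names = t.evidence?.unused_exports;
    if (t.category === "remove_dead_code" && t.effort === "low" && names) {
      wins.set(t.path, names);
    }
  }
  return wins;
}

// In practice the report would come from parsing the CLI's JSON output.
const report: TargetsReport = {
  targets: [
    {
      path: "src/helpers/util.ts",
      priority: 38.8,
      recommendation: "Remove 12 unused exports to reduce surface area (86% dead)",
      category: "remove_dead_code",
      effort: "low",
      factors: [],
      evidence: { unused_exports: ["assertEqual", "assertIs", "assertNever"] },
    },
  ],
};
console.log(quickDeadCodeWins(report).get("src/helpers/util.ts"));
// [ "assertEqual", "assertIs", "assertNever" ]
```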

## Health score

The health score is a single 0–100 number summarizing overall codebase quality. Available via [`fallow health --score`](/cli/health).

```
score = 100 - sum(penalties)
```

Clamped to \[0, 100]. Each penalty targets a different quality dimension:

| Penalty         | Formula                                                                                      | Max |
| :-------------- | :------------------------------------------------------------------------------------------- | --: |
| Dead files      | `min(dead_file_pct × 0.2, 15)`                                                               |  15 |
| Dead exports    | `min(dead_export_pct × 0.2, 15)`                                                             |  15 |
| Complexity      | `min(critical_complexity_pct × 4, 20)`                                                       |  20 |
| P90 complexity  | `0` for formula v2 runs; older snapshots fall back to `min(max(0, p90_cyclomatic − 10), 10)` |  10 |
| Maintainability | `min(maintainability_low_pct × 1.5, 15)`                                                     |  15 |
| Hotspots        | `min(hotspot_top_pct_count / ceil(total_files × 0.01) × 10, 10)`                             |  10 |
| Unused deps     | `min(unused_deps_per_k_files × 0.5, 25)`                                                     |  25 |
| Circular deps   | `min(circular_deps_per_k_files × 0.5, 25)`                                                   |  25 |
| Unit size       | `min(functions_over_60_loc_per_k × 0.5, 10)`                                                 |  10 |
| Coupling        | `min(coupling_high_pct × 0.5, 5)`                                                            |   5 |
| Duplication     | `min(max(0, duplication_pct − 5) × 1.0, 10)`                                                 |  10 |

`health_score.formula_version` identifies the formula used. Version 2 uses scale-invariant density and tail metrics so large monorepos are not diluted by many healthy files. Older snapshots that lack the v2 fields fall back to the previous average, p90, and raw-count aggregators. The **unit size** penalty tracks functions above 60 LOC per 1,000 functions. The **coupling** penalty uses the percentage of high fan-in files. The **duplication** penalty kicks in when more than 5% of lines are duplicated; `--score` automatically runs duplication analysis to populate this metric.
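
The penalty formulas in the table can be sketched in TypeScript as follows. This is an illustration for formula v2 only, not fallow's implementation: the `ScoreInputs` interface and `computePenalties` function are hypothetical, the field names are copied from the variables in the table, and returning 0 for the hotspot penalty when there are no files is an assumption made to avoid dividing by zero.

```typescript
// Hypothetical input record; field names mirror the variables in the penalty table.
interface ScoreInputs {
  dead_file_pct: number;
  dead_export_pct: number;
  critical_complexity_pct: number;
  maintainability_low_pct: number;
  hotspot_top_pct_count: number;
  total_files: number;
  unused_deps_per_k_files: number;
  circular_deps_per_k_files: number;
  functions_over_60_loc_per_k: number;
  coupling_high_pct: number;
  duplication_pct: number;
}

// Applies each formula from the table, including its cap.
function computePenalties(i: ScoreInputs): Record<string, number> {
  // 1% of the files, rounded up, is the denominator of the hotspot penalty.
  const onePercentOfFiles = Math.ceil(i.total_files * 0.01);
  return {
    dead_files: Math.min(i.dead_file_pct * 0.2, 15),
    dead_exports: Math.min(i.dead_export_pct * 0.2, 15),
    complexity: Math.min(i.critical_complexity_pct * 4, 20),
    p90_complexity: 0, // always 0 for formula v2 runs
    maintainability: Math.min(i.maintainability_low_pct * 1.5, 15),
    // Assumption: an empty project gets no hotspot penalty.
    hotspots:
      onePercentOfFiles > 0
        ? Math.min((i.hotspot_top_pct_count / onePercentOfFiles) * 10, 10)
        : 0,
    unused_deps: Math.min(i.unused_deps_per_k_files * 0.5, 25),
    circular_deps: Math.min(i.circular_deps_per_k_files * 0.5, 25),
    unit_size: Math.min(i.functions_over_60_loc_per_k * 0.5, 10),
    coupling: Math.min(i.coupling_high_pct * 0.5, 5),
    // Only duplication above 5% of lines is penalized.
    duplication: Math.min(Math.max(0, i.duplication_pct - 5) * 1.0, 10),
  };
}

// Made-up inputs for illustration; not measurements from a real project.
const exampleInputs: ScoreInputs = {
  dead_file_pct: 10,
  dead_export_pct: 30,
  critical_complexity_pct: 2,
  maintainability_low_pct: 4,
  hotspot_top_pct_count: 3,
  total_files: 450,
  unused_deps_per_k_files: 4,
  circular_deps_per_k_files: 2,
  functions_over_60_loc_per_k: 10,
  coupling_high_pct: 6,
  duplication_pct: 8,
};

console.log(computePenalties(exampleInputs));
```

Because every term is capped, a single bad dimension cannot drive the score to zero on its own; the largest caps in the table are 25 points each for unused and circular dependencies.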

**Letter grades:** A (85–100), B (70–84), C (55–69), D (40–54), F (below 40).

Penalty fields are `null` in JSON when the corresponding analysis didn't run. `--score` computes the score and runs duplication analysis automatically. To include the churn-backed hotspot penalty in the score, add `--hotspots` (or combine `--score --targets`).
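
A small sketch of how a consumer might recompute the score and grade from a set of penalty values. The `healthScore` and `letterGrade` functions are hypothetical names, and two behaviors are assumptions rather than documented facts: `null` penalties are treated as contributing 0, and fractional scores fall into the band whose lower bound they meet (so 84.5 maps to B).

```typescript
// Penalty values keyed by name; null means the analysis did not run.
type Penalties = Record<string, number | null>;

// score = 100 - sum(penalties), clamped to [0, 100].
// Assumption: null penalties count as 0.
function healthScore(penalties: Penalties): number {
  const total = Object.values(penalties).reduce<number>(
    (sum, value) => sum + (value ?? 0),
    0,
  );
  return Math.min(100, Math.max(0, 100 - total));
}

// Maps a score to the letter grade bands listed above.
function letterGrade(score: number): string {
  if (score >= 85) return "A";
  if (score >= 70) return "B";
  if (score >= 55) return "C";
  if (score >= 40) return "D";
  return "F";
}

// Made-up penalties; the hotspot analysis did not run in this example.
const examplePenalties: Penalties = {
  dead_files: 2,
  dead_exports: 6,
  complexity: 8,
  hotspots: null,
  duplication: 3,
};

const exampleScore = healthScore(examplePenalties);
console.log(exampleScore, letterGrade(exampleScore));
```

If `null` penalties do count as 0, a score from a run that skipped hotspot analysis can only be equal to or higher than a score that includes it, so compare scores from runs that used the same flags.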

## Combining health with dead code analysis

Health metrics are most powerful when combined with fallow's other analyses. Running `fallow` (all analyses) or `fallow health` alongside `fallow dead-code` reveals patterns that neither analysis shows alone:

* **Dead code in complex files**: A file with high cyclomatic complexity and 70% unused exports is a strong candidate for deletion rather than refactoring
* **Hotspots with circular dependencies**: A frequently-changed file that sits in an import cycle amplifies every change through the cycle to its dependents
* **Duplicated complex functions**: Functions flagged as complex that also appear as clone instances in `fallow dupes` are refactoring candidates where extraction yields the most benefit

Use `fallow health --targets` for prioritized recommendations that synthesize all these signals into a ranked action list.

## Limitations

<Warning>
  Metrics are signals, not verdicts. Be aware of these edge cases.
</Warning>

* **Generated files** (GraphQL codegen, Prisma client, OpenAPI types) may have high complexity density but shouldn't be refactored. Use `health.ignore` patterns to exclude them.
* **Test files** with many `it()`/`test()` blocks can have high cyclomatic complexity. This is usually fine. Test suites *should* cover many paths.
* **Config files** (Webpack, ESLint) may appear as hotspots due to frequent changes during setup. Their churn typically cools after initial configuration.
* **MI scores are not portable.** Fallow's formula differs from Visual Studio and SonarQube, so scores are not directly comparable across tools.
* **Hotspot scores are project-relative.** A score of 80 means "worst in your project," not "objectively bad." A well-maintained project may have a top score of 30.
* **Fan-in/out counts modules, not imports.** `import { a, b, c } from './utils'` counts as 1 edge, not 3.
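
The module-level counting in the last point can be illustrated with a short sketch. The `ImportStatement` type and `fanOutByFile` function are hypothetical and are not fallow's internals; the sketch also assumes that two separate import statements targeting the same module collapse into one edge.

```typescript
// Hypothetical record of one import statement.
interface ImportStatement {
  from: string; // importing file
  to: string; // imported module
  names: string[]; // imported bindings
}

// Counts distinct imported modules per file, ignoring how many names are imported.
function fanOutByFile(imports: ImportStatement[]): Map<string, number> {
  const targetsByFile = new Map<string, Set<string>>();
  for (const statement of imports) {
    let targets = targetsByFile.get(statement.from);
    if (!targets) {
      targets = new Set<string>();
      targetsByFile.set(statement.from, targets);
    }
    targets.add(statement.to);
  }
  const counts = new Map<string, number>();
  targetsByFile.forEach((targets, file) => counts.set(file, targets.size));
  return counts;
}

const exampleImports: ImportStatement[] = [
  { from: "app.ts", to: "./utils", names: ["a", "b", "c"] },
  { from: "app.ts", to: "./utils", names: ["d"] },
  { from: "app.ts", to: "./config", names: ["settings"] },
];

// app.ts imports four names but only two distinct modules.
console.log(fanOutByFile(exampleImports).get("app.ts"));
```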

## JSON `_meta` object

When `--explain` is passed (or when output comes from the MCP server), each command's JSON output includes a `_meta` object:

```json theme={null}
{
  "schema_version": 3,
  "_meta": {
    "docs": "https://docs.fallow.tools/explanations/health",
    "metrics": {
      "maintainability_index": {
        "name": "Maintainability Index",
        "description": "Composite score: 100 - (complexity_density × 30) - ...",
        "range": "[0, 100]",
        "interpretation": "higher is better; <40 poor, 40–70 moderate, >70 good"
      }
    }
  }
}
```

AI agents and CI systems can use this to interpret metric values without consulting external documentation.
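
A minimal sketch of such a consumer, assuming only the `_meta` fields shown in the example above. The `MetricMeta` and `ExplainedOutput` types and the `describeMetric` helper are hypothetical names, and real output may contain more fields than this models.

```typescript
// Hypothetical types covering only the fields in the _meta example.
interface MetricMeta {
  name: string;
  description: string;
  range: string;
  interpretation: string;
}

interface ExplainedOutput {
  schema_version: number;
  // Present only when --explain is passed or the output comes from MCP.
  _meta?: { docs: string; metrics: Record<string, MetricMeta> };
}

// Annotates a raw metric value with its definition when _meta is available.
function describeMetric(output: ExplainedOutput, key: string, value: number): string {
  const meta = output._meta?.metrics[key];
  if (!meta) return `${key} = ${value}`;
  return `${meta.name} = ${value} (range ${meta.range}; ${meta.interpretation})`;
}

const exampleOutput: ExplainedOutput = {
  schema_version: 3,
  _meta: {
    docs: "https://docs.fallow.tools/explanations/health",
    metrics: {
      maintainability_index: {
        name: "Maintainability Index",
        description: "Composite score: 100 - (complexity_density × 30) - ...",
        range: "[0, 100]",
        interpretation: "higher is better; <40 poor, 40–70 moderate, >70 good",
      },
    },
  },
};

console.log(describeMetric(exampleOutput, "maintainability_index", 62));
```

Falling back to the bare key and value keeps the helper usable on output produced without `--explain`.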
