Graph Audit

Sweeps the graph looking for drift and violations at scales Node Validate cannot reach. Node Validate checks one node against its Form Contract and reports findings local to that node; Graph Audit walks the corpus, aggregates findings across files, and reports patterns — vocabularies accreting without documentation, ghost links accumulating into planning debt, edges that never got their annotation, predicates that forbid themselves appearing at scale.

The skill's output is not a comprehensive list of every small violation. It is a compressed audit report grouped by category, naming the scale of each finding and the remediation path. A scion author reading the report decides which categories warrant a cleanup pass now, which deserve their own follow-up work, and which are acceptable drift for the current stage.

The audit is intentionally read-only. Graph Audit does not fix violations, does not delete stale nodes, and does not propose edits. Fixing is the work that follows the audit — usually direct editing, sometimes Node Validate per flagged node, sometimes Predicate Propose for a vocabulary gap the audit surfaced.

Steps

Step 1: Scope the audit

Ask the user the scope unless one is already named. Common scopes:

Report the scope back to the user before running the sweep. A broad scope costs more context; narrow scopes are often what the user actually wanted.

Step 2: Mechanical sweep

Run mechanical checks with rg via Bash. These are fast, deterministic, and catch clear violations.

Filename rules — em-dashes and en-dashes forbidden in filenames:

find nodes -name '*—*' -o -name '*–*' 2>/dev/null

Forbidden predicaterelates_to:: is prohibited per No Generic relates_to Predicate. Anchor the pattern to ^- so prose discussions (which frequently mention the forbidden predicate in backticks) are excluded:

rg -n -I -- '^- relates_to::' nodes/

Identity block presence — every .md node under nodes/ should have a conforms_to:: line. Use find -print0 | while IFS= read -r -d '' so filenames with spaces (Gloss and Predicate filenames use -- separators) survive the loop intact:

find nodes -type f -name '*.md' -print0 | while IFS= read -r -d '' file; do
  grep -q '^- conforms_to::' "$file" || echo "missing conforms_to: $file"
done

Report any hits as Violations.

Step 3: Vocabulary audit

List every predicate in use across the graph, count occurrences, and identify which have backing Predicate nodes:

rg -o -I -- '^- [a-z_]+::' nodes/ | sort | uniq -c | sort -rn

Cross-reference against the backing nodes:

ls nodes/Predicates/

Split the vocabulary into three tiers:

Report provisional predicates as candidates for /predicate-propose work — each is either drift to consolidate or vocabulary to codify.

Step 4: Ghost-link inventory

Extract every wikilink target and compare against the set of existing files. The difference is the ghost-link list.

rg -o -I -- '\[\[[^|\]]+\]\]' nodes/ | sort -u > /tmp/graph-audit-targets.txt
find nodes -name '*.md' -type f -exec basename {} .md \; | sort -u > /tmp/graph-audit-files.txt

For each target in the targets file that does not match a file stem (or the concept side of a ---suffixed file), it is a candidate ghost link.

Filter out template-token false positives before reporting. Contract bodies and skill bodies use [[X]], [[<Domain>]], [[Target]], [[Editor]], [[Principal]], [[Downstream Node]], [[X Form Contract]], [[<placeholder>]], and similar as syntactic placeholders in examples — these are documentation shapes, not intended wikilinks. A candidate whose target is a single uppercase letter, contains < or >, matches X Form Contract, or appears inside backtick-fenced content in its source is a false positive. Filter them out before producing the ghost-link list, or name them as "template-token false positives" in a separate bucket so the scion author knows to skip them.

Ghost links are not violations — they are planning signals per Markdown Node Contract's Named-edge syntax Requirement. Not every ghost is equal, though; four buckets sharpen the signal:

Deliberate ghosts — bare wikilinks in Predicate Crescent H3 headings (### Against [[predicate]]) and the paired contrasts_with::[[predicate]] edges in Relations, targeting predicates that deliberately don't have Predicate nodes. Categories include prohibited predicates (relates_to, is_a — forbidden by Decision, so no node will ever exist), base-contract predicates (authored_by, has_lifecycle, has_commitment and similar — introduced by Contracts rather than by Predicate nodes), and adjacent-graph vocabulary (derived_from, contradicts — predicates used in other graphs the Crescent is contrasting against). Deliberate ghosts are NOT drift; the contrasts_with:: edge annotations typically acknowledge them explicitly ("Ghost link; target is prohibited..."). Report them as informational — no action needed.

Drift ghosts — bare wikilinks to what looks like a node that should exist and doesn't. Usually arises from a file rename that didn't update incoming references, or from a reference to a planned node that was never created. These are actionable: fix the reference, create the target, or explicitly demote the reference to deliberate-ghost status with an annotation.

Vocabulary-value ghosts — identity-predicate values ([[Seed Stage]], [[Working Draft]], [[Provisional Commitment]], [[Empirical Observation]], etc.) that every node points to via has_lifecycle::, has_curation::, has_commitment::, or has_epistemic_status:: but which have no corresponding Gloss. These carry the highest-inbound-count ghosts in a new graph and represent the largest self-documentation gap. A healthy graph seeds Glosses for them early; an unhealthy graph accumulates identity predicates pointing to undefined values.

Planning-surface ghosts — single- or low-count bare wikilinks to genuinely-unfinished concepts ([[Convention Overhead vs Graph Quality]], a person's Gloss, a future Decision). These are the scion author's to-write list.

Report each bucket separately, with the bucket name in the report. Deliberate and template-token buckets are informational; vocabulary-value and drift buckets are actionable; planning-surface is curation. A ghost whose targets appears in MORE than one bucket (e.g., a predicate-name bare wikilink that's also a provisional predicate) counts as Drift until the author promotes it to a Predicate node, at which point it resolves.

Step 5: Un-annotated edge sweep

Every top-level bullet under ## Relations should be followed by an indented sub-bullet annotation per Annotate Edges With Why-They-Matter. An un-annotated edge is tag spaghetti.

The check pattern: within the ## Relations section of each node, for every top-level bullet matching ^- [a-z_]+::, the following non-blank line MUST begin with - (two-space indent then hyphen). Implementations vary — awk with getline, a short Python script over Path.read_text().split("\n"), or a two-line rg over the section's range are all reasonable. Choose what the agent can ship reliably.

Report un-annotated edges as Shortfall findings, grouped by file. If the count is large enough that the fix is a dedicated curation pass rather than quick edits, flag the scale in the summary rather than listing every instance.

Step 6: Reciprocal-edge sweep

When a node carries a forward edge like informs_downstream::[[X]] in its Relations, the target X typically carries a reciprocal edge back. Bidirectional edges are the convention across Decisions, References, Contracts, and Predicates; missing reciprocals are cascade failures — a forward edge was added without the back-edge being wired on the target.

The conventional pairs:

Forward edge Reciprocal on target
informs_downstream::[[X]] X has informed_by:: or grounded_in:: back
grounded_in::[[X]] X has informs_downstream:: back
informed_by::[[X]] X has informs_downstream:: back
extends_contract::[[X]] X has extended_by:: back
supersedes::[[X]] X has superseded_by:: back
contrasts_with::[[X]] X has contrasts_with:: back (symmetric)

The check pattern: for each forward edge in every node's ## Relations section, resolve the target to its file, and search that file for the corresponding reciprocal pointing back at the source node. Implementations vary; an rg sweep for forward edges piped into a per-target check is one pattern. Some forward edges are legitimately one-directional — an external reference marked with does not live in the graph, and a ghost link's target does not yet exist; skip externals and ghost-link targets.

Report missing reciprocals as Shortfall findings, grouped by the source node. Most missing reciprocals are quick Edit fixes — add the reciprocal on the target with an annotation explaining the relationship.

The sweep catches a specific failure pattern: when a new node is added and its forward edges are wired, it is easy to miss wiring the corresponding back-edges on older, established target nodes. The sweep makes the gap visible at graph scope rather than relying on per-node validation to notice.

Step 7: Orphan detection

An orphan node has no incoming edges from any other node. Some orphans are intentional (the landing page; founding documents that are only linked-to from outside the graph). Others are drift (a node that was written but never wired into the graph).

For each node, search for incoming references:

for file in $(find nodes -name '*.md' -type f); do
  stem=$(basename "$file" .md)
  concept=$(echo "$stem" | sed 's/ -- .*//')
  count=$(rg -c -I -- "\[\[$stem\]\]|\[\[$concept" nodes/ 2>/dev/null | wc -l)
  if [ "$count" -eq 0 ]; then
    echo "orphan: $file"
  fi
done

Report orphans with their form (from conforms_to::) and lifecycle stage. A Seed Stage orphan is usually a work-in-progress; an Evergreen orphan is a candidate for either promotion (adding incoming edges) or demotion (stepping back the lifecycle).

Step 8: Aggregate and report

Group findings by category, not by file. The report structure:

Report each category compressed. Do not dump every finding; sample representative cases per category and name the total count. A scion owner wants to know "there are 40 ghost links, the top five are…" — not 40 individual list items.

Step 9: Name the follow-ups

End the report by naming which follow-up skills or operations would address which categories:

The follow-up naming lets the scion author route the audit's findings without re-deriving what each category asks for.

Relations