Compilation Pipeline

The swarmvault compile command runs a multi-step pipeline that transforms raw sources into wiki pages, a knowledge graph, and search indexes.

What Compile Produces

| Artifact | Location | Description |
| --- | --- | --- |
| Wiki index | wiki/index.md | Home page with vault overview |
| Source pages | wiki/sources/ | One page per ingested source |
| Code module pages | wiki/code/ | One page per code source with imports, exports, symbols |
| Concept pages | wiki/concepts/ | Active concept pages (promoted from candidates) |
| Entity pages | wiki/entities/ | Active entity pages (promoted from candidates) |
| Candidate pages | wiki/candidates/ | Staged concepts and entities awaiting promotion |
| Project rollups | wiki/projects/ | Cross-source summaries when projects are configured |
| Knowledge graph | state/graph.json | Nodes, edges, communities, provenance metadata |
| Graph report | wiki/graph/report.md | God nodes, surprising connections, community summaries, and graph-health signals |
| Share kit | wiki/graph/share-card.md, wiki/graph/share-card.svg, wiki/graph/share-kit/ | Compact post-ready summary, visual SVG card, and portable HTML-preview bundle rendered by swarmvault graph share |
| Retrieval index | state/retrieval/ | SQLite FTS shard and manifest over all wiki pages |
| Code index | state/code-index.json | Repo-aware module aliases for import resolution |
| Benchmark | state/benchmark.json | Context-reduction metrics |
| Contradictions | wiki/graph/report.md | Conflicting claims across sources with confidence delta |

Steps

1. Load Sources

Reads source manifests from state/manifests/, prior analyses from state/analyses/, the repo-aware code alias registry in state/code-index.json when present, and the root plus project-specific schemas that apply to this vault.
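As a rough illustration of this loading step, the sketch below gathers manifests, prior analyses, and the optional alias registry into one in-memory structure. It assumes a hypothetical layout of one <source_id>.json file per source in state/manifests/ and state/analyses/; the real file formats are not documented here, and schema loading is left out for brevity.

```python
import json
from pathlib import Path


def load_vault_state(vault_root: Path) -> dict:
    """Hypothetical loader; the per-directory file layout is an assumption."""
    state = vault_root / "state"

    def read_json_dir(directory: Path) -> dict:
        # Assumption: one <source_id>.json file per source.
        if not directory.is_dir():
            return {}
        return {
            p.stem: json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(directory.glob("*.json"))
        }

    code_index_path = state / "code-index.json"
    return {
        "manifests": read_json_dir(state / "manifests"),
        "analyses": read_json_dir(state / "analyses"),
        # The alias registry is optional: only read it when present.
        "code_index": (
            json.loads(code_index_path.read_text(encoding="utf-8"))
            if code_index_path.is_file()
            else None
        ),
    }
```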

2. Analyze Sources

For each new or changed non-code source, the configured compileProvider extracts up to 5 broad domain tags that categorize the source, plus the following:

  • Concepts (max 12) — Key ideas and topics with descriptions
  • Entities (max 12) — Named things (people, tools, organizations) with descriptions
  • Claims (max 8) — Factual assertions with confidence scores and polarity (positive/negative/neutral)
  • Questions (max 6) — Questions the source raises or answers
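The per-source caps above can be modeled as a small data structure. This is a minimal sketch: the field names and the truncation helper are hypothetical, and it assumes the provider returns items in priority order so that keeping the first N is reasonable. Only the numeric limits and the three polarity values come from this page.

```python
from dataclasses import dataclass, field
from typing import Literal

# Caps from the list above; the field layout itself is hypothetical.
LIMITS = {"tags": 5, "concepts": 12, "entities": 12, "claims": 8, "questions": 6}

Polarity = Literal["positive", "negative", "neutral"]


@dataclass
class Claim:
    text: str
    confidence: float  # assumed to lie in 0.0..1.0
    polarity: Polarity


@dataclass
class SourceAnalysis:
    tags: list[str] = field(default_factory=list)
    concepts: list[dict] = field(default_factory=list)
    entities: list[dict] = field(default_factory=list)
    claims: list[Claim] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)


def enforce_limits(analysis: SourceAnalysis) -> SourceAnalysis:
    """Truncate each list to its documented maximum, keeping provider order."""
    return SourceAnalysis(
        **{name: list(getattr(analysis, name))[:cap] for name, cap in LIMITS.items()}
    )
```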

For JavaScript, JSX, TypeScript, TSX, Bash, Python, Go, Rust, Java, Kotlin, Scala, Dart, Lua, Zig, C#, C, C++, PHP, Ruby, PowerShell, Elixir, OCaml, Objective-C, ReScript, Solidity, HTML, CSS, Vue, Svelte, Julia, Verilog/SystemVerilog, R, and SQL sources, SwarmVault also runs a local code-analysis pass where parser support exists. R is the current exception: until a safe packaged grammar is available, R sources receive an explicit parser diagnostic instead of heuristic code extraction. The code-analysis pass extracts:

  • module ids and language
  • repo-relative paths plus module/package/namespace metadata when available
  • imports, dynamic import() edges, and re-exports
  • external dependencies
  • classes, functions, interfaces, enums, variables, and exports
  • inheritance, implementation, and same-module call edges
  • Julia modules/types/functions and Verilog/SystemVerilog modules/interfaces/packages/instantiations
  • SQL table/view definitions plus read/write/join/reference edges
  • parser diagnostics when the file cannot be fully understood

JavaScript, JSX, TypeScript, and TSX use the TypeScript compiler API. SQL uses a SQL AST parser. The other shipped languages use parser-backed local analyzers that feed the same module-page and graph pipeline.
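The backend split described above can be pictured as a simple dispatch on file extension. The sketch below is hypothetical: the extension table covers only a few of the listed languages, the backend labels are invented, and treating unknown extensions as non-code is an assumption made for illustration.

```python
from pathlib import Path

# Hypothetical extension table covering a few of the languages listed above.
LANGUAGE_BY_EXT = {
    ".js": "javascript", ".jsx": "jsx", ".ts": "typescript", ".tsx": "tsx",
    ".sql": "sql", ".py": "python", ".go": "go", ".rs": "rust", ".r": "r",
}


def pick_backend(path: str) -> tuple[str, str]:
    """Return (language, backend label) following the split described above."""
    ext = Path(path).suffix.lower()
    language = LANGUAGE_BY_EXT.get(ext)
    if language is None:
        # Assumption: unrecognized files go down the non-code provider path.
        return ("unknown", "non-code")
    if language in {"javascript", "jsx", "typescript", "tsx"}:
        return (language, "typescript-compiler-api")
    if language == "sql":
        return (language, "sql-ast-parser")
    if language == "r":
        # No safe packaged grammar yet: emit a diagnostic instead of guessing.
        return (language, "diagnostic-only")
    return (language, "parser-backed-analyzer")
```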

3. Build Knowledge Graph

Merges all analyses into a unified graph:

  • Nodes — Sources, modules, symbols, concepts, and entities with freshness and project metadata
  • Edges — Relationships such as extracted/inferred claims, imports, exports, defines, calls, extends, implements, reads, writes, joins, and references, including resolved local code links from the repo-aware alias index
  • Derived metrics — Community ids, degree, bridge scores, and "god node" hints for high-connectivity nodes
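To make the derived metrics concrete, here is a minimal sketch that computes degree, a crude bridge score, and god-node hints from an edge list. SwarmVault's actual formulas are not given on this page: the bridge score below (share of a node's neighbours that sit in a different community) and the top-k degree rule for god nodes are assumptions, and community ids are taken as precomputed input.

```python
from collections import Counter, defaultdict


def derive_metrics(edges, community, god_node_top_k=2):
    """Sketch: degree, a crude bridge score, and god-node hints.

    edges: iterable of (node_a, node_b) pairs, treated as undirected.
    community: dict node_id -> community id (assumed precomputed).
    """
    degree = Counter()
    neighbours = defaultdict(set)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Hypothetical bridge score: share of neighbours in other communities.
    bridge = {
        n: sum(community[m] != community[n] for m in ns) / len(ns)
        for n, ns in neighbours.items()
    }
    # God-node hint: the top-k nodes by degree, ties broken by node id.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    return dict(degree), bridge, ranked[:god_node_top_k]
```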

3a. Detect Contradictions

After building the graph, SwarmVault compares claims from different sources, looking for pairs that overlap in topic but have opposite polarity. Each detected contradiction becomes a contradicts edge in the graph with evidenceClass: "ambiguous" and is listed in the Contradictions section of the graph report.
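A minimal sketch of this check follows. The topic-overlap measure is a stand-in: it uses Jaccard similarity over lowercase words of four or more letters, which is an assumption rather than SwarmVault's documented method. The claim dict shape and the confidenceDelta key are likewise hypothetical; only the contradicts relation and evidenceClass: "ambiguous" come from this page.

```python
import re
from itertools import combinations


def topic_terms(text):
    # Crude stand-in for topic extraction: lowercase words of 4+ letters.
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def detect_contradictions(claims, min_overlap=0.5):
    """claims: list of dicts with id, source, text, polarity, confidence."""
    edges = []
    for a, b in combinations(claims, 2):
        if a["source"] == b["source"]:
            continue  # only compare claims from different sources
        if {a["polarity"], b["polarity"]} != {"positive", "negative"}:
            continue  # neutral claims never count as contradictions here
        ta, tb = topic_terms(a["text"]), topic_terms(b["text"])
        if not ta or not tb:
            continue
        overlap = len(ta & tb) / len(ta | tb)  # Jaccard similarity
        if overlap >= min_overlap:
            edges.append({
                "relation": "contradicts",
                "from": a["id"],
                "to": b["id"],
                "evidenceClass": "ambiguous",
                "confidenceDelta": round(abs(a["confidence"] - b["confidence"]), 3),
            })
    return edges
```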

4. Generate Wiki Pages

Creates Markdown pages from the graph:

  • wiki/index.md — Home page with overview
  • wiki/sources/ — One page per ingested source
  • wiki/code/ — One module page per ingested code source
  • wiki/candidates/ — Staged concept and entity pages on first sighting
  • wiki/concepts/ — Active concept pages
  • wiki/entities/ — Active entity pages
  • wiki/projects/ — Project rollups over canonical pages when projects are configured
  • wiki/outputs/ and wiki/insights/ stay part of the overall page registry
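The page layout above amounts to routing each graph node to a directory by kind and, for concepts and entities, by whether the page has been promoted. The sketch below is hypothetical: the slug rule, the node-kind names, and the flat wiki/candidates/ layout are assumptions for illustration only.

```python
import re

# Hypothetical kind -> folder mappings based on the directory list above.
ACTIVE_FOLDERS = {"concept": "concepts", "entity": "entities"}
OTHER_FOLDERS = {"source": "sources", "module": "code", "project": "projects"}


def slugify(title: str) -> str:
    # Hypothetical slug rule: lowercase, runs of non-alphanumerics become "-".
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def wiki_path(kind: str, title: str, promoted: bool = False) -> str:
    """Route a graph node to a page path following the layout above."""
    slug = slugify(title)
    if kind in ACTIVE_FOLDERS:
        if not promoted:
            return f"wiki/candidates/{slug}.md"  # staged on first sighting
        return f"wiki/{ACTIVE_FOLDERS[kind]}/{slug}.md"
    return f"wiki/{OTHER_FOLDERS[kind]}/{slug}.md"
```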

5. Build Search Index

Rebuilds the SQLite FTS index over all wiki pages for fast full-text search.
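For readers unfamiliar with SQLite full-text search, the following sketch rebuilds an FTS5 index over a set of pages and queries it with BM25 ranking. It uses an in-memory database and a single table as a simplification; the real shard and manifest layout under state/retrieval/ is not described here, and the function names are hypothetical. It requires a SQLite build with FTS5 enabled, which most Python distributions include.

```python
import sqlite3


def build_search_index(pages: dict[str, str]) -> sqlite3.Connection:
    """Rebuild an in-memory FTS5 index; pages maps wiki path -> Markdown text."""
    conn = sqlite3.connect(":memory:")
    # The path is stored for display but not tokenized for matching.
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO pages (path, body) VALUES (?, ?)", pages.items())
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    # bm25() ranks matches; in SQLite FTS5, lower scores are better.
    rows = conn.execute(
        "SELECT path FROM pages WHERE pages MATCH ? ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    )
    return [path for (path,) in rows]
```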

Reviewable Compile

When you run swarmvault compile --approve, SwarmVault stages changed pages and a graph preview in state/approvals/ instead of modifying the active wiki files. A later review accept or review reject action then refreshes the live wiki, graph, and search state without rerunning compile.
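The stage-then-review flow can be sketched in memory as follows. Every name here (ApprovalBundle, Vault, stage, accept, reject) is hypothetical and does not mirror SwarmVault's internals or on-disk format; the sketch only shows the idea that staging leaves live pages untouched until a bundle is accepted.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalBundle:
    """Hypothetical stand-in for a staged compile under state/approvals/."""
    pages: dict[str, str]  # wiki path -> proposed Markdown
    status: str = "pending"


@dataclass
class Vault:
    live_pages: dict[str, str] = field(default_factory=dict)

    def stage(self, proposed: dict[str, str]) -> ApprovalBundle:
        # Only changed pages are staged; live pages are not modified.
        changed = {p: t for p, t in proposed.items() if self.live_pages.get(p) != t}
        return ApprovalBundle(pages=changed)

    def accept(self, bundle: ApprovalBundle) -> None:
        self.live_pages.update(bundle.pages)
        bundle.status = "accepted"
        # A real implementation would also refresh graph and search state here.

    def reject(self, bundle: ApprovalBundle) -> None:
        bundle.status = "rejected"  # live pages stay as they were
```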

Incremental Compilation

Analyses include content signatures. If a source hasn't changed, its existing analysis is reused, saving LLM API calls. Incremental invalidation also tracks root schema hashes, effective project schema hashes, source-to-project assignment, saved-output artifacts, and repo-aware code alias state.
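One way to picture signature-based reuse is a hash over the source bytes plus the invalidation inputs, checked against a cache before calling the provider. This is a sketch under assumptions: SHA-256, the JSON encoding of metadata, and folding in only the schema hashes and project assignment (not every tracked input) are illustrative choices, and both function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Optional


def analysis_signature(content: bytes, root_schema_hash: str,
                       project_schema_hash: Optional[str] = None,
                       project_id: Optional[str] = None) -> str:
    """Hash source bytes together with inputs that should invalidate the analysis."""
    h = hashlib.sha256()
    # Length prefix keeps content and metadata from bleeding into each other.
    h.update(len(content).to_bytes(8, "big"))
    h.update(content)
    meta = json.dumps([root_schema_hash, project_schema_hash, project_id])
    h.update(meta.encode("utf-8"))
    return h.hexdigest()


def reuse_or_analyze(cache: dict, source_id: str, signature: str,
                     analyze: Callable[[], Any]) -> Any:
    """Return the cached analysis when signatures match; otherwise re-analyze."""
    entry = cache.get(source_id)
    if entry is not None and entry["signature"] == signature:
        return entry["analysis"]  # unchanged source: no provider call
    analysis = analyze()  # stand-in for the compileProvider (LLM) call
    cache[source_id] = {"signature": signature, "analysis": analysis}
    return analysis
```

Changing the root schema hash alone yields a new signature, so an otherwise unchanged source is re-analyzed.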

When compile is triggered through the repo-aware watch path and the change set is code-only, SwarmVault uses a narrower code-only refresh path so code pages and graph structure update without re-running non-code semantic analysis for unchanged sources.
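The choice between the narrower and the full refresh can be sketched as a check over the changed paths. The extension set is a hypothetical subset of the languages listed earlier, and returning "noop" for an empty change set is an assumption of this sketch rather than documented behavior.

```python
from pathlib import PurePosixPath

# Hypothetical subset of code extensions; the real set follows the analyzers above.
CODE_EXTENSIONS = {".ts", ".tsx", ".js", ".jsx", ".py", ".go", ".rs", ".sql"}


def choose_refresh_path(changed_paths):
    """Pick the narrower refresh when every changed file is code."""
    if not changed_paths:
        return "noop"
    if all(PurePosixPath(p).suffix.lower() in CODE_EXTENSIONS for p in changed_paths):
        # Rebuild code pages and graph structure; skip non-code semantic analysis.
        return "code-only"
    return "full"
```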