Compilation Pipeline
The swarmvault compile command runs a multi-step pipeline that transforms raw sources into structured knowledge.
What Compile Produces
| Artifact | Location | Description |
|---|---|---|
| Wiki index | wiki/index.md | Home page with vault overview |
| Source pages | wiki/sources/ | One page per ingested source |
| Code module pages | wiki/code/ | One page per code source with imports, exports, symbols |
| Concept pages | wiki/concepts/ | Active concept pages (promoted from candidates) |
| Entity pages | wiki/entities/ | Active entity pages (promoted from candidates) |
| Candidate pages | wiki/candidates/ | Staged concepts and entities awaiting promotion |
| Project rollups | wiki/projects/ | Cross-source summaries when projects are configured |
| Knowledge graph | state/graph.json | Nodes, edges, communities, provenance metadata |
| Graph report | wiki/graph/report.md | God nodes, surprising connections, community summaries, and graph-health signals |
| Share kit | wiki/graph/share-card.md, wiki/graph/share-card.svg, wiki/graph/share-kit/ | Compact post-ready summary, visual SVG card, and portable HTML-preview bundle rendered by swarmvault graph share |
| Retrieval index | state/retrieval/ | SQLite FTS shard and manifest over all wiki pages |
| Code index | state/code-index.json | Repo-aware module aliases for import resolution |
| Benchmark | state/benchmark.json | Context-reduction metrics |
| Contradictions | wiki/graph/report.md | Conflicting claims across sources with confidence delta |
Steps
1. Load Sources
Reads source manifests from state/manifests/, prior analyses from state/analyses/, the repo-aware code alias registry in state/code-index.json when present, and the root plus project-specific schemas that apply to this vault.
2. Analyze Sources
For each new or changed non-code source, the configured compileProvider extracts the following, plus up to 5 broad domain tags that categorize the source:
- Concepts (max 12) — Key ideas and topics with descriptions
- Entities (max 12) — Named things (people, tools, organizations) with descriptions
- Claims (max 8) — Factual assertions with confidence scores and polarity (positive/negative/neutral)
- Questions (max 6) — Questions the source raises or answers
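The caps above can be read as a per-source budget. The following is a minimal sketch of enforcing that budget on a raw provider response. It assumes the response is a plain dictionary of lists; the key names and the `clamp_analysis` helper are hypothetical, and whether SwarmVault truncates results or constrains the provider prompt is not stated here.

```python
# Per-source caps taken from the list above. The key names are hypothetical.
LIMITS = {"concepts": 12, "entities": 12, "claims": 8, "questions": 6, "tags": 5}


def clamp_analysis(raw: dict) -> dict:
    """Return a new dict where each known list is truncated to its cap.

    Missing keys become empty lists so downstream code can rely on the shape.
    """
    return {key: list(raw.get(key, []))[:limit] for key, limit in LIMITS.items()}
```

Keeping the limits in one table makes the budget easy to audit against the documented maxima.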
For JavaScript, JSX, TypeScript, TSX, Bash, Python, Go, Rust, Java, Kotlin, Scala, Dart, Lua, Zig, C#, C, C++, PHP, Ruby, PowerShell, Elixir, OCaml, Objective-C, ReScript, Solidity, HTML, CSS, Vue, Svelte, Julia, Verilog/SystemVerilog, R, and SQL sources, SwarmVault also runs a local code-analysis pass where parser support exists. R currently emits an explicit parser diagnostic rather than heuristic code extraction until a safe packaged grammar exists. The code-analysis pass extracts:
- module ids and language
- repo-relative paths plus module/package/namespace metadata when available
- imports, dynamic import() edges, and re-exports
- external dependencies
- classes, functions, interfaces, enums, variables, and exports
- inheritance, implementation, and same-module call edges
- Julia modules/types/functions and Verilog/SystemVerilog modules/interfaces/packages/instantiations
- SQL table/view definitions plus read/write/join/reference edges
- parser diagnostics when the file cannot be fully understood
JavaScript, JSX, TypeScript, and TSX use the TypeScript compiler API. SQL uses a SQL AST parser. The other shipped languages use parser-backed local analyzers that feed the same module-page and graph pipeline.
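To make the extracted fields concrete, here is a rough illustration of a parser-backed pass over a single Python file, built on Python's standard `ast` module. It is not SwarmVault's analyzer: the `summarize_module` name and the result keys are assumptions chosen for this sketch, and it covers only imports, top-level definitions, same-module call edges, and a diagnostic on parse failure.

```python
import ast


def summarize_module(source: str) -> dict:
    """Collect imports, top-level definitions, and same-module call edges."""
    empty = {"imports": [], "classes": [], "functions": [], "calls": [], "diagnostics": []}
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        # Report a diagnostic instead of guessing at structure.
        return {**empty, "diagnostics": [f"line {err.lineno}: {err.msg}"]}

    imports, classes, functions = [], [], []
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or ".")
        elif isinstance(node, ast.ClassDef):
            classes.append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)

    # Keep a call edge only when the callee is a function defined in this module.
    calls = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    if inner.func.id in functions:
                        calls.add((node.name, inner.func.id))

    return {**empty, "imports": imports, "classes": classes,
            "functions": functions, "calls": sorted(calls)}
```

Returning a diagnostic on a parse failure, rather than falling back to heuristics, mirrors the behavior described for R above.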
3. Build Knowledge Graph
Merges all analyses into a unified graph:
- Nodes — Sources, modules, symbols, concepts, and entities with freshness and project metadata
- Edges — Relationships such as extracted/inferred claims, imports, exports, defines, calls, extends, implements, reads, writes, joins, and references, including resolved local code links from the repo-aware alias index
- Derived metrics — Community ids, degree, bridge scores, and "god node" hints for high-connectivity nodes
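As a sketch of the simplest derived metric, the snippet below computes node degree from an edge list and flags high-connectivity nodes. The edge representation as `(source, target)` pairs and the `min_degree` threshold are assumptions for illustration; the actual criteria for a "god node" hint are not specified here.

```python
from collections import Counter


def degree_by_node(edges):
    """Count how many edges touch each node, ignoring direction."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def god_nodes(edges, min_degree=3):
    """Return node ids whose degree meets a hypothetical threshold."""
    degree = degree_by_node(edges)
    return sorted(node for node, count in degree.items() if count >= min_degree)
```

For example, with edges `a-hub`, `b-hub`, `c-hub`, and `a-b`, only `hub` reaches degree 3 and is flagged.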
3a. Detect Contradictions
After building the graph, SwarmVault compares claims across sources and flags pairs that overlap in topic but carry opposite polarity. Each detected contradiction becomes a contradicts edge in the graph with evidenceClass: "ambiguous" and appears in the graph report's Contradictions section.
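One way to picture this comparison is a pairwise scan over claims. The sketch below uses Jaccard overlap of word sets as a stand-in for topic overlap, which is an assumption: the actual overlap measure is not described here. The `Claim` fields, the `min_overlap` threshold, and the `from`, `to`, and `confidenceDelta` keys are likewise hypothetical; only the contradicts edge type and the evidenceClass value come from the text above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Claim:
    source_id: str
    text: str
    polarity: str  # "positive", "negative", or "neutral"
    confidence: float


def _tokens(text):
    # Crude topic proxy: lowercase words longer than three characters.
    return {word for word in text.lower().split() if len(word) > 3}


def find_contradictions(claims, min_overlap=0.5):
    """Pair claims from different sources with opposite polarity and similar wording."""
    edges = []
    for a, b in combinations(claims, 2):
        if a.source_id == b.source_id:
            continue
        if {a.polarity, b.polarity} != {"positive", "negative"}:
            continue
        ta, tb = _tokens(a.text), _tokens(b.text)
        union = ta | tb
        overlap = len(ta & tb) / len(union) if union else 0.0
        if overlap >= min_overlap:
            edges.append({
                "type": "contradicts",
                "from": a.source_id,
                "to": b.source_id,
                "evidenceClass": "ambiguous",
                "confidenceDelta": round(abs(a.confidence - b.confidence), 3),
            })
    return edges
```

Neutral claims never match in this sketch, because a pair must contain exactly one positive and one negative claim.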
4. Generate Wiki Pages
Creates Markdown pages from the graph:
- wiki/index.md — Home page with overview
- wiki/sources/ — One page per ingested source
- wiki/code/ — One module page per ingested code source
- wiki/candidates/ — Staged concept and entity pages on first sighting
- wiki/concepts/ — Active concept pages
- wiki/entities/ — Active entity pages
- wiki/projects/ — Project rollups over canonical pages when projects are configured
- wiki/outputs/ and wiki/insights/ stay part of the overall page registry
5. Build Search Index
Rebuilds the SQLite FTS index over all wiki pages for fast full-text search.
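A full-text index of this kind can be sketched with Python's built-in `sqlite3` module. The example assumes the SQLite build includes the FTS5 extension, and the table name, column names, and helper functions are hypothetical; the schema of the shard under state/retrieval/ is not documented here.

```python
import sqlite3


def build_index(pages):
    """Create an in-memory FTS5 table from (path, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # path is stored but not tokenized; body is the searchable text.
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO pages (path, body) VALUES (?, ?)", pages)
    conn.commit()
    return conn


def search(conn, query, limit=5):
    """Return matching page paths, best match first."""
    rows = conn.execute(
        "SELECT path FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [path for (path,) in rows]
```

Rebuilding the table from scratch on each compile keeps the index consistent with the generated pages, at the cost of re-inserting unchanged pages.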
Reviewable Compile
When you run swarmvault compile --approve, SwarmVault stages changed pages and a graph preview into state/approvals/ instead of mutating active wiki files. The later review accept or review reject actions refresh the live wiki, graph, and search state without rerunning compile.
Incremental Compilation
Analyses include content signatures. If a source hasn't changed, its existing analysis is reused, saving LLM API calls. Incremental invalidation also tracks root schema hashes, effective project schema hashes, source-to-project assignment, saved-output artifacts, and repo-aware code alias state.
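The reuse decision can be sketched as a signature comparison. This example assumes a SHA-256 digest over the schema hash and the source text; the hash function, the `content_signature` and `plan_compile` helpers, and the cache layout are all assumptions, and the sketch covers only two of the invalidation inputs listed above.

```python
import hashlib


def content_signature(text: str, schema_hash: str = "") -> str:
    """Hash the source text together with the schema hash that applies to it."""
    digest = hashlib.sha256()
    digest.update(schema_hash.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


def plan_compile(sources: dict, cached: dict, schema_hash: str = ""):
    """Split source ids into (reuse, reanalyze) by comparing signatures."""
    reuse, reanalyze = [], []
    for source_id, text in sources.items():
        signature = content_signature(text, schema_hash)
        if cached.get(source_id) == signature:
            reuse.append(source_id)
        else:
            reanalyze.append(source_id)
    return reuse, reanalyze
```

Folding the schema hash into the signature means a schema change invalidates every cached analysis even when the source text is unchanged.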
When compile is triggered through the repo-aware watch path and the change set is code-only, SwarmVault uses a narrower code-only refresh path so code pages and graph structure update without re-running non-code semantic analysis for unchanged sources.
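The choice between the two refresh paths can be pictured as a check on the changed paths. The sketch below classifies files by extension, which is an assumption: how SwarmVault decides that a change set is code-only is not described here, and the extension set is a small hypothetical subset of the supported languages.

```python
from pathlib import PurePosixPath

# Hypothetical subset of code extensions, for illustration only.
CODE_EXTENSIONS = {".ts", ".tsx", ".js", ".jsx", ".py", ".go", ".rs", ".sql"}


def is_code_only(changed_paths):
    """True when there is at least one change and every changed file is code."""
    if not changed_paths:
        return False
    return all(
        PurePosixPath(path).suffix.lower() in CODE_EXTENSIONS
        for path in changed_paths
    )
```

A single non-code file in the change set is enough to fall back to the full path in this sketch.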