Architecture

SwarmVault is built around a simple pipeline: ingest → shape → analyze → compile → query.

Data Flow

Raw Sources ──> Schema ──> Analysis ──> Graph ───> Wiki ──> Search
     │              │          │          │          │         │
  immutable     per-vault   concepts   nodes +     markdown   SQLite
  files +       naming &    entities   edges +     pages +    FTS
  manifests     grounding   claims     provenance  outputs    index

Raw sources are ingested and stored immutably, together with their content hashes.
Schema guidance comes from swarmvault.schema.md, which defines the vault-specific naming and grounding rules.
Analysis extracts concepts, entities, claims, and questions from each source.
Compilation merges the per-source analyses into a unified knowledge graph.
Wiki generation produces Markdown pages from the graph.
Search indexing builds a SQLite FTS index that enables full-text queries over the wiki.
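
The ingest step can be illustrated with a small content-addressed store. This is a minimal sketch and not SwarmVault's actual implementation: the function name `ingest`, the `raw/` and `manifests/` directory layout, and the manifest fields are all assumptions made for illustration. Only the idea of storing sources immutably, keyed by a content hash, comes from the description above.

```python
import hashlib
import json
from pathlib import Path


def ingest(source: Path, vault: Path) -> dict:
    """Copy a source into the vault under its SHA-256 hash and write a manifest.

    Hypothetical layout: <vault>/raw/<hash><suffix> and <vault>/manifests/<hash>.json.
    """
    data = source.read_bytes()
    digest = hashlib.sha256(data).hexdigest()

    raw_dir = vault / "raw"
    manifest_dir = vault / "manifests"
    raw_dir.mkdir(parents=True, exist_ok=True)
    manifest_dir.mkdir(parents=True, exist_ok=True)

    stored = raw_dir / f"{digest}{source.suffix}"
    if not stored.exists():
        # Write once; a stored file with the same hash is never overwritten.
        stored.write_bytes(data)

    # Manifest fields are illustrative, not a documented schema.
    manifest = {
        "source_id": digest,
        "original_name": source.name,
        "stored_path": stored.relative_to(vault).as_posix(),
        "size_bytes": len(data),
    }
    manifest_path = manifest_dir / f"{digest}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Because the identifier is derived from the file's bytes, ingesting the same file twice in this sketch yields the same manifest and a single stored copy.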

Dual Outputs

SwarmVault produces two canonical artifacts:

Wiki (wiki/) — Human-readable Markdown pages organized by page kind (index, sources, concepts, entities, outputs)
Graph (state/graph.json) — Machine-readable JSON with nodes, edges, and full provenance metadata
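
To make the graph artifact more concrete, the sketch below shows one way nodes and edges could carry provenance and be serialized to JSON in a stable order. The schema of state/graph.json is not specified here, so the class names and every field (`id`, `kind`, `label`, `relation`, `source_ids`) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Node:
    id: str
    kind: str                          # e.g. "concept", "entity", "source"
    label: str
    source_ids: tuple[str, ...] = ()   # provenance: sources that support this node


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    source_ids: tuple[str, ...] = ()   # provenance: sources that support this edge


def serialize_graph(nodes, edges) -> str:
    """Serialize the graph with sorted nodes, edges, and keys.

    Sorting makes the output independent of the order in which
    nodes and edges were discovered.
    """
    payload = {
        "nodes": [asdict(n) for n in sorted(nodes, key=lambda n: n.id)],
        "edges": [
            asdict(e)
            for e in sorted(edges, key=lambda e: (e.source, e.target, e.relation))
        ],
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

Sorting before serialization is one simple way to keep a machine-readable artifact diff-friendly: two runs over the same inputs produce byte-identical JSON regardless of processing order.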

Key Design Principles

Immutable inputs — raw sources are never modified
Deterministic compilation — same inputs produce same outputs
Schema-guided behavior — each vault can impose its own structure without code changes
Provenance tracking — every claim traces back to its source
Anti-drift — linting detects when knowledge becomes stale
Provider agnostic — swap LLMs without changing the pipeline
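
The provider-agnostic principle is commonly realized by having pipeline steps depend on a narrow interface rather than on a specific LLM client. The sketch below illustrates that pattern only. `Provider`, `complete`, `analyze_source`, and the two stand-in providers are hypothetical names, not SwarmVault's API.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for illustration; makes no network calls."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class UppercaseProvider:
    """A second stand-in, to show that providers are interchangeable."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def analyze_source(text: str, provider: Provider) -> str:
    """Pipeline step that depends only on the Provider interface."""
    prompt = f"Extract concepts from: {text}"
    return provider.complete(prompt)
```

Swapping `EchoProvider()` for `UppercaseProvider()` in a call to `analyze_source` changes the output but requires no change to the pipeline step itself.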