Architecture
SwarmVault is built around a simple pipeline: ingest → shape → analyze → compile → query.
Data Flow
    Raw Sources ──> Schema ───────> Analysis ─────> Graph ────────> Wiki ─────────> Search
    │               │               │               │               │               │
    immutable       per-vault       concepts        nodes +         markdown        SQLite
    files +         naming &        entities        edges +         pages +         FTS
    manifests       grounding       claims          provenance      outputs         index

- Raw sources are ingested and stored immutably with content hashes
- Schema guidance comes from swarmvault.schema.md, which defines vault-specific rules
- Analysis extracts concepts, entities, claims, and questions from each source
- Compilation merges analyses into a unified knowledge graph
- Wiki generation produces Markdown pages from the graph
- Search indexing enables full-text queries over the wiki
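
The first step in the list above, immutable storage keyed by content hash, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ingest` function, the `raw_dir` layout, the manifest field names, and the choice of SHA-256 are all hypothetical and are not taken from SwarmVault's code.

```python
import hashlib
from pathlib import Path


def ingest(source: Path, raw_dir: Path) -> dict:
    """Store a copy of `source` under its SHA-256 digest and return a manifest entry.

    Hypothetical helper: the directory layout and field names are assumptions.
    """
    data = source.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / f"{digest}{source.suffix}"
    # Never overwrite: a file already stored under this digest holds the same bytes.
    if not target.exists():
        target.write_bytes(data)
    return {"original_name": source.name, "sha256": digest, "stored_as": target.name}
```

Because the stored file name is derived from the content, ingesting the same bytes twice yields the same manifest entry and a single stored file, which is one simple way to keep inputs immutable.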
Dual Outputs
SwarmVault produces two canonical artifacts:
- Wiki (wiki/) — Human-readable Markdown pages organized by page kind (index, sources, concepts, entities, outputs)
- Graph (state/graph.json) — Machine-readable JSON with nodes, edges, and full provenance metadata
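
To make the machine-readable side more concrete, here is a rough sketch of how a graph document with nodes and edges might be read to trace a concept back to its sources. The JSON field names (`id`, `kind`, `label`, `from`, `to`, `relation`) and the `sources_for` helper are assumptions made for illustration; the actual layout of state/graph.json may differ.

```python
import json

# Hypothetical graph document; the field names are assumptions, not SwarmVault's schema.
GRAPH_JSON = """
{
  "nodes": [
    {"id": "source:a1", "kind": "source", "label": "notes.md"},
    {"id": "concept:provenance", "kind": "concept", "label": "Provenance"}
  ],
  "edges": [
    {"from": "concept:provenance", "to": "source:a1", "relation": "mentioned_in"}
  ]
}
"""


def sources_for(graph: dict, node_id: str) -> list[str]:
    """Return the labels of source nodes that `node_id` links to, sorted by label."""
    source_labels = {n["id"]: n["label"] for n in graph["nodes"] if n["kind"] == "source"}
    return sorted(
        source_labels[e["to"]]
        for e in graph["edges"]
        if e["from"] == node_id and e["to"] in source_labels
    )


graph = json.loads(GRAPH_JSON)
print(sources_for(graph, "concept:provenance"))  # prints ['notes.md']
```

Keeping provenance as explicit edges, rather than free text inside a page, is what would let a tool answer "where did this come from?" without parsing the wiki.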
Key Design Principles
- Immutable inputs — raw sources are never modified
- Deterministic compilation — same inputs produce same outputs
- Schema-guided behavior — each vault can impose its own structure without code changes
- Provenance tracking — every claim traces back to its source
- Anti-drift — linting detects when knowledge becomes stale
- Provider agnostic — swap LLMs without changing the pipeline
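
Two of these principles, deterministic compilation and anti-drift, lend themselves to a small sketch. The code below is a hypothetical illustration, not SwarmVault's implementation: `compile_graph`, `stale_sources`, and the shape of the analysis and manifest dictionaries are all assumed. It shows one common way to get order-independent output (sort everything before serializing) and one possible staleness check (compare a recorded hash with the current bytes).

```python
import hashlib
import json


def compile_graph(analyses: list[dict]) -> str:
    """Merge per-source analyses into one canonical JSON string.

    Assumes each analysis looks like {"source_hash": str, "concepts": [str, ...]}.
    """
    concepts: dict[str, set[str]] = {}
    for analysis in analyses:
        for concept in analysis["concepts"]:
            concepts.setdefault(concept, set()).add(analysis["source_hash"])
    # Sort concepts and their sources so the input order cannot affect the output.
    nodes = [
        {"concept": name, "sources": sorted(hashes)}
        for name, hashes in sorted(concepts.items())
    ]
    return json.dumps({"nodes": nodes}, sort_keys=True, indent=2)


def stale_sources(manifest: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """List names that are missing or whose bytes no longer match the recorded SHA-256."""
    return sorted(
        name
        for name, recorded in manifest.items()
        if name not in current or hashlib.sha256(current[name]).hexdigest() != recorded
    )
```

Under these assumptions, `compile_graph([a, b])` and `compile_graph([b, a])` return the same string, and `stale_sources` reports any entry whose upstream content has changed since its hash was recorded.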