Configuration
SwarmVault uses two root-level configuration surfaces:
- `swarmvault.config.json` for runtime and provider configuration
- `swarmvault.schema.md` for vault-specific compile and query guidance
Full Schema
```json
{
  "workspace": {
    "rawDir": "raw",
    "wikiDir": "wiki",
    "stateDir": "state",
    "agentDir": "agent",
    "inboxDir": "inbox"
  },
  "providers": {
    "local": {
      "type": "heuristic",
      "model": "heuristic-v1",
      "capabilities": ["chat", "structured", "vision", "local"]
    }
  },
  "tasks": {
    "compileProvider": "local",
    "queryProvider": "local",
    "lintProvider": "local",
    "visionProvider": "local",
    "imageProvider": "local"
  },
  "profile": {
    "presets": [],
    "dashboardPack": "default",
    "guidedSessionMode": "insights_only",
    "guidedIngestDefault": false,
    "deepLintDefault": false,
    "dataviewBlocks": false
  },
  "viewer": {
    "port": 4123
  },
  "projects": {
    "engine": {
      "roots": ["packages/engine"],
      "schemaPath": "schemas/engine.schema.md"
    }
  },
  "repoAnalysis": {
    "extractClasses": ["first_party"],
    "classifyGlobs": {
      "third_party": ["third_party/**"],
      "resource": ["App.xcassets/**"],
      "generated": ["dist/**"]
    }
  },
  "graphSinks": {
    "neo4j": {
      "uri": "bolt://127.0.0.1:7687",
      "username": "neo4j",
      "passwordEnv": "NEO4J_PASSWORD",
      "database": "neo4j",
      "includeClasses": ["first_party"]
    }
  },
  "retrieval": {
    "backend": "sqlite",
    "shardSize": 25000,
    "hybrid": true,
    "rerank": false
  },
  "autoCommit": false,
  "agents": [],
  "schedules": {
    "nightly-compile": {
      "enabled": true,
      "when": { "cron": "0 3 * * *" },
      "task": { "type": "compile", "approve": true }
    }
  },
  "orchestration": {
    "maxParallelRoles": 2,
    "compilePostPass": false,
    "roles": {
      "research": {
        "executor": { "type": "provider", "provider": "local" }
      }
    }
  },
  "webSearch": {
    "providers": {
      "evidence": {
        "type": "http-json",
        "endpoint": "https://search.example/api/search",
        "method": "GET",
        "apiKeyEnv": "SEARCH_API_KEY",
        "apiKeyHeader": "Authorization",
        "apiKeyPrefix": "Bearer ",
        "queryParam": "q",
        "limitParam": "limit",
        "resultsPath": "results",
        "titleField": "title",
        "urlField": "url",
        "snippetField": "snippet"
      }
    },
    "tasks": {
      "deepLintProvider": "evidence"
    }
  }
}
```

`embeddingProvider`, `audioProvider`, and `graph.communityResolution` are intentionally omitted from the baseline local example above. The built-in `heuristic` provider works well as a local, offline default for compile and query, but it does not generate embeddings or audio transcripts. For semantic graph query without API keys, add an embedding-capable local backend such as Ollama and point `tasks.embeddingProvider` at that provider. For audio-file transcription, point `tasks.audioProvider` at a provider that exposes audio capability.
Sections
`workspace`
Controls the directory layout. All paths are relative to the workspace root.
| Field | Default | Description |
|---|---|---|
| `rawDir` | `"raw"` | Root directory for canonical source and asset storage |
| `wikiDir` | `"wiki"` | Compiled markdown output |
| `stateDir` | `"state"` | Manifests, extracts, analyses, graph, retrieval, and jobs |
| `agentDir` | `"agent"` | Agent-specific files |
| `inboxDir` | `"inbox"` | Capture staging area for inbox import and watch mode |
Set `SWARMVAULT_OUT=<dir>` when generated workspace artifacts should be isolated from the source tree. Config and schema files stay in the project root, while relative `rawDir`, `wikiDir`, `stateDir`, `agentDir`, and `inboxDir` values resolve under the output root. Absolute workspace paths remain absolute.
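As an illustration of that rule, the fragment below mixes relative paths with one absolute `stateDir`; the `/var/tmp/swarmvault-state` path is a hypothetical example, not a default. Assuming `SWARMVAULT_OUT=build` is set, `raw`, `wiki`, `agent`, and `inbox` would resolve under `build/`, while `stateDir` stays exactly where it points.

```json
{
  "workspace": {
    "rawDir": "raw",
    "wikiDir": "wiki",
    "stateDir": "/var/tmp/swarmvault-state",
    "agentDir": "agent",
    "inboxDir": "inbox"
  }
}
```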
`providers`
Named provider definitions. SwarmVault supports built-in providers, named OpenAI-compatible presets such as OpenRouter and Groq, plus generic OpenAI-compatible and custom module adapters. See Provider Config.
`tasks`
Maps each engine task to a named provider from the providers object.
| Field | Description |
|---|---|
| `compileProvider` | Provider used during compile |
| `queryProvider` | Provider used for natural-language answers |
| `lintProvider` | Provider used for lint and health checks |
| `visionProvider` | Provider used for image-aware extraction |
| `imageProvider` | Optional provider used for native image output generation |
| `embeddingProvider` | Optional provider used for semantic graph query and embedding-backed similarity enrichment |
| `audioProvider` | Optional provider used for audio-file transcription during ingest |
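A sketch of a `tasks` block that adds the two optional mappings left out of the full example. `my-embeddings` and `my-audio` are hypothetical provider names: each must match an entry you define under `providers`, backed by a provider that actually supports embeddings or audio transcription (see Provider Config for how to declare those).

```json
{
  "tasks": {
    "compileProvider": "local",
    "queryProvider": "local",
    "lintProvider": "local",
    "visionProvider": "local",
    "embeddingProvider": "my-embeddings",
    "audioProvider": "my-audio"
  }
}
```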
`retrieval`
Optional local retrieval tuning layered on top of the compiled SQLite index.
| Field | Default | Description |
|---|---|---|
| `backend` | `"sqlite"` | Stable local backend used for the retrieval index |
| `shardSize` | `25000` | Target page-row count per local shard; currently used for manifest planning |
| `hybrid` | `true` | When an embedding-capable provider is configured, fuse semantic page hits into the same result set as full-text search |
| `rerank` | `false` | Ask the configured `queryProvider` to rerank merged search hits before answer generation |
Hybrid search only activates when both conditions hold: SwarmVault can resolve an embedding-capable provider (through `tasks.embeddingProvider`, or by falling back to a `queryProvider` that also supports embeddings), and the vault already has `state/graph.json`. Legacy `search.hybrid` and `search.rerank` keys migrate to `retrieval.hybrid` and `retrieval.rerank` with `swarmvault migrate --target 3.0.0 --apply`.
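For example, to keep hybrid fusion on and also have the `queryProvider` rerank merged hits before it writes an answer, a sketch could flip only `rerank` and leave the other fields at their documented defaults. Hybrid fusion still depends on the embedding-provider and `state/graph.json` conditions above; this fragment does not change that.

```json
{
  "retrieval": {
    "backend": "sqlite",
    "shardSize": 25000,
    "hybrid": true,
    "rerank": true
  }
}
```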
`viewer`
| Field | Default | Description |
|---|---|---|
| `port` | `4123` | Port used by `swarmvault graph serve` |
`graph`
Optional graph-clustering tuning.
| Field | Description |
|---|---|
| `communityResolution` | Optional Louvain resolution override for graph reports, viewer communities, and Obsidian community export output |
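Because `graph.communityResolution` is left out of the full example, here is a minimal fragment that sets it. The value `1.5` is an arbitrary illustration, not a recommended setting. In the standard Louvain formulation, resolution values above 1 tend to produce more, smaller communities and values below 1 fewer, larger ones; this sketch assumes SwarmVault's override follows that usual convention.

```json
{
  "graph": {
    "communityResolution": 1.5
  }
}
```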
`profile`
Optional deterministic vault-behavior presets and defaults layered alongside swarmvault.schema.md.
| Field | Default | Description |
|---|---|---|
| `presets` | `[]` | Built-in preset list such as `reader`, `timeline`, `thesis`, or `diligence` |
| `dashboardPack` | `"default"` | Which dashboard emphasis to generate by default |
| `guidedSessionMode` | `"insights_only"` | Whether guided approval bundles target canonical pages or stay in `wiki/insights/` |
| `guidedIngestDefault` | `false` | Make `ingest`, `source add`, and `source reload` use guided mode by default |
| `deepLintDefault` | `false` | Make `swarmvault lint` include the advisory deep-lint pass by default |
| `dataviewBlocks` | `false` | Append Dataview blocks to dashboards and related artifacts |
The `personal-research` preset enables both `guidedIngestDefault` and `deepLintDefault`, so source integration and linting start in their stronger modes until you override them with `--no-guide` or `--no-deep`.
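A sketch of a `profile` block that opts into the `personal-research` preset and also turns on Dataview output. `guidedIngestDefault` and `deepLintDefault` are deliberately omitted so their values come from the preset; how an explicit `false` in the same block would interact with the preset is an open assumption here, so leaving them out is the cautious choice.

```json
{
  "profile": {
    "presets": ["personal-research"],
    "dashboardPack": "default",
    "guidedSessionMode": "insights_only",
    "dataviewBlocks": true
  }
}
```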
`projects`
Optional project-aware source grouping and schema layering.
| Field | Description |
|---|---|
| `projects.<id>.roots` | Workspace-relative directory prefixes used to assign sources to a project |
| `projects.<id>.schemaPath` | Optional project-specific schema appended after the root schema |
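A sketch with two projects: the `engine` entry from the full example plus a hypothetical `docs` project whose roots (`docs`, `examples`) are illustrative prefixes. Because `docs` has no `schemaPath`, its sources would be guided by the root `swarmvault.schema.md` alone, while `engine` sources get the root schema followed by `schemas/engine.schema.md`.

```json
{
  "projects": {
    "engine": {
      "roots": ["packages/engine"],
      "schemaPath": "schemas/engine.schema.md"
    },
    "docs": {
      "roots": ["docs", "examples"]
    }
  }
}
```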
`autoCommit`
Optional engine-level default for programmatic auto-commit helpers.
- When `true`, integrations that call the engine auto-commit helper without forcing it can create a git commit for changed `wiki/` and `state/` content automatically.
- The CLI-level `--commit` flags on `ingest`, `compile`, and `query` bypass this default and force the same git-aware behavior for that one command.
- Outside a git worktree, auto-commit stays a no-op.
`agents`
Array of agent types to install rules for when an initialization command is run with `--install-agent-rules`. The default is empty, so `init`, `quickstart`, `scan`, and `clone` do not write project-local rule files unless you opt in. Supported values are `"codex"`, `"claude"`, `"cursor"`, `"goose"`, `"pi"`, `"gemini"`, `"opencode"`, `"aider"`, `"copilot"`, `"trae"`, `"claw"`, and `"droid"`.
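For example, to have `--install-agent-rules` write rule files for Claude and Cursor only, the array would list just those two supported values (the choice of agents here is arbitrary):

```json
{
  "agents": ["claude", "cursor"]
}
```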
`repoAnalysis`
Optional repo-wide source-class defaults and classification overrides for directory ingest plus repo watch.
| Field | Description |
|---|---|
| `extractClasses` | Which source classes should be ingested by default during repo/directory ingest. Defaults to `["first_party"]`. |
| `classifyGlobs` | Optional extra glob patterns keyed by `first_party`, `third_party`, `resource`, or `generated` |
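A sketch that also ingests third-party code by default and classifies two extra directories. The `vendor/**` and `build/**` globs are hypothetical examples for a repo that vendors dependencies and emits build output, not built-in patterns; they follow the same `dir/**` style as the full example.

```json
{
  "repoAnalysis": {
    "extractClasses": ["first_party", "third_party"],
    "classifyGlobs": {
      "third_party": ["vendor/**"],
      "generated": ["build/**"]
    }
  }
}
```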
`graphSinks`
Optional external graph sink configuration.
`graphSinks.neo4j`
| Field | Description |
|---|---|
| `uri` | Neo4j Bolt or Aura URI |
| `username` | Neo4j username |
| `passwordEnv` | Environment variable containing the Neo4j password |
| `database` | Optional Neo4j database name. Defaults to `neo4j` |
| `vaultId` | Optional stable namespace for shared Neo4j databases |
| `includeClasses` | Which source classes to push by default. Defaults to `["first_party"]` |
| `batchSize` | Optional write batch size for `graph push neo4j` |
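When several vaults push into one shared Neo4j database, a sketch like the following adds the two optional fields the full example omits. `research-vault` is a hypothetical namespace and `500` an arbitrary batch size, not a tuned recommendation. The password itself is still read from the `NEO4J_PASSWORD` environment variable rather than stored in the file.

```json
{
  "graphSinks": {
    "neo4j": {
      "uri": "bolt://127.0.0.1:7687",
      "username": "neo4j",
      "passwordEnv": "NEO4J_PASSWORD",
      "database": "neo4j",
      "vaultId": "research-vault",
      "includeClasses": ["first_party"],
      "batchSize": 500
    }
  }
}
```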
`schedules`
Optional recurring jobs for compile, lint, query, and explore. See Schedules.
`orchestration`
Optional role mapping for research, audit, context, and safety across providers or external commands. See Orchestration.
`webSearch`
Optional web-search configuration used by `swarmvault lint --deep --web`. It is separate from the normal LLM provider registry.
| Field | Description |
|---|---|
| `providers` | Named web-search provider definitions |
| `tasks.deepLintProvider` | Which named provider to use for deep-lint evidence gathering |
Schema File
swarmvault.schema.md is the vault-specific markdown instruction layer. It is how you teach one vault to behave differently from another without changing code or adding a custom parser.
Use it to define:
- naming rules
- concept and entity categories
- relationship expectations
- grounding and citation rules
- exclusions
See Schema for examples and behavior details.
See Projects for project-aware schema layering, project_ids, and wiki/projects/ rollups.