Configuration

SwarmVault uses two root-level configuration surfaces:

swarmvault.config.json for runtime and provider configuration
swarmvault.schema.md for vault-specific compile and query guidance

Full Schema

{
  "workspace": {
    "rawDir": "raw",
    "wikiDir": "wiki",
    "stateDir": "state",
    "agentDir": "agent",
    "inboxDir": "inbox"
  },
  "providers": {
    "local": {
      "type": "heuristic",
      "model": "heuristic-v1",
      "capabilities": ["chat", "structured", "vision", "local"]
    }
  },
  "tasks": {
    "compileProvider": "local",
    "queryProvider": "local",
    "lintProvider": "local",
    "visionProvider": "local",
    "imageProvider": "local"
  },
  "profile": {
    "presets": [],
    "dashboardPack": "default",
    "guidedSessionMode": "insights_only",
    "guidedIngestDefault": false,
    "deepLintDefault": false,
    "dataviewBlocks": false
  },
  "viewer": {
    "port": 4123
  },
  "projects": {
    "engine": {
      "roots": ["packages/engine"],
      "schemaPath": "schemas/engine.schema.md"
    }
  },
  "repoAnalysis": {
    "extractClasses": ["first_party"],
    "classifyGlobs": {
      "third_party": ["third_party/**"],
      "resource": ["App.xcassets/**"],
      "generated": ["dist/**"]
    }
  },
  "graphSinks": {
    "neo4j": {
      "uri": "bolt://127.0.0.1:7687",
      "username": "neo4j",
      "passwordEnv": "NEO4J_PASSWORD",
      "database": "neo4j",
      "includeClasses": ["first_party"]
    }
  },
  "retrieval": {
    "backend": "sqlite",
    "shardSize": 25000,
    "hybrid": true,
    "rerank": false
  },
  "autoCommit": false,
  "agents": [],
  "schedules": {
    "nightly-compile": {
      "enabled": true,
      "when": { "cron": "0 3 * * *" },
      "task": { "type": "compile", "approve": true }
    }
  },
  "orchestration": {
    "maxParallelRoles": 2,
    "compilePostPass": false,
    "roles": {
      "research": {
        "executor": { "type": "provider", "provider": "local" }
      }
    }
  },
  "webSearch": {
    "providers": {
      "evidence": {
        "type": "http-json",
        "endpoint": "https://search.example/api/search",
        "method": "GET",
        "apiKeyEnv": "SEARCH_API_KEY",
        "apiKeyHeader": "Authorization",
        "apiKeyPrefix": "Bearer ",
        "queryParam": "q",
        "limitParam": "limit",
        "resultsPath": "results",
        "titleField": "title",
        "urlField": "url",
        "snippetField": "snippet"
      }
    },
    "tasks": {
      "deepLintProvider": "evidence"
    }
  }
}

embeddingProvider, audioProvider, and graph.communityResolution are intentionally omitted from the baseline local example above. The built-in heuristic provider works well as a local/offline default for compile and query, but it does not generate embeddings or audio transcripts. For semantic graph query without API keys, add an embedding-capable local backend such as Ollama and point tasks.embeddingProvider at that provider. For audio-file transcription, point tasks.audioProvider at a provider that exposes the audio capability.
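As a rough illustration of that wiring, the Python sketch below registers a second provider in a parsed config and points tasks.embeddingProvider at it. The helper name, the provider name ollama-embed, and every field in its definition (type, model, capabilities) are placeholders assumed for this example, not confirmed SwarmVault values; see Provider Config for the actual shape.

```python
import copy

def with_embedding_provider(config, name, definition):
    """Return a copy of config with `definition` registered under `name`
    and tasks.embeddingProvider pointing at it. Hypothetical helper."""
    updated = copy.deepcopy(config)
    updated.setdefault("providers", {})[name] = definition
    updated.setdefault("tasks", {})["embeddingProvider"] = name
    return updated

# Relevant keys copied from the baseline example.
baseline = {
    "providers": {
        "local": {
            "type": "heuristic",
            "model": "heuristic-v1",
            "capabilities": ["chat", "structured", "vision", "local"],
        }
    },
    "tasks": {"compileProvider": "local", "queryProvider": "local"},
}

# ASSUMPTION: placeholder definition; the real field values for an Ollama
# backend are documented in Provider Config, not here.
ollama_embed = {
    "type": "ollama",
    "model": "nomic-embed-text",
    "capabilities": ["embeddings", "local"],
}

config = with_embedding_provider(baseline, "ollama-embed", ollama_embed)
print(config["tasks"])
```

The baseline dictionary is left untouched, so the same helper can be reused to try out different backends.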

Sections

`workspace`

Controls the directory layout. Relative paths resolve against the workspace root by default.

Field	Default	Description
`rawDir`	`"raw"`	Root directory for canonical source and asset storage
`wikiDir`	`"wiki"`	Compiled markdown output
`stateDir`	`"state"`	Manifests, extracts, analyses, graph, retrieval, and jobs
`agentDir`	`"agent"`	Agent-specific files
`inboxDir`	`"inbox"`	Capture staging area for inbox import and watch mode

Set SWARMVAULT_OUT=<dir> when generated workspace artifacts should be isolated from the source tree. Config and schema files stay in the project root, while relative rawDir, wikiDir, stateDir, agentDir, and inboxDir values resolve under the output root. Absolute workspace paths remain absolute.
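The resolution rule can be sketched in a few lines of Python. This is a minimal model of the behavior described above, not SwarmVault's own code: the function name and arguments are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath

def resolve_workspace_dir(value, project_root, out_root=None):
    """Resolve one workspace directory setting.

    value        -- a rawDir/wikiDir/stateDir/agentDir/inboxDir value
    project_root -- directory holding swarmvault.config.json
    out_root     -- value of SWARMVAULT_OUT, or None when it is unset
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        # Absolute workspace paths remain absolute.
        return path
    # Relative values resolve under the output root when one is set.
    base = PurePosixPath(out_root) if out_root else PurePosixPath(project_root)
    return base / path

print(resolve_workspace_dir("wiki", "/repo", out_root="/tmp/vault-out"))
print(resolve_workspace_dir("/data/raw", "/repo", out_root="/tmp/vault-out"))
```

A real caller would pass os.environ.get("SWARMVAULT_OUT") as out_root.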

`providers`

Named provider definitions. SwarmVault supports built-in providers, named OpenAI-compatible presets such as OpenRouter and Groq, plus generic OpenAI-compatible and custom module adapters. See Provider Config.

`tasks`

Maps each engine task to a named provider from the providers object.

Field	Description
`compileProvider`	Provider used during compile
`queryProvider`	Provider used for natural-language answers
`lintProvider`	Provider used for lint and health checks
`visionProvider`	Provider used for image-aware extraction
`imageProvider`	Optional provider used for native `image` output generation
`embeddingProvider`	Optional provider used for semantic graph query and embedding-backed similarity enrichment
`audioProvider`	Optional provider used for audio-file transcription during ingest
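Because every task value is a key into the providers object, a quick consistency check can catch typos before a run. The sketch below is a hypothetical helper, not part of SwarmVault, and operates on an already parsed config.

```python
def unresolved_task_providers(config):
    """Return {task: provider_name} for task entries whose provider name
    is not defined in the providers object."""
    providers = config.get("providers", {})
    return {
        task: name
        for task, name in config.get("tasks", {}).items()
        if name not in providers
    }

config = {
    "providers": {"local": {"type": "heuristic"}},
    "tasks": {
        "compileProvider": "local",
        "queryProvider": "local",
        "embeddingProvider": "embedder",  # not defined under providers
    },
}

print(unresolved_task_providers(config))
```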

`retrieval`

Optional local retrieval tuning layered on top of the compiled SQLite index.

Field	Default	Description
`backend`	`"sqlite"`	Stable local backend used for the retrieval index
`shardSize`	`25000`	Target page-row count per local shard; currently used for manifest planning
`hybrid`	`true`	When an embedding-capable provider is configured, fuse semantic page hits into the same result set as full-text search
`rerank`	`false`	Ask the configured `queryProvider` to rerank merged search hits before answer generation

Hybrid search activates only when two conditions hold: SwarmVault can resolve an embedding-capable provider, either through tasks.embeddingProvider or by falling back to a queryProvider that also supports embeddings, and the vault already has state/graph.json. Legacy search.hybrid and search.rerank keys migrate to retrieval.hybrid and retrieval.rerank when you run swarmvault migrate --target 3.0.0 --apply.
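The activation conditions can be written out as a small decision function. This is a sketch under stated assumptions: the function names are hypothetical, the capability label "embeddings" is assumed rather than taken from the provider documentation, and the existence of state/graph.json is passed in as a boolean.

```python
def resolve_embedding_provider(config):
    """Return the name of the provider that would supply embeddings, or None."""
    providers = config.get("providers", {})
    tasks = config.get("tasks", {})

    explicit = tasks.get("embeddingProvider")
    if explicit in providers:
        return explicit

    # Fallback: a queryProvider that also supports embeddings.
    # ASSUMPTION: the capability is labeled "embeddings".
    fallback = tasks.get("queryProvider")
    if fallback in providers:
        if "embeddings" in providers[fallback].get("capabilities", []):
            return fallback
    return None

def hybrid_search_active(config, graph_exists):
    """True when retrieval.hybrid is on, a provider resolves, and the
    vault already has state/graph.json."""
    hybrid = config.get("retrieval", {}).get("hybrid", True)
    return bool(hybrid and graph_exists and resolve_embedding_provider(config))

baseline = {
    "providers": {"local": {"capabilities": ["chat", "structured", "vision", "local"]}},
    "tasks": {"queryProvider": "local"},
    "retrieval": {"hybrid": True},
}
print(hybrid_search_active(baseline, graph_exists=True))
```

With the baseline heuristic provider the check comes out false even though retrieval.hybrid is true, which matches the note that the heuristic provider does not generate embeddings.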

`viewer`

Field	Default	Description
`port`	`4123`	Port used by `swarmvault graph serve`

`graph`

Optional graph-clustering tuning.

Field	Description
`communityResolution`	Optional Louvain resolution override for graph reports, viewer communities, and Obsidian community export output

`profile`

Optional deterministic vault-behavior presets and defaults layered alongside swarmvault.schema.md.

Field	Default	Description
`presets`	`[]`	Built-in preset list such as `reader`, `timeline`, `thesis`, or `diligence`
`dashboardPack`	`"default"`	Which dashboard emphasis to generate by default
`guidedSessionMode`	`"insights_only"`	Whether guided approval bundles target canonical pages or stay in `wiki/insights/`
`guidedIngestDefault`	`false`	Make `ingest`, `source add`, and `source reload` use guided mode by default
`deepLintDefault`	`false`	Make `swarmvault lint` include the advisory deep-lint pass by default
`dataviewBlocks`	`false`	Append Dataview blocks to dashboards and related artifacts

The personal-research preset enables both guidedIngestDefault and deepLintDefault, so source integration and linting start in their stronger modes until you override them with --no-guide or --no-deep.

`projects`

Optional project-aware source grouping and schema layering.

Field	Description
`projects.<id>.roots`	Workspace-relative directory prefixes used to assign sources to a project
`projects.<id>.schemaPath`	Optional project-specific schema appended after the root schema
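Assignment by directory prefix might look like the following Python sketch. The function name is hypothetical, and picking the longest matching root when several match is an assumption made for this example; the source does not say how overlapping roots are handled.

```python
from pathlib import PurePosixPath

def project_for_source(source_path, projects):
    """Return the id of the project whose root contains source_path, or None.

    Matching is by whole directory components, so "packages/engine" does
    not match "packages/engine-extras/...".
    ASSUMPTION: when several roots match, the longest one wins.
    """
    source = PurePosixPath(source_path)
    best_id, best_depth = None, -1
    for project_id, project in projects.items():
        for root in project.get("roots", []):
            root_path = PurePosixPath(root)
            if source == root_path or root_path in source.parents:
                depth = len(root_path.parts)
                if depth > best_depth:
                    best_id, best_depth = project_id, depth
    return best_id

projects = {
    "engine": {
        "roots": ["packages/engine"],
        "schemaPath": "schemas/engine.schema.md",
    }
}

print(project_for_source("packages/engine/src/index.ts", projects))
print(project_for_source("docs/intro.md", projects))
```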

`autoCommit`

Optional engine-level default for programmatic auto-commit helpers.

When true, integrations that call the engine auto-commit helper without forcing it can create a git commit for changed wiki/ and state/ content automatically.
The CLI-level --commit flags on ingest, compile, and query bypass this default and force the same git-aware behavior for that one command.
Outside a git worktree, auto-commit stays a no-op.
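The three rules above reduce to a short precedence check, sketched here in Python. The function and parameter names are hypothetical, and treating a missing autoCommit key as false is an assumption based on the baseline example.

```python
def should_auto_commit(config, force_commit=False, in_git_worktree=True):
    """Decide whether changed wiki/ and state/ content gets committed.

    force_commit    -- True when the command was run with --commit
    in_git_worktree -- False when the workspace is not inside a git worktree
    """
    if not in_git_worktree:
        # Outside a git worktree, auto-commit stays a no-op.
        return False
    if force_commit:
        # A CLI-level --commit flag bypasses the config default.
        return True
    # ASSUMPTION: a missing autoCommit key behaves like false.
    return bool(config.get("autoCommit", False))

print(should_auto_commit({"autoCommit": False}, force_commit=True))
print(should_auto_commit({"autoCommit": True}, in_git_worktree=False))
```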

`agents`

Array of agent types to install rules for when an initialization command is run with --install-agent-rules. The default is empty, so init, quickstart, scan, and clone do not write project-local rule files unless you opt in. Supported values are "codex", "claude", "cursor", "goose", "pi", "gemini", "opencode", "aider", "copilot", "trae", "claw", and "droid".
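A config check for this array could be as simple as the Python sketch below. The helper is hypothetical; only the list of supported values comes from the text above.

```python
# Supported values as listed in this section.
SUPPORTED_AGENTS = frozenset({
    "codex", "claude", "cursor", "goose", "pi", "gemini",
    "opencode", "aider", "copilot", "trae", "claw", "droid",
})

def unsupported_agents(agents):
    """Return the entries of an agents array that are not supported values,
    preserving their original order."""
    return [agent for agent in agents if agent not in SUPPORTED_AGENTS]

print(unsupported_agents(["claude", "cursor", "vim"]))
```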

`repoAnalysis`

Optional repo-wide source-class defaults and classification overrides for directory ingest and repo watch.

Field	Description
`extractClasses`	Which source classes should be ingested by default during repo/directory ingest. Defaults to `["first_party"]`.
`classifyGlobs`	Optional extra glob patterns keyed by `first_party`, `third_party`, `resource`, or `generated`
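To show how the two fields interact, the Python sketch below classifies a path with the example globs and then checks it against extractClasses. It rests on several assumptions not stated in the source: the helper names are hypothetical, unmatched paths are treated as first_party, the first matching class wins, and Python's fnmatch stands in for whatever glob engine SwarmVault actually uses (fnmatch lets * cross directory separators, so ** behaves like a recursive match here).

```python
from fnmatch import fnmatchcase

def classify_source(path, classify_globs, default="first_party"):
    """Return the source class for a workspace-relative path.

    ASSUMPTION: first matching class wins; unmatched paths get `default`.
    """
    for source_class, patterns in classify_globs.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return source_class
    return default

def should_extract(path, repo_analysis):
    """True when the path's class is listed in extractClasses."""
    source_class = classify_source(path, repo_analysis.get("classifyGlobs", {}))
    return source_class in repo_analysis.get("extractClasses", ["first_party"])

repo_analysis = {
    "extractClasses": ["first_party"],
    "classifyGlobs": {
        "third_party": ["third_party/**"],
        "resource": ["App.xcassets/**"],
        "generated": ["dist/**"],
    },
}

for path in ["src/main.ts", "third_party/zlib/inflate.c", "dist/bundle.js"]:
    print(path, should_extract(path, repo_analysis))
```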

`graphSinks`

Optional external graph sink configuration.

`graphSinks.neo4j`

Field	Description
`uri`	Neo4j Bolt or Aura URI
`username`	Neo4j username
`passwordEnv`	Environment variable containing the Neo4j password
`database`	Optional Neo4j database name. Defaults to `neo4j`
`vaultId`	Optional stable namespace for shared Neo4j databases
`includeClasses`	Which source classes to push by default. Defaults to `["first_party"]`
`batchSize`	Optional write batch size for `graph push neo4j`
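Since the config stores only the name of the password variable, the secret itself is read from the environment at run time. The Python sketch below shows one way to resolve these settings with the documented defaults; the function name and the shape of the returned dictionary are hypothetical.

```python
import os

def resolve_neo4j_settings(sink, env=None):
    """Turn a graphSinks.neo4j block into connection settings.

    The password is looked up in the environment variable named by
    passwordEnv; it is never stored in the config itself.
    """
    env = os.environ if env is None else env
    variable = sink["passwordEnv"]
    if variable not in env:
        raise KeyError(f"environment variable {variable} is not set")
    return {
        "uri": sink["uri"],
        "username": sink["username"],
        "password": env[variable],
        # Documented defaults for the optional fields.
        "database": sink.get("database", "neo4j"),
        "include_classes": sink.get("includeClasses", ["first_party"]),
    }

sink = {
    "uri": "bolt://127.0.0.1:7687",
    "username": "neo4j",
    "passwordEnv": "NEO4J_PASSWORD",
}
settings = resolve_neo4j_settings(sink, env={"NEO4J_PASSWORD": "example-secret"})
print(settings["database"], settings["include_classes"])
```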

`schedules`

Optional recurring jobs for compile, lint, query, and explore. See Schedules.

`orchestration`

Optional role mapping for research, audit, context, and safety across providers or external commands. See Orchestration.

`webSearch`

Optional web-search configuration used by swarmvault lint --deep --web.

It is separate from the normal LLM provider registry.

Field	Description
`providers`	Named web-search provider definitions
`tasks.deepLintProvider`	Which named provider to use for deep-lint evidence gathering
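The field names of the http-json provider in the full example suggest how a request is assembled and how results are read back. The Python sketch below is an interpretation under assumptions, not SwarmVault's implementation: the function names are hypothetical, resultsPath is treated as a dot-separated path into the JSON response, and no network call is made.

```python
from urllib.parse import urlencode

def build_search_request(provider, query, limit, env):
    """Return (method, url, headers) for one search call."""
    params = {provider["queryParam"]: query, provider["limitParam"]: limit}
    url = f'{provider["endpoint"]}?{urlencode(params)}'
    api_key = env[provider["apiKeyEnv"]]
    headers = {provider["apiKeyHeader"]: provider.get("apiKeyPrefix", "") + api_key}
    return provider["method"], url, headers

def extract_results(provider, payload):
    """Pull title/url/snippet out of a decoded JSON response.

    ASSUMPTION: resultsPath is a dot-separated path of object keys.
    """
    items = payload
    for key in provider["resultsPath"].split("."):
        items = items[key]
    return [
        {
            "title": item[provider["titleField"]],
            "url": item[provider["urlField"]],
            "snippet": item[provider["snippetField"]],
        }
        for item in items
    ]

evidence = {
    "type": "http-json",
    "endpoint": "https://search.example/api/search",
    "method": "GET",
    "apiKeyEnv": "SEARCH_API_KEY",
    "apiKeyHeader": "Authorization",
    "apiKeyPrefix": "Bearer ",
    "queryParam": "q",
    "limitParam": "limit",
    "resultsPath": "results",
    "titleField": "title",
    "urlField": "url",
    "snippetField": "snippet",
}

method, url, headers = build_search_request(
    evidence, "louvain resolution", 5, env={"SEARCH_API_KEY": "example-key"}
)
print(method, url)
```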

Schema File

swarmvault.schema.md is the vault-specific markdown instruction layer. It is how you teach one vault to behave differently from another without changing code or adding a custom parser.

Use it to define:

naming rules
concept and entity categories
relationship expectations
grounding and citation rules
exclusions

See Schema for examples and behavior details.

See Projects for project-aware schema layering, project_ids, and wiki/projects/ rollups.