Configuration

SwarmVault uses two root-level configuration surfaces:

  • swarmvault.config.json for runtime and provider configuration
  • swarmvault.schema.md for vault-specific compile and query guidance

Full Schema

{
  "workspace": {
    "rawDir": "raw",
    "wikiDir": "wiki",
    "stateDir": "state",
    "agentDir": "agent",
    "inboxDir": "inbox"
  },
  "providers": {
    "local": {
      "type": "heuristic",
      "model": "heuristic-v1",
      "capabilities": ["chat", "structured", "vision", "local"]
    }
  },
  "tasks": {
    "compileProvider": "local",
    "queryProvider": "local",
    "lintProvider": "local",
    "visionProvider": "local",
    "imageProvider": "local"
  },
  "profile": {
    "presets": [],
    "dashboardPack": "default",
    "guidedSessionMode": "insights_only",
    "guidedIngestDefault": false,
    "deepLintDefault": false,
    "dataviewBlocks": false
  },
  "viewer": {
    "port": 4123
  },
  "projects": {
    "engine": {
      "roots": ["packages/engine"],
      "schemaPath": "schemas/engine.schema.md"
    }
  },
  "repoAnalysis": {
    "extractClasses": ["first_party"],
    "classifyGlobs": {
      "third_party": ["third_party/**"],
      "resource": ["App.xcassets/**"],
      "generated": ["dist/**"]
    }
  },
  "graphSinks": {
    "neo4j": {
      "uri": "bolt://127.0.0.1:7687",
      "username": "neo4j",
      "passwordEnv": "NEO4J_PASSWORD",
      "database": "neo4j",
      "includeClasses": ["first_party"]
    }
  },
  "retrieval": {
    "backend": "sqlite",
    "shardSize": 25000,
    "hybrid": true,
    "rerank": false
  },
  "autoCommit": false,
  "agents": [],
  "schedules": {
    "nightly-compile": {
      "enabled": true,
      "when": { "cron": "0 3 * * *" },
      "task": { "type": "compile", "approve": true }
    }
  },
  "orchestration": {
    "maxParallelRoles": 2,
    "compilePostPass": false,
    "roles": {
      "research": {
        "executor": { "type": "provider", "provider": "local" }
      }
    }
  },
  "webSearch": {
    "providers": {
      "evidence": {
        "type": "http-json",
        "endpoint": "https://search.example/api/search",
        "method": "GET",
        "apiKeyEnv": "SEARCH_API_KEY",
        "apiKeyHeader": "Authorization",
        "apiKeyPrefix": "Bearer ",
        "queryParam": "q",
        "limitParam": "limit",
        "resultsPath": "results",
        "titleField": "title",
        "urlField": "url",
        "snippetField": "snippet"
      }
    },
    "tasks": {
      "deepLintProvider": "evidence"
    }
  }
}

embeddingProvider, audioProvider, and graph.communityResolution are intentionally omitted from the baseline local example above. The built-in heuristic provider works well as a local, offline default for compile and query, but it does not generate embeddings or audio transcripts. For semantic graph query without API keys, add an embedding-capable local backend such as Ollama and point tasks.embeddingProvider at it. For audio-file transcription, point tasks.audioProvider at a provider that exposes the audio capability.

Sections

`workspace`

Controls the directory layout. All paths are relative to the workspace root.

| Field | Default | Description |
| --- | --- | --- |
| `rawDir` | `"raw"` | Root directory for canonical source and asset storage |
| `wikiDir` | `"wiki"` | Compiled markdown output |
| `stateDir` | `"state"` | Manifests, extracts, analyses, graph, retrieval, and jobs |
| `agentDir` | `"agent"` | Agent-specific files |
| `inboxDir` | `"inbox"` | Capture staging area for inbox import and watch mode |

Set SWARMVAULT_OUT=<dir> when generated workspace artifacts should be isolated from the source tree. Config and schema files stay in the project root, while relative rawDir, wikiDir, stateDir, agentDir, and inboxDir values resolve under the output root. Absolute workspace paths remain absolute.
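
The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not SwarmVault's implementation; the function name `resolve_workspace_dirs` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

def resolve_workspace_dirs(project_root, workspace, out_root=None):
    """Resolve workspace directories as described for SWARMVAULT_OUT.

    Relative entries land under out_root when it is set, otherwise under
    the project root. Absolute entries are returned unchanged.
    """
    base = PurePosixPath(out_root) if out_root else PurePosixPath(project_root)
    resolved = {}
    for key, value in workspace.items():
        path = PurePosixPath(value)
        resolved[key] = path if path.is_absolute() else base / path
    return resolved

# Hypothetical example: one absolute stateDir, everything else relative.
workspace = {"rawDir": "raw", "wikiDir": "wiki", "stateDir": "/var/lib/vault/state"}
dirs = resolve_workspace_dirs("/repo", workspace, out_root="/tmp/vault-out")
```

Here `dirs["rawDir"]` and `dirs["wikiDir"]` sit under `/tmp/vault-out`, while the absolute `stateDir` is left where it is.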

`providers`

Named provider definitions. SwarmVault supports built-in providers, named OpenAI-compatible presets such as OpenRouter and Groq, plus generic OpenAI-compatible and custom module adapters. See Provider Config.

`tasks`

Maps each engine task to a named provider from the providers object.

| Field | Description |
| --- | --- |
| `compileProvider` | Provider used during compile |
| `queryProvider` | Provider used for natural-language answers |
| `lintProvider` | Provider used for lint and health checks |
| `visionProvider` | Provider used for image-aware extraction |
| `imageProvider` | Optional provider used for native image output generation |
| `embeddingProvider` | Optional provider used for semantic graph query and embedding-backed similarity enrichment |
| `audioProvider` | Optional provider used for audio-file transcription during ingest |
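
Because every tasks entry must name a key in the providers object, a quick consistency check on a parsed config can catch typos early. The sketch below assumes the config has already been loaded into a dict; the helper name and the `ollama-embed` provider name are hypothetical, and SwarmVault may report such problems differently.

```python
def find_unknown_task_providers(config):
    """Return {task_key: provider_name} for tasks that name a missing provider."""
    providers = config.get("providers", {})
    tasks = config.get("tasks", {})
    return {task: name for task, name in tasks.items() if name not in providers}

config = {
    "providers": {"local": {"type": "heuristic", "model": "heuristic-v1"}},
    "tasks": {
        "compileProvider": "local",
        "queryProvider": "local",
        # Hypothetical provider name that is not defined under "providers".
        "embeddingProvider": "ollama-embed",
    },
}
problems = find_unknown_task_providers(config)
```

An empty result means every task points at a defined provider. In this example only `embeddingProvider` is flagged.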

`retrieval`

Optional local retrieval tuning layered on top of the compiled SQLite index.

| Field | Default | Description |
| --- | --- | --- |
| `backend` | `"sqlite"` | Stable local backend used for the retrieval index |
| `shardSize` | `25000` | Target page-row count per local shard; currently used for manifest planning |
| `hybrid` | `true` | When an embedding-capable provider is configured, fuse semantic page hits into the same result set as full-text search |
| `rerank` | `false` | Ask the configured queryProvider to rerank merged search hits before answer generation |

Hybrid search activates only when two conditions hold: SwarmVault can resolve an embedding-capable provider, either through tasks.embeddingProvider or by falling back to a queryProvider that also supports embeddings, and the vault already has state/graph.json. Legacy search.hybrid and search.rerank keys migrate to retrieval.hybrid and retrieval.rerank when you run swarmvault migrate --target 3.0.0 --apply.
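
The activation conditions can be read as a small decision function. The sketch below is an assumption-laden illustration rather than SwarmVault's code: the capability string `"embeddings"`, the function names, and the `embed` provider are all hypothetical, and an explicit tasks.embeddingProvider is simply trusted to be embedding-capable.

```python
def resolve_embedding_provider(config, embedding_capability="embeddings"):
    """Pick the provider name used for embeddings, or None if none qualifies."""
    providers = config.get("providers", {})
    tasks = config.get("tasks", {})
    explicit = tasks.get("embeddingProvider")
    if explicit in providers:
        return explicit
    # Fall back to the query provider only if it advertises embeddings.
    fallback = tasks.get("queryProvider")
    capabilities = providers.get(fallback, {}).get("capabilities", [])
    return fallback if embedding_capability in capabilities else None

def hybrid_search_active(config, graph_exists):
    """True when retrieval.hybrid is on, a provider resolves, and the graph exists."""
    if not config.get("retrieval", {}).get("hybrid", True):
        return False
    return graph_exists and resolve_embedding_provider(config) is not None

baseline = {
    "providers": {"local": {"type": "heuristic",
                            "capabilities": ["chat", "structured", "vision", "local"]}},
    "tasks": {"queryProvider": "local"},
    "retrieval": {"hybrid": True},
}
with_embeddings = {
    **baseline,
    "providers": {**baseline["providers"], "embed": {"capabilities": ["embeddings"]}},
    "tasks": {"queryProvider": "local", "embeddingProvider": "embed"},
}
```

With the baseline heuristic-only config the function returns False even though hybrid is true, which matches the note that the heuristic provider does not generate embeddings.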

`viewer`

| Field | Default | Description |
| --- | --- | --- |
| `port` | `4123` | Port used by swarmvault graph serve |

`graph`

Optional graph-clustering tuning.

| Field | Description |
| --- | --- |
| `communityResolution` | Optional Louvain resolution override for graph reports, viewer communities, and Obsidian community export output |

`profile`

Optional deterministic vault-behavior presets and defaults layered alongside swarmvault.schema.md.

| Field | Default | Description |
| --- | --- | --- |
| `presets` | `[]` | Built-in preset list such as reader, timeline, thesis, or diligence |
| `dashboardPack` | `"default"` | Which dashboard emphasis to generate by default |
| `guidedSessionMode` | `"insights_only"` | Whether guided approval bundles target canonical pages or stay in wiki/insights/ |
| `guidedIngestDefault` | `false` | Make ingest, source add, and source reload use guided mode by default |
| `deepLintDefault` | `false` | Make swarmvault lint include the advisory deep-lint pass by default |
| `dataviewBlocks` | `false` | Append Dataview blocks to dashboards and related artifacts |

The personal-research preset enables both guidedIngestDefault and deepLintDefault, so source integration and linting start in their stronger modes until you override them with --no-guide or --no-deep.

`projects`

Optional project-aware source grouping and schema layering.

| Field | Description |
| --- | --- |
| `projects.<id>.roots` | Workspace-relative directory prefixes used to assign sources to a project |
| `projects.<id>.schemaPath` | Optional project-specific schema appended after the root schema |
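
To make the prefix rule concrete, here is a minimal sketch of assigning a source path to a project by its roots. It is not SwarmVault's implementation: the function name is hypothetical, and preferring the longest matching root when several projects overlap is an assumption this page does not confirm.

```python
def assign_project(source_path, projects):
    """Return the id of the project whose root prefixes source_path, or None."""
    best_id, best_len = None, -1
    for project_id, spec in projects.items():
        for root in spec.get("roots", []):
            prefix = root.rstrip("/")
            # Match whole path segments so "packages/engine" does not
            # capture "packages/engine-tools".
            if source_path == prefix or source_path.startswith(prefix + "/"):
                if len(prefix) > best_len:
                    best_id, best_len = project_id, len(prefix)
    return best_id

projects = {"engine": {"roots": ["packages/engine"],
                       "schemaPath": "schemas/engine.schema.md"}}
owner = assign_project("packages/engine/src/index.ts", projects)
```

Paths outside every root return None, which here stands for a source that belongs to no project.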

`autoCommit`

Optional engine-level default for programmatic auto-commit helpers.

  • When true, integrations that call the engine auto-commit helper without forcing it can create a git commit for changed wiki/ and state/ content automatically.
  • The CLI-level --commit flags on ingest, compile, and query bypass this default and force the same git-aware behavior for that one command.
  • Outside a git worktree, auto-commit stays a no-op.
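
The three rules above reduce to a short predicate. This is a sketch of the described behavior under a hypothetical function name, where `forced` stands for a caller passing a CLI-level --commit flag or otherwise forcing the helper.

```python
def should_auto_commit(auto_commit, forced, in_git_worktree):
    """Decide whether a commit is attempted for changed wiki/ and state/ content."""
    if not in_git_worktree:
        # Outside a git worktree the helper is a no-op regardless of settings.
        return False
    # A forced call bypasses the config default; otherwise the default decides.
    return forced or auto_commit
```

For example, `should_auto_commit(False, True, True)` is True because --commit overrides a false autoCommit, while any call outside a worktree yields False.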

`agents`

Array of agent types to install rules for when an initialization command is run with --install-agent-rules. The default is empty, so init, quickstart, scan, and clone do not write project-local rule files unless you opt in. Supported values are "codex", "claude", "cursor", "goose", "pi", "gemini", "opencode", "aider", "copilot", "trae", "claw", and "droid".

`repoAnalysis`

Optional repo-wide source-class defaults and classification overrides for directory ingest plus repo watch.

| Field | Description |
| --- | --- |
| `extractClasses` | Which source classes should be ingested by default during repo/directory ingest. Defaults to `["first_party"]`. |
| `classifyGlobs` | Optional extra glob patterns keyed by first_party, third_party, resource, or generated |
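
As a rough illustration of how the two fields interact, the sketch below classifies a path with the configured globs and then checks it against extractClasses. Several parts are assumptions: Python's `fnmatch` stands in for whatever glob engine SwarmVault uses (its `*` also crosses `/`), unmatched paths fall back to `first_party`, and any built-in classification heuristics are ignored. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

def classify_source(path, classify_globs, default="first_party"):
    """Return the first source class whose glob list matches path."""
    for source_class, patterns in classify_globs.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return source_class
    return default  # assumed fallback when no override matches

def should_extract(path, repo_analysis):
    """True when the path's class is listed in extractClasses."""
    source_class = classify_source(path, repo_analysis.get("classifyGlobs", {}))
    return source_class in repo_analysis.get("extractClasses", ["first_party"])

repo_analysis = {
    "extractClasses": ["first_party"],
    "classifyGlobs": {
        "third_party": ["third_party/**"],
        "resource": ["App.xcassets/**"],
        "generated": ["dist/**"],
    },
}
```

With the example config from the full schema, `src/main.ts` would be extracted while `dist/bundle.js` and anything under `third_party/` would be skipped by default.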

`graphSinks`

Optional external graph sink configuration.

`graphSinks.neo4j`

| Field | Description |
| --- | --- |
| `uri` | Neo4j Bolt or Aura URI |
| `username` | Neo4j username |
| `passwordEnv` | Environment variable containing the Neo4j password |
| `database` | Optional Neo4j database name. Defaults to `neo4j` |
| `vaultId` | Optional stable namespace for shared Neo4j databases |
| `includeClasses` | Which source classes to push by default. Defaults to `["first_party"]` |
| `batchSize` | Optional write batch size for graph push neo4j |

`schedules`

Optional recurring jobs for compile, lint, query, and explore. See Schedules.

`orchestration`

Optional role mapping for research, audit, context, and safety across providers or external commands. See Orchestration.

`webSearch`

Optional web-search configuration used by swarmvault lint --deep --web.

It is separate from the normal LLM provider registry.

| Field | Description |
| --- | --- |
| `providers` | Named web-search provider definitions |
| `tasks.deepLintProvider` | Which named provider to use for deep-lint evidence gathering |
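
The http-json provider fields in the full schema suggest a simple request-and-extract mapping. The sketch below shows one plausible reading of those fields without making a network call. It is an interpretation, not documented behavior: the function names are hypothetical, and treating resultsPath as a dot-separated path is an assumption.

```python
from urllib.parse import urlencode

def build_search_request(provider, query, limit, env):
    """Return (method, url, headers) for a search, reading the key from env."""
    params = {provider["queryParam"]: query, provider["limitParam"]: limit}
    url = f'{provider["endpoint"]}?{urlencode(params)}'
    headers = {}
    api_key = env.get(provider.get("apiKeyEnv", ""))
    if api_key:
        headers[provider["apiKeyHeader"]] = provider.get("apiKeyPrefix", "") + api_key
    return provider.get("method", "GET"), url, headers

def extract_results(provider, payload):
    """Walk resultsPath (assumed dot-separated) and pick the configured fields."""
    items = payload
    for key in provider["resultsPath"].split("."):
        items = items[key]
    return [
        {"title": item.get(provider["titleField"]),
         "url": item.get(provider["urlField"]),
         "snippet": item.get(provider["snippetField"])}
        for item in items
    ]

evidence = {
    "type": "http-json", "endpoint": "https://search.example/api/search",
    "method": "GET", "apiKeyEnv": "SEARCH_API_KEY",
    "apiKeyHeader": "Authorization", "apiKeyPrefix": "Bearer ",
    "queryParam": "q", "limitParam": "limit", "resultsPath": "results",
    "titleField": "title", "urlField": "url", "snippetField": "snippet",
}
method, url, headers = build_search_request(
    evidence, "louvain resolution", 5, {"SEARCH_API_KEY": "example-key"})
```

Passing the environment as a plain dict keeps the sketch testable; in real use it would typically be `os.environ`, so the key itself never appears in swarmvault.config.json.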

Schema File

swarmvault.schema.md is the vault-specific markdown instruction layer. It is how you teach one vault to behave differently from another without changing code or adding a custom parser.

Use it to define:

  • naming rules
  • concept and entity categories
  • relationship expectations
  • grounding and citation rules
  • exclusions

See Schema for examples and behavior details.

See Projects for project-aware schema layering, project_ids, and wiki/projects/ rollups.