Knowledge Graph

The knowledge graph (state/graph.json) is the structured representation of compiled vault knowledge. Compile also turns that graph into orientation pages such as wiki/graph/report.md, wiki/graph/share-card.md, wiki/graph/share-card.svg, wiki/graph/share-kit/, wiki/graph/index.md, and per-community summaries under wiki/graph/communities/.

Graph Structure

{
  "generatedAt": "2025-01-15T10:30:00Z",
  "nodes": [...],
  "edges": [...],
  "hyperedges": [...],
  "communities": [...],
  "sources": [...],
  "pages": [...]
}
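
As a quick orientation aid, the top-level arrays can be counted with a few lines of Python. This is a minimal sketch, not part of SwarmVault: the sample data is a made-up miniature graph, the `id` fields inside it are assumed for illustration, and `summarize` and `load_graph` are hypothetical helper names.

```python
import json

# Hypothetical miniature graph that mirrors the top-level keys shown above.
# The "id" fields are assumed for illustration only.
SAMPLE_GRAPH = json.loads("""
{
  "generatedAt": "2025-01-15T10:30:00Z",
  "nodes": [{"id": "n1"}, {"id": "n2"}],
  "edges": [{"id": "e1"}],
  "hyperedges": [],
  "communities": [],
  "sources": [{"id": "s1"}],
  "pages": []
}
""")

SECTIONS = ("nodes", "edges", "hyperedges", "communities", "sources", "pages")


def summarize(graph: dict) -> dict:
    """Count the entries in each top-level array; missing keys count as zero."""
    return {name: len(graph.get(name, [])) for name in SECTIONS}


def load_graph(path: str = "state/graph.json") -> dict:
    """Read a compiled graph from disk (default path taken from the text above)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


print(summarize(SAMPLE_GRAPH))
```

Against a real vault, `summarize(load_graph())` would report the same counts for the compiled graph.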

Nodes

Each node represents a source, module, symbol, rationale, concept, entity, task, or decision:

  • type — source, module, symbol, rationale, concept, entity, memory_task, or decision
  • pageId — Linked wiki page when one exists
  • projectIds — Project scope for the node
  • language — Code language for module and symbol nodes
  • symbolKind — Function, class, interface, enum, and similar code symbol types
  • communityId — Derived graph cluster membership
  • degree / bridgeScore / isGodNode — Derived connectivity signals
  • freshness — fresh or stale
  • confidence — Extraction confidence score
  • sourceClass — first_party, third_party, resource, or generated for repo-aware material
  • rationale nodes — Parser-backed comments/docstrings linked back to modules or symbols through rationale_for
  • task nodes — Durable task ledger entries linked to context packs, outputs, touched paths, decisions, and follow-ups
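
The node fields above lend themselves to simple filters. The sketch below assumes each node is a plain JSON object carrying those fields plus an `id` key (the `id` key, the sample values, and both helper names are illustrative assumptions, not taken from SwarmVault).

```python
# Hypothetical sample nodes; only the field names listed above come from the docs.
NODES = [
    {"id": "n1", "type": "symbol", "freshness": "stale",
     "sourceClass": "first_party", "degree": 3, "isGodNode": False},
    {"id": "n2", "type": "module", "freshness": "fresh",
     "sourceClass": "first_party", "degree": 12, "isGodNode": True},
    {"id": "n3", "type": "symbol", "freshness": "stale",
     "sourceClass": "third_party", "degree": 1, "isGodNode": False},
    {"id": "n4", "type": "concept", "freshness": "fresh",
     "sourceClass": "first_party", "degree": 20, "isGodNode": True},
]


def stale_first_party_symbols(nodes):
    """Return symbol nodes from first-party material that are marked stale."""
    return [
        n for n in nodes
        if n.get("type") == "symbol"
        and n.get("freshness") == "stale"
        and n.get("sourceClass") == "first_party"
    ]


def god_nodes_by_degree(nodes):
    """Return nodes flagged isGodNode, most connected first."""
    flagged = [n for n in nodes if n.get("isGodNode")]
    return sorted(flagged, key=lambda n: n.get("degree", 0), reverse=True)


print([n["id"] for n in stale_first_party_symbols(NODES)])
print([n["id"] for n in god_nodes_by_degree(NODES)])
```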

Edges

Edges connect nodes with relationship metadata:

  • relation — Claim, import/export, define, call, inheritance, or implementation relation
  • status — extracted, inferred, conflicted, or stale
  • evidenceClass — extracted, inferred, or ambiguous
  • confidence — Numeric confidence score
  • provenance — Source ids backing the edge
  • similarityReasons — Present on semantically_similar_to edges to explain which shared features triggered the inferred link
  • similarityBasis — feature_overlap or embeddings, so reports and the viewer can distinguish deterministic overlap from embedding-backed similarity
  • task relations — uses_context, records_decision, touched, produced_output, and follows_up connect tasks to their evidence and handoff trail
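
One way to use the edge metadata is to build a review queue of links that deserve a second look. The following is a sketch under stated assumptions: the 0.6 confidence threshold, the sample edges, and the `needs_review` helper are all hypothetical, and only the field names and enumerated values come from the list above.

```python
# Hypothetical sample edges using the documented field names and values.
EDGES = [
    {"relation": "rationale_for", "status": "extracted",
     "evidenceClass": "extracted", "confidence": 0.95, "provenance": ["src-a"]},
    {"relation": "semantically_similar_to", "status": "inferred",
     "evidenceClass": "inferred", "confidence": 0.55,
     "provenance": ["src-a", "src-b"],
     "similarityBasis": "feature_overlap",
     "similarityReasons": ["placeholder reason"]},
    {"relation": "uses_context", "status": "conflicted",
     "evidenceClass": "ambiguous", "confidence": 0.8, "provenance": ["src-c"]},
]


def needs_review(edge, min_confidence=0.6):
    """Flag edges that are conflicted, stale, ambiguous, or low-confidence.

    The threshold is an illustrative choice, not a SwarmVault default.
    """
    return (
        edge.get("status") in {"conflicted", "stale"}
        or edge.get("evidenceClass") == "ambiguous"
        or edge.get("confidence", 0.0) < min_confidence
    )


review_queue = [e["relation"] for e in EDGES if needs_review(e)]
print(review_queue)
```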

Hyperedges

The graph also carries top-level hyperedges for multi-node group patterns that cannot be represented cleanly as one pairwise edge:

  • relation — currently participate_in, implement, or form
  • nodeIds — the member nodes in the pattern
  • why — deterministic explanation of why the pattern was created
  • sourcePageIds — canonical pages that ground the pattern

Hyperedges feed wiki/graph/report.md, wiki/graph/report.json, swarmvault graph explain, MCP get_hyperedges, and GraphML/Cypher exports.
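
Because a hyperedge lists its members in nodeIds, finding every group pattern that a node belongs to is a membership test. A minimal sketch, assuming hyperedges are plain JSON objects with the four fields above; the sample data and helper names are hypothetical.

```python
# Hypothetical hyperedges; the "why" strings are placeholders.
HYPEREDGES = [
    {"relation": "participate_in", "nodeIds": ["n1", "n2", "n3"],
     "why": "placeholder explanation", "sourcePageIds": ["p1"]},
    {"relation": "implement", "nodeIds": ["n3", "n4"],
     "why": "placeholder explanation", "sourcePageIds": ["p2"]},
]


def hyperedges_for(node_id, hyperedges):
    """Return every hyperedge whose member list contains node_id."""
    return [h for h in hyperedges if node_id in h.get("nodeIds", [])]


def describe(hyperedge):
    """Render one hyperedge as a single readable line."""
    members = ", ".join(hyperedge["nodeIds"])
    return f"{hyperedge['relation']}: {members} ({hyperedge['why']})"


for h in hyperedges_for("n3", HYPEREDGES):
    print(describe(h))
```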

Community ids are derived locally from graph structure through a Louvain clustering pass over non-source nodes, while disconnected nodes stay in singleton communities. MCP get_community resolves a community by id or label and returns its members, pages, and top evidence edges.
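
Community membership can be rebuilt from the communityId field on each node. The sketch below does not run Louvain itself; it only groups nodes by the id that compile already assigned. It assumes nodes carry an `id` key and that nodes left out of clustering have no communityId, both of which are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical nodes; "src1" stands in for a source node without a community.
NODES = [
    {"id": "n1", "communityId": "c1"},
    {"id": "n2", "communityId": "c1"},
    {"id": "n3", "communityId": "c2"},
    {"id": "src1"},
]


def group_by_community(nodes):
    """Map each communityId to the ids of its member nodes."""
    groups = defaultdict(list)
    for node in nodes:
        community_id = node.get("communityId")
        if community_id is None:
            continue  # assumed: nodes excluded from clustering carry no id
        groups[community_id].append(node["id"])
    return dict(groups)


def singleton_communities(groups):
    """Return ids of communities that contain exactly one node."""
    return sorted(cid for cid, members in groups.items() if len(members) == 1)


groups = group_by_community(NODES)
print(groups)
print(singleton_communities(groups))
```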

The graph report JSON also carries higher-level health signals such as community cohesion, isolated-node warnings, ambiguous-edge ratios, and suggested follow-up questions derived from weak or ambiguous graph regions.
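
Two of these signals can be approximated directly from the graph. The exact formulas SwarmVault uses are not specified here, so the sketch below is an assumption: it treats the ambiguous-edge ratio as the share of edges whose evidenceClass is ambiguous, and an isolated node as one whose degree is zero. The `id` key and helper names are hypothetical.

```python
def ambiguous_edge_ratio(edges):
    """Share of edges with evidenceClass == "ambiguous" (0.0 for no edges)."""
    if not edges:
        return 0.0
    ambiguous = sum(1 for e in edges if e.get("evidenceClass") == "ambiguous")
    return ambiguous / len(edges)


def isolated_nodes(nodes):
    """Ids of nodes with no connections, judged by the derived degree field."""
    return [n["id"] for n in nodes if n.get("degree", 0) == 0]


# Hypothetical sample data.
EDGES = [
    {"evidenceClass": "extracted"},
    {"evidenceClass": "inferred"},
    {"evidenceClass": "ambiguous"},
    {"evidenceClass": "extracted"},
]
NODES = [{"id": "n1", "degree": 2}, {"id": "n2", "degree": 0}]

print(ambiguous_edge_ratio(EDGES))
print(isolated_nodes(NODES))
```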

Pages

The graph also carries a page registry:

  • pages[].path and pages[].title for preview/navigation
  • pages[].kind and pages[].status for filtering
  • pages[].projectIds for project-aware search
  • pages[].sourceClass for first-party vs third-party/resource/generated filtering
  • pages[].backlinks and pages[].relatedPageIds for local workspace navigation
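
The registry fields above are enough for project-aware filtering. A minimal sketch, assuming each entry is a plain JSON object; the sample paths, titles, and project ids are invented for illustration, and `find_pages` is a hypothetical helper.

```python
# Hypothetical registry entries using the documented field names.
PAGES = [
    {"path": "wiki/example/a.md", "title": "A",
     "projectIds": ["proj-1"], "sourceClass": "first_party"},
    {"path": "wiki/example/b.md", "title": "B",
     "projectIds": ["proj-1", "proj-2"], "sourceClass": "third_party"},
    {"path": "wiki/example/c.md", "title": "C",
     "projectIds": ["proj-2"], "sourceClass": "first_party"},
]


def find_pages(pages, project_id=None, source_class=None):
    """Filter pages by project and source class; None means no filter."""
    matches = []
    for page in pages:
        if project_id is not None and project_id not in page.get("projectIds", []):
            continue
        if source_class is not None and page.get("sourceClass") != source_class:
            continue
        matches.append(page)
    return matches


print([p["title"] for p in find_pages(PAGES, project_id="proj-1")])
print([p["title"] for p in find_pages(PAGES, source_class="first_party")])
```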

Local Graph Navigation

SwarmVault also exposes deterministic local graph tools without calling a provider:

  • swarmvault graph query
  • swarmvault graph path
  • swarmvault graph explain
  • swarmvault graph god-nodes
  • swarmvault graph share
  • swarmvault graph blast
  • swarmvault graph status
  • swarmvault graph update

The same graph-native read surface is exposed over MCP through query_graph, graph_report, graph_stats, get_node, get_community, get_neighbors, get_hyperedges, shortest_path, god_nodes, and blast_radius.

When an embedding-capable provider is available, swarmvault graph query and the graph-report similarity pass also use cached embeddings from state/embeddings.json. Setting tasks.embeddingProvider is the explicit way to choose that backend, but SwarmVault can also fall back to a queryProvider with embeddings support. Without either, the same surfaces still work using lexical graph matching only.
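
The selection order described above can be pictured as a small decision function. This is only a sketch of that order, not SwarmVault's code: the `tasks` dictionary layout (including placing queryProvider under it), the capability map, and the provider names are all assumptions made for illustration.

```python
from typing import Optional


def pick_embedding_backend(tasks: dict, capabilities: dict) -> Optional[str]:
    """Return the provider to use for embeddings, or None for lexical-only.

    tasks: assumed shape {"embeddingProvider": str, "queryProvider": str}
    capabilities: assumed map of provider name to a list of capability names
    """
    explicit = tasks.get("embeddingProvider")
    if explicit:
        return explicit  # the explicit choice wins
    fallback = tasks.get("queryProvider")
    if fallback and "embeddings" in capabilities.get(fallback, ()):
        return fallback  # query provider that also supports embeddings
    return None


def match_mode(tasks: dict, capabilities: dict) -> str:
    """Name the matching mode the graph surfaces would run in."""
    backend = pick_embedding_backend(tasks, capabilities)
    return "lexical+embeddings" if backend else "lexical"


# Hypothetical provider names and capabilities.
CAPS = {"embed-1": ["embeddings"], "chat-1": ["chat"], "chat-2": ["chat", "embeddings"]}

print(match_mode({"embeddingProvider": "embed-1"}, CAPS))
print(match_mode({"queryProvider": "chat-2"}, CAPS))
print(match_mode({"queryProvider": "chat-1"}, CAPS))
```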

Provenance

Every edge in the graph traces back to a specific source and claim, so you can verify any piece of knowledge by following the provenance chain back to the raw input.
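
Following that chain amounts to resolving the source ids in an edge's provenance field against the top-level sources array. A minimal sketch, assuming each source entry has an `id` and a `path` (both field names, the sample paths, and the `trace_edge` helper are hypothetical).

```python
# Hypothetical source entries; "id" and "path" are assumed field names.
SOURCES = [
    {"id": "src-a", "path": "raw/a.md"},
    {"id": "src-b", "path": "raw/b.md"},
]


def trace_edge(edge, sources):
    """Resolve an edge's provenance ids to source entries.

    Returns (found, missing): the matching source entries in provenance
    order, and any ids that have no entry in the sources array.
    """
    by_id = {source["id"]: source for source in sources}
    found, missing = [], []
    for source_id in edge.get("provenance", []):
        if source_id in by_id:
            found.append(by_id[source_id])
        else:
            missing.append(source_id)
    return found, missing


edge = {"relation": "semantically_similar_to", "provenance": ["src-b", "src-x"]}
found, missing = trace_edge(edge, SOURCES)
print([s["path"] for s in found], missing)
```

A non-empty `missing` list would point at an edge whose backing source is no longer in the graph, which is worth checking by hand.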