Input Types

SwarmVault can mix documents, research URLs, transcripts, media, data exports, browser clips, and code in one vault. Code analysis stays local and parser-backed; configured providers are used only for workflows that need them, such as image or audio extraction.
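The split between local extraction and provider-backed extraction can be pictured as a lookup on the file extension. The sketch below is a hypothetical illustration, not SwarmVault's implementation. The `route` function, the `providers` mapping, its `"audio"`/`"vision"` keys, and the `"unconfigured"` result are all assumptions made for the example, and the extension sets are only a small subset of the table below.

```python
from pathlib import Path
from typing import Dict, Optional

# Illustrative subsets of the extensions in the table; not exhaustive.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".webm"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def route(path: str, providers: Dict[str, Optional[str]]) -> str:
    """Classify a file as 'local', 'provider:<name>', or 'unconfigured'.

    `providers` is a hypothetical mapping such as {"audio": "example-audio"};
    its keys are illustrative and are not SwarmVault config keys.
    """
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        kind = "audio"
    elif ext in IMAGE_EXTS:
        kind = "vision"
    else:
        # Documents, data, and code stay on the local, parser-backed path.
        return "local"
    name = providers.get(kind)
    # 'unconfigured' is an assumed placeholder; the docs only say these
    # inputs use a provider "when configured".
    return f"provider:{name}" if name else "unconfigured"


print(route("src/app.ts", {}))                         # local
print(route("talk.MP3", {"audio": "example-audio"}))   # provider:example-audio
print(route("diagram.png", {}))                        # unconfigured
```

The point of the sketch is the asymmetry: only the media branches ever consult a provider, so a vault with no providers configured still ingests documents, data, and code fully.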

| Input | Extensions / Sources | Extraction |
| --- | --- | --- |
| PDF | `.pdf` | Local text extraction |
| Word documents | `.docx` `.docm` `.dotx` `.dotm` | Local extraction with metadata |
| Rich Text | `.rtf` | Local RTF text extraction via parser-backed walk |
| OpenDocument | `.odt` `.odp` `.ods` | Local text, slide, or sheet extraction |
| EPUB books | `.epub` | Local chapter-split HTML-to-markdown extraction |
| Datasets | `.csv` `.tsv` | Local tabular summary with bounded preview |
| Spreadsheets | `.xlsx` `.xlsm` `.xlsb` `.xls` `.xltx` `.xltm` | Local workbook and sheet preview extraction |
| Slide decks | `.pptx` `.pptm` `.potx` `.potm` | Local slide and speaker-note extraction |
| Jupyter notebooks | `.ipynb` | Local cell and output extraction |
| BibTeX libraries | `.bib` | Parser-backed citation entry extraction |
| Org-mode | `.org` | AST-backed headline, list, and block extraction |
| AsciiDoc | `.adoc` `.asciidoc` | Asciidoctor-backed section and metadata extraction |
| Transcripts | `.srt` `.vtt` | Local timestamped transcript extraction |
| Chat exports | Slack export `.zip`, extracted Slack export directories | Local channel/day conversation extraction |
| Email | `.eml` `.mbox` | Local message extraction and mailbox expansion |
| Calendar | `.ics` | Local `VEVENT` expansion |
| Audio | `.mp3` `.wav` `.m4a` `.aac` `.ogg` `.webm` and other `audio/*` files | Provider-backed transcription via `tasks.audioProvider` when configured |
| Video | `.mp4` `.mov` `.m4v` `.mkv` `.avi`, and URL inputs with `--video` | `ffmpeg` or `yt-dlp` extracts audio, then `tasks.audioProvider` transcribes it |
| HTML | `.html`, URLs | Readability plus Turndown to markdown |
| YouTube URLs | `youtube.com/watch`, `youtu.be`, `youtube.com/embed`, `youtube.com/shorts` | Direct transcript capture with title and video metadata |
| Images | `.png` `.jpg` `.jpeg` `.gif` `.webp` `.bmp` `.tif` `.tiff` `.svg` `.ico` `.heic` `.heif` `.avif` `.jxl` | Vision provider when configured |
| Research | arXiv, DOI, articles, X/Twitter | Normalized markdown via `swarmvault add` |
| Text docs | `.md` `.mdx` `.txt` `.rst` `.rest` | Direct ingest with lightweight `.rst` heading normalization |
| Config / data | `.json` `.jsonc` `.json5` `.toml` `.yaml` `.yml` `.xml` `.ini` `.conf` `.cfg` `.properties` `.env` | Structured preview with key/value schema hints |
| Developer manifests | `package.json`, `tsconfig.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `go.sum`, `Dockerfile`, `Makefile`, `LICENSE`, `.gitignore`, `.editorconfig`, `.npmrc`, and similar | Content-sniffed text ingest |
| Code | JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, Scala, Dart, Lua, Zig, C#, C/C++, PHP, Ruby, PowerShell, Elixir, OCaml, Objective-C, ReScript, Solidity, Vue, Svelte, Julia, Verilog/SystemVerilog, R, CSS, HTML, SQL, and extensionless scripts with common shebangs | AST/parser-backed analysis and module resolution; SQL adds table/view graph edges; R emits an explicit parser diagnostic until a safe grammar exists |
| Browser clips | Inbox bundles | Asset-rewritten markdown via `inbox import` |
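The YouTube row lists four URL shapes. The sketch below shows one way to tell them apart and pull out the video ID using only the standard library. It is a hypothetical illustration, not SwarmVault's matcher: the function name `youtube_video_id` is an assumption, and stripping a leading `www.` is an added convenience not stated in the table.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID for watch, youtu.be, embed, or shorts URLs, else None."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse already lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parts[0] if parts else None
    if host != "youtube.com":
        return None
    if parts == ["watch"]:
        # Watch URLs carry the ID in the `v` query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
        # Embed and Shorts URLs carry the ID as the second path segment.
        return parts[1]
    return None
```

For example, `youtube_video_id("https://youtu.be/abc123?si=x")` and `youtube_video_id("https://www.youtube.com/shorts/abc123")` both return `"abc123"`, while a non-YouTube host returns `None`.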
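The Code row also covers extensionless scripts identified by a shebang. A minimal sketch of that kind of detection follows, assuming a hypothetical `INTERPRETERS` map; the table does not say which shebangs count as "common", so the entries below are illustrative picks from languages the table already lists. It handles both direct interpreter paths and the `#!/usr/bin/env <interpreter>` form, but not `env` options such as `-S`.

```python
import re
from typing import Optional

# Hypothetical interpreter-to-language map, limited to languages in the table.
INTERPRETERS = {
    "python": "python",
    "python3": "python",
    "node": "javascript",
    "ruby": "ruby",
    "php": "php",
    "pwsh": "powershell",
}

# First token after "#!" is the interpreter path; an optional second token
# is the real interpreter when the first one is `env`.
SHEBANG = re.compile(r"^#!\s*(\S+)(?:\s+(\S+))?")


def language_from_shebang(first_line: str) -> Optional[str]:
    match = SHEBANG.match(first_line)
    if not match:
        return None
    interpreter = match.group(1).rsplit("/", 1)[-1]
    if interpreter == "env" and match.group(2):
        interpreter = match.group(2).rsplit("/", 1)[-1]
    return INTERPRETERS.get(interpreter)
```

Only the first line of the file needs to be read, so a check like this stays cheap even when a directory contains many extensionless files.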

For the shortest setup flow, start with `swarmvault quickstart`. For recurring files, directories, public GitHub repos, and docs hubs, use `swarmvault source add`.