Input Types
SwarmVault can mix documents, research URLs, transcripts, media, data exports, browser clips, and code in one vault. Code analysis stays local and parser-backed; configured providers are used only for workflows that need them, such as image or audio extraction.
| Input | Extensions / Sources | Extraction |
|---|---|---|
| PDF | .pdf | Local text extraction |
| Word documents | .docx .docm .dotx .dotm | Local extraction with metadata |
| Rich Text | .rtf | Local RTF text extraction via parser-backed walk |
| OpenDocument | .odt .odp .ods | Local text, slide, or sheet extraction |
| EPUB books | .epub | Local chapter-split HTML-to-markdown extraction |
| Datasets | .csv .tsv | Local tabular summary with bounded preview |
| Spreadsheets | .xlsx .xlsm .xlsb .xls .xltx .xltm | Local workbook and sheet preview extraction |
| Slide decks | .pptx .pptm .potx .potm | Local slide and speaker-note extraction |
| Jupyter notebooks | .ipynb | Local cell and output extraction |
| BibTeX libraries | .bib | Parser-backed citation entry extraction |
| Org-mode | .org | AST-backed headline, list, and block extraction |
| AsciiDoc | .adoc .asciidoc | Asciidoctor-backed section and metadata extraction |
| Transcripts | .srt .vtt | Local timestamped transcript extraction |
| Chat exports | Slack export .zip, extracted Slack export directories | Local channel/day conversation extraction |
| Email | .eml .mbox | Local message extraction and mailbox expansion |
| Calendar | .ics | Local VEVENT expansion |
| Audio | .mp3 .wav .m4a .aac .ogg .webm and other audio/* files | Provider-backed transcription via tasks.audioProvider when configured |
| Video | .mp4 .mov .m4v .mkv .avi and URL inputs with --video | ffmpeg or yt-dlp extracts audio, then tasks.audioProvider transcribes it |
| HTML | .html, URLs | Readability plus Turndown to markdown |
| YouTube URLs | youtube.com/watch, youtu.be, youtube.com/embed, youtube.com/shorts | Direct transcript capture with title and video metadata |
| Images | .png .jpg .jpeg .gif .webp .bmp .tif .tiff .svg .ico .heic .heif .avif .jxl | Vision provider when configured |
| Research | arXiv, DOI, articles, X/Twitter | Normalized markdown via swarmvault add |
| Text docs | .md .mdx .txt .rst .rest | Direct ingest with lightweight .rst heading normalization |
| Config / data | .json .jsonc .json5 .toml .yaml .yml .xml .ini .conf .cfg .properties .env | Structured preview with key/value schema hints |
| Developer manifests | package.json tsconfig.json Cargo.toml pyproject.toml go.mod go.sum Dockerfile Makefile LICENSE .gitignore .editorconfig .npmrc and similar | Content-sniffed text ingest |
| Code | JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, Scala, Dart, Lua, Zig, C#, C/C++, PHP, Ruby, PowerShell, Elixir, OCaml, Objective-C, ReScript, Solidity, Vue, Svelte, Julia, Verilog/SystemVerilog, R, CSS, HTML, SQL, and extensionless scripts with common shebangs | AST/parser-backed analysis and module resolution; SQL adds table/view graph edges; R emits an explicit parser diagnostic until a safe grammar exists |
| Browser clips | Inbox bundles | Asset-rewritten markdown via inbox import |
For the shortest setup flow, start with `swarmvault quickstart`. For recurring files, directories, public GitHub repos, and docs hubs, use `swarmvault source add`.
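
Because only audio, video, and image rows in the table depend on a configured provider, it can help to check a folder before ingesting it. The snippet below is a minimal sketch of such a pre-check, not part of SwarmVault: `needs_provider` and the three sets are hypothetical names, and the extensions are copied from the table rather than read from SwarmVault itself, so they do not cover the "other audio/* files" case or URL inputs.

```python
from pathlib import PurePath
from typing import Optional

# Extension sets copied by hand from the table above (assumption: they
# mirror the documented rows; SwarmVault's own detection may differ).
AUDIO = {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".webm"}
VIDEO = {".mp4", ".mov", ".m4v", ".mkv", ".avi"}
IMAGE = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tif", ".tiff",
    ".svg", ".ico", ".heic", ".heif", ".avif", ".jxl",
}


def needs_provider(path: str) -> Optional[str]:
    """Return "audio" or "vision" if the table lists a provider for this
    file's extension, or None if the row describes local extraction."""
    ext = PurePath(path).suffix.lower()
    # Video is transcribed through the audio provider once its audio
    # track has been extracted, so both map to "audio" here.
    if ext in AUDIO or ext in VIDEO:
        return "audio"
    if ext in IMAGE:
        return "vision"
    return None


if __name__ == "__main__":
    for name in ["talk.MP3", "clip.mkv", "scan.heic", "paper.pdf"]:
        print(name, "->", needs_provider(name))
```

Files that return `None` fall under the rows described as local or parser-backed, so they should not require `tasks.audioProvider` or a vision provider.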